The project explores ways to implement recommendation systems by leveraging existing data about users’ past choices (User-Based Collaborative Filtering). Shortcomings of the approach were examined, and meta-content was incorporated into the recommender to make predictions more robust and to handle the cold-start problem. Repository Link.
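A minimal sketch of the user-based collaborative-filtering idea, assuming a small toy ratings matrix (the function and data below are illustrative placeholders; the project's actual data handling and meta-content features live in the repository):

```python
import numpy as np

def predict_rating(ratings, user, item, k=2):
    """User-based CF: predict ratings[user, item] from the k most
    similar users who have rated the item (0 = unrated)."""
    # Cosine similarity between the target user and every other user
    target = ratings[user]
    sims = ratings @ target / (
        np.linalg.norm(ratings, axis=1) * np.linalg.norm(target) + 1e-9
    )
    sims[user] = -1.0  # exclude the user themselves

    # Keep only neighbours who actually rated the item
    rated = np.where(ratings[:, item] > 0)[0]
    neighbours = rated[np.argsort(sims[rated])[-k:]]
    if len(neighbours) == 0:
        return np.nan  # cold-start case: no collaborative signal at all

    # Similarity-weighted average of the neighbours' ratings
    w = sims[neighbours]
    return float(w @ ratings[neighbours, item] / (w.sum() + 1e-9))

# Toy ratings matrix: rows = users, columns = items, 0 = unrated
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
print(predict_rating(R, user=1, item=1))
```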
Implementation of a toy compiler in C, based on a very basic language specification. It includes a tokeniser, a grammar-based parser, and type-expression table generation. Repository Link.
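As a rough illustration of the tokenisation step only (written in Python rather than the project's C, and using a made-up token set rather than the project's actual language specification):

```python
import re

# Hypothetical token set, not the project's actual language specification
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenise(source):
    """Yield (token kind, lexeme) pairs, skipping whitespace."""
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(tokenise("x = (a + 42) * b")))
```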
PyTorch implementation of the Transformer model from “Attention Is All You Need”. The model is based solely on attention mechanisms and introduces Multi-Head Attention. The encoder and decoder are each a stack of layers, with every layer consisting of Multi-Head Attention and position-wise feed-forward sublayers. This architecture underlies many state-of-the-art sequence-to-sequence and transfer-learning models.
The model reaches a BLEU score of 35.44, which is comparable to the state-of-the-art performance of recent models such as ‘Multi-Agent Dual Learning’, which reaches ~40 as reported on this leaderboard.
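A minimal sketch of the scaled dot-product attention at the core of the Multi-Head Attention described above (shapes and hyperparameters are illustrative defaults; the full model is in the repository):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Bare-bones multi-head attention: project to heads, apply scaled
    dot-product attention, concatenate, and project back."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_head = d_model // n_heads
        self.n_heads = n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        B, Lq, _ = query.shape
        Lk = key.shape[1]
        # Split the model dimension into (n_heads, d_head)
        q = self.q_proj(query).view(B, Lq, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(key).view(B, Lk, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(value).view(B, Lk, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention: softmax(QK^T / sqrt(d_head)) V
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).contiguous().view(B, Lq, -1)
        return self.out_proj(out)

x = torch.randn(2, 10, 512)                 # (batch, sequence length, d_model)
print(MultiHeadAttention()(x, x, x).shape)  # torch.Size([2, 10, 512])
```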
Alleviates the information-compression problem by allowing the decoder to “look back” at the input sentence through context vectors that are weighted sums of the encoder hidden states. The weights for this weighted sum are computed by an attention mechanism, in which the decoder learns to pay attention to the most relevant words in the input sentence.
Based on the paper Neural Machine Translation by Jointly Learning to Align and Translate, applied to English-to-French translation pairs.
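A minimal sketch of the attention-weighted context vector described above, using additive (Bahdanau-style) scoring; dimensions and the class name are illustrative, not the project's exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Score each encoder state against the current decoder state and
    return the context vector as the attention-weighted sum."""
    def __init__(self, enc_dim=256, dec_dim=256, attn_dim=128):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)
        self.dec_proj = nn.Linear(dec_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, decoder_state, encoder_states):
        # decoder_state: (batch, dec_dim); encoder_states: (batch, src_len, enc_dim)
        energy = torch.tanh(
            self.enc_proj(encoder_states) + self.dec_proj(decoder_state).unsqueeze(1)
        )                                                             # (batch, src_len, attn_dim)
        weights = F.softmax(self.score(energy).squeeze(-1), dim=-1)   # (batch, src_len)
        context = (weights.unsqueeze(-1) * encoder_states).sum(dim=1) # (batch, enc_dim)
        return context, weights

enc = torch.randn(4, 12, 256)   # 12 source positions
dec = torch.randn(4, 256)       # current decoder hidden state
context, weights = AdditiveAttention()(dec, enc)
print(context.shape, weights.shape)  # torch.Size([4, 256]) torch.Size([4, 12])
```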
A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering, used for calculating distances between phenotypic profiles. Based on this paper.
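As a rough illustration of the general idea of entropy-weighted distances over categorical data (not the paper's exact unified metric, which also treats ordinal attributes differently), attributes whose values are spread more evenly carry more entropy and can be given more weight when comparing two profiles:

```python
import numpy as np
from collections import Counter

def attribute_entropy(column):
    """Shannon entropy of a single categorical attribute."""
    counts = np.array(list(Counter(column).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def entropy_weighted_distance(x, y, data):
    """Mismatch distance between profiles x and y, with each nominal
    attribute weighted by its entropy over the dataset (illustrative only)."""
    weights = np.array([attribute_entropy(data[:, j]) for j in range(data.shape[1])])
    mismatches = np.array([xi != yi for xi, yi in zip(x, y)], dtype=float)
    return float((weights * mismatches).sum() / weights.sum())

# Toy "phenotypic profiles" with purely nominal attributes
profiles = np.array([["round", "green", "small"],
                     ["round", "red",   "small"],
                     ["oval",  "green", "large"],
                     ["oval",  "red",   "large"]])
print(entropy_weighted_distance(profiles[0], profiles[2], profiles))
```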
Student grade prediction using an ensemble of SVM, Random Forest, and Lasso/Ridge regression. Also trained a two-layer neural network, which gave results similar to the ensemble but was more robust under cross-validation. Beats the reported scores in this paper on 2 of 3 tasks. Dataset
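A minimal sketch of how such an ensemble could be assembled with scikit-learn; the data, hyperparameters, and scoring below are placeholders standing in for the project's actual pipeline on the student-performance dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder features/targets standing in for the student-performance data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

# Average the predictions of SVM, Random Forest, and Lasso/Ridge regression
ensemble = VotingRegressor([
    ("svm",   make_pipeline(StandardScaler(), SVR(C=1.0))),
    ("rf",    RandomForestRegressor(n_estimators=200, random_state=0)),
    ("lasso", make_pipeline(StandardScaler(), Lasso(alpha=0.1))),
    ("ridge", make_pipeline(StandardScaler(), Ridge(alpha=1.0))),
])
scores = cross_val_score(ensemble, X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```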
As a pandemic evolves, tightening restrictions (or potentially releasing them) changes Rt, so knowing the current Rt is essential. When Rt > 1, the outbreak keeps growing and can spread through much of the population; when Rt < 1, each case leads to fewer than one new case on average, so the outbreak dies out after infecting a limited number of people. The lower Rt is, the more manageable the situation. The value of Rt helps us (1) understand how effective our measures have been at controlling an outbreak and (2) decide whether to increase or reduce restrictions given the competing goals of economic prosperity and human safety. Well-respected epidemiologists argue that tracking Rt is the only way to manage through this crisis.
Based on Kevin Systrom's implementation of Bettencourt & Ribeiro (2008), applied to data retrieved on April 13, 2020.
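A minimal sketch of the Bayesian update behind this approach, assuming a Poisson likelihood for daily new cases and a serial interval of 7 days as in the reference notebook; the case counts below are made up, and the real analysis also smooths the data and windows the prior:

```python
import numpy as np
from scipy import stats

GAMMA = 1 / 7                        # 1 / serial interval in days (assumed)
R_T_RANGE = np.linspace(0, 8, 801)   # grid of candidate Rt values

def estimate_rt(new_cases):
    """Posterior over Rt for each day, updated from the previous day's cases.
    new_cases: 1-D array of daily new case counts (assumed already smoothed)."""
    posteriors = []
    prior = np.ones_like(R_T_RANGE) / len(R_T_RANGE)   # flat prior on day 0
    for k_prev, k in zip(new_cases[:-1], new_cases[1:]):
        # Expected cases today given yesterday's cases and each candidate Rt
        lam = k_prev * np.exp(GAMMA * (R_T_RANGE - 1))
        likelihood = stats.poisson.pmf(k, lam)
        posterior = likelihood * prior
        posterior /= posterior.sum()
        posteriors.append(posterior)
        prior = posterior
    return np.array(posteriors)

cases = np.array([10, 14, 20, 27, 35, 41, 44, 43, 40])   # made-up daily counts
posterior = estimate_rt(cases)
print(R_T_RANGE[posterior.argmax(axis=1)])   # most likely Rt per day
```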
Worked on a mobile application for cervical cancer diagnosis and low-cost colposcopic examinations at the Machine Vision Laboratory, CEERI-Pilani, under Dr. J L Raheja.