Sarthak Garg's homepage

Hey There!

You have stumbled on my homepage! I am a Software Engineer at Apple Inc. working on Machine Translation (Fun Fact: Siri can translate!). I am interested in Machine Translation and Machine Learning in general. Before this, I had the pleasure of spending time at the amazing Computer Science departments of the Indian Institute of Technology Kanpur and Carnegie Mellon University.

In my spare time, I like to play tennis, watch Netflix, drink coffee, play board games or go for a hike.

Here are some of the things I have worked on:

Jointly Learning to Align and Translate with Transformer Models

Sarthak Garg, Stephan Peitz, Udhyakumar Nallasamy, Matthias Paulik, Conference on Empirical Methods in Natural Language Processing (EMNLP). November 2019 (To Appear).

The state of the art in machine translation (MT) is governed by neural approaches, which typically provide superior translation accuracy over statistical approaches. However, on the closely related task of word alignment, traditional statistical word alignment models often remain the go-to solution. In this paper, we present an approach to train a Transformer model to produce both accurate translations and alignments. We extract discrete alignments from the attention probabilities learnt during regular neural machine translation model training and leverage them in a multi-task framework to optimize towards translation and alignment objectives. We demonstrate that our approach produces competitive results compared to GIZA++ trained IBM alignment models without sacrificing translation accuracy and outperforms previous attempts on Transformer model based word alignment. Finally, by incorporating IBM model alignments into our multi-task training, we report significantly better alignment accuracies compared to GIZA++ on three publicly available data sets. Paper
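As a rough illustration of what "extracting discrete alignments from attention probabilities" can look like, here is a minimal sketch. It assumes a single attention matrix and a simple per-target-token argmax rule; the function name `extract_alignments` and the toy matrix are hypothetical and not taken from the paper's code.

```python
def extract_alignments(attention):
    """Turn soft attention into hard (source, target) alignment pairs.

    attention[t][s] is assumed to be the probability that target token t
    attends to source token s (each row sums to 1).
    """
    alignments = []
    for t, row in enumerate(attention):
        # Pick the source position with the highest attention weight.
        s = max(range(len(row)), key=row.__getitem__)
        alignments.append((s, t))
    return alignments


# Toy 3x3 attention matrix (hypothetical values for illustration only).
attention = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.6, 0.2],
]
print(extract_alignments(attention))  # [(0, 0), (2, 1), (1, 2)]
```

Alignments extracted this way could then serve as supervision targets for an alignment objective trained alongside the translation objective.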

Learning to Relate from Captions and Bounding Boxes

Sarthak Garg, Joel Ruben Antony Moniz, Anshu Aviral, Priyatham Bollimpalli, The 57th Annual Meeting of the Association for Computational Linguistics (ACL). July 2019.

In this work, we propose a novel approach that predicts the relationships between various entities in an image in a weakly supervised manner by relying on image captions and object bounding box annotations as the sole source of supervision. Our proposed approach uses a top-down attention mechanism to align entities in captions to objects in the image, and then leverages the syntactic structure of the captions to align the relations. We use these alignments to train a relation classification network, thereby obtaining both grounded captions and dense relationships. We demonstrate the effectiveness of our model on the Visual Genome dataset by achieving a recall@50 of 15% and recall@100 of 25% on the relationships present in the image. We also show that the model successfully predicts relations that are not present in the corresponding captions. Paper

Bilingual Lexicon Induction with Semi-supervision in Non-isometric Embedding spaces.

Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, Graham Neubig, The 57th Annual Meeting of the Association for Computational Linguistics (ACL). July 2019.

Recent work on bilingual lexicon induction (BLI) has frequently depended either on aligned bilingual lexicons or on distribution matching, often with an assumption about the isometry of the two spaces. We propose a technique to quantitatively estimate this assumption of the isometry between two embedding spaces and empirically show that this assumption weakens as the languages in question become increasingly etymologically distant. We then propose Bilingual Lexicon Induction with Semi-Supervision (BLISS), a semi-supervised approach that relaxes the isometric assumption while leveraging both limited aligned bilingual lexicons and a larger set of unaligned word embeddings, as well as a novel hubness filtering technique. Our proposed method obtains state-of-the-art results on 15 of 18 language pairs on the MUSE dataset, and does particularly well when the embedding spaces don't appear to be isometric. In addition, we also show that adding supervision stabilizes the learning procedure, and is effective even with minimal supervision. Paper, Code

Compression and Localization in Reinforcement Learning for ATARI Games

Joel Ruben Antony Moniz, Barun Patra, Sarthak Garg, NeurIPS 2018 Deep Reinforcement Learning Workshop

Deep neural networks have become commonplace in the domain of reinforcement learning, but are often expensive in terms of the number of parameters needed. While compressing deep neural networks has of late assumed great importance to overcome this drawback, little work has been done to address this problem in the context of reinforcement learning agents. This work aims at making first steps towards model compression in an RL agent. In particular, we compress networks to drastically reduce the number of parameters in them (to sizes less than 3% of their original size), further facilitated by applying a global max pool after the final convolution layer, and propose using Actor-Mimic in the context of compression. Finally, we show that this global max-pool allows for weakly supervised object localization, improving the ability to identify the agent's points of focus. Paper
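To give a feel for why a global max pool can hint at localization, here is a minimal sketch, assuming a single 2D feature map from the final convolution layer. The function name `global_max_pool_with_location` and the toy values are hypothetical and not from the paper: the pooled value feeds the rest of the network, while the position of the maximum indicates where in the frame the strongest activation came from.

```python
def global_max_pool_with_location(feature_map):
    """Return (max activation, (row, col)) for a 2D feature map.

    The max value is what a global max pool passes forward; the position
    of that max is a coarse indication of where the network is focusing.
    """
    best_value = float("-inf")
    best_pos = None
    for r, row in enumerate(feature_map):
        for c, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_pos = (r, c)
    return best_value, best_pos


# Toy 3x3 feature map (hypothetical activations).
feature_map = [
    [0.1, 0.3, 0.2],
    [0.0, 0.9, 0.4],
    [0.2, 0.1, 0.5],
]
print(global_max_pool_with_location(feature_map))  # (0.9, (1, 1))
```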

Circuit Bounds for Bipartite Matching in planar grid graphs

Aayush Ojha and I are working on improving the circuit upper bound for Bipartite Matching in planar grid graphs from ACC₀ to AC₀. Recently Hansen, Komarath et al. reduced bipartite matching in grid graphs to the monoid word problem. This problem reduces to proving the nonexistence of a periodic path in the shifted superposition of any perfect matching in a grid graph with itself. We have been able to prove the hypothesis for these subcases:

Nonexistence of a periodic path which is piecewise monotone in the y direction.
Nonexistence of a periodic path which enters the topmost and bottommost layer of the grid graph at least once.

We have used parity-based arguments to prove these results. We aim to generalize the proofs to paths without any restriction.

Extending Saturation Algorithms for Ordered Tree Pushdown Systems

Pushdown systems accurately model the control flow of first-order recursive programs (like those written in C and Java), and lend themselves readily to algorithmic analysis. I extensively researched the class of saturation algorithms, which compute whether some error states in a pushdown system are reachable or not. Recently Clemente, Parys, Salvati and Walukiewicz introduced Ordered Tree Pushdown Systems (OTPS), which subsume several classes of pushdown systems. Clemente et al. developed an algorithm for deciding the reachability problem. We aim to extend it to compute any modal μ-calculus denotation along the lines of M. Hague and C.-H.L. Ong's works.

pdf report

Object Detection in Traffic Surveillance Video

We implemented an object detector and classifier for detecting people, motorcycles, three wheelers and four wheelers in the institute's traffic surveillance video feed.

We extracted regions of interest from the video using image processing algorithms and refined them using the non-maximum suppression (NMS) algorithm on a Gaussian pyramid built on the subframes of the initial regions.
We experimented with different image feature representations, such as Histogram of Oriented Gradients (HOG) and Scale Invariant Feature Transform (SIFT), for training the classifiers. We achieved a classification accuracy of 88.8% using a Linear SVC and the HOG feature representation.
We also tested a preexisting object detection framework, Faster R-CNN, based on convolutional networks (developed by Shaoqing Ren et al.), on the project dataset.
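For readers unfamiliar with the NMS step mentioned above, here is a minimal sketch of the standard greedy variant. It is not the project's actual code: the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.68, so it is suppressed in favour of the higher-scoring detection.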

On worst case to average case reductions of NP problems

Along with Drishti Wali and Sai Kishan Pampana, I studied the computational complexity class Distributional NP and explored whether the existence of problems in NP which have no polynomial-time heuristic algorithms can be related to the question of whether BPP contains NP.

We studied the paper titled 'On worst case to average case reductions for NP problems' by Luca Trevisan and Andrej Bogdanov and gave a class seminar presenting their ideas and approach.

Seminar pdf

Augmented Reality Fluidic Interface - Auraplay

Along with Prakhar Jawre, Avishek Nath and Shashank Bhushan, I developed an interactive computer interface in the form of a motorized mechanical arm mounted with a Raspberry Pi, a camera and a projector.

We applied image processing algorithms (Blob Detection and Contour Detection) using OpenCV to track a laser pointer, which was used as a medium to interact with the projection. Positional feedback was sent to an Arduino unit, which controlled the movement of the mechanical arm.

Code Cloud, code sharing and managing app

We built a code management website mainly focused on competitive programming, using Django for the backend and Bootstrap 3 for the frontend. The website allows for private and public code sharing, annotating and searching algorithmic problems.

Github

Scala to x86 Compiler

As part of a Compilers course project, we built a Scala to x86 assembly compiler with support for basic datatypes, conditional statements, looping statements, arrays, typechecking, basic type inference, nested functions and recursion.

Github