Hi! I am a PhD candidate working with Prof. Klaus Mueller in the Computer Science Department at Stony Brook University. I love crunching numbers & developing interactive systems to tackle high-impact social problems. Recently, I received the IACS Junior Researcher Award for my upcoming research on tackling algorithmic bias through human-centered AI. Last year, I was awarded the Bloomberg Immersion Fellowship. As part of the fellowship, I worked with a non-profit, Matriculate, and built a real-time dashboard to support their decision making. As a Data Science for Social Good fellow at Georgia Tech, I had the opportunity to work with Prof. Ellen Zegura. My team and I tried to counter housing injustice by analyzing Atlanta's anti-displacement policy & communicating its possible impacts. I also worked as a research intern at the Indian Institute of Technology, Delhi, where I tried to predict solar radiation to ensure power supply-demand equilibrium. Previously, I completed my Bachelor’s & Master’s in Information Technology at the Indian Institute of Information Technology, Gwalior. For my master's thesis, I worked on ensemble learning and recommender systems under the guidance of Prof. Anupam Shukla. I am also the recipient of the Chairman Fellowship Award from Stony Brook University (2016).
Developed a flexible, robust and unbiased recommendation system. Trained multiple weak learners to solve the same problem. Eliminated uncorrelated errors & biases of individual learners via double stacking. Analyzed the transition from single-level to multi-level ensemble learning and its effects on overall accuracy & variance. I used the MovieLens dataset from the GroupLens project along with a host of techniques such as collaborative filtering, PAM, content-based recommendation, Random Forest, SVM and ANN. Each base learner solves a sub-problem. I found that the diversity and accuracy of the base learners together determine the effectiveness of ensemble learning models.
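To illustrate the stacking idea, here is a minimal sketch of a single-level blend, not the thesis implementation. It assumes two hypothetical base learners (`pred_cf` and `pred_content`) whose predictions are already available, and fits one blending weight by least squares. The ratings are made-up toy values, and in practice the weight would be fit on held-out predictions rather than on the training data itself.

```python
def fit_blend_weight(pred_a, pred_b, targets):
    """Least-squares weight w for the blend w * a + (1 - w) * b."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, targets))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    # If both learners agree everywhere, any weight gives the same blend.
    return num / den if den else 0.5


def blend(pred_a, pred_b, w):
    """Combine two prediction lists with weight w on the first learner."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def mse(preds, targets):
    """Mean squared error between predictions and true ratings."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


# Toy data (hypothetical): true ratings and two base learners' predictions.
ratings = [4.0, 3.0, 5.0, 2.0, 4.0]
pred_cf = [4.6, 2.4, 4.5, 2.8, 3.5]       # e.g. a collaborative filter
pred_content = [3.6, 3.5, 5.4, 1.5, 4.2]  # e.g. a content-based model

w = fit_blend_weight(pred_cf, pred_content, ratings)
stacked = blend(pred_cf, pred_content, w)

print("weight on first learner:", round(w, 3))
print("MSE first / second / blend:",
      mse(pred_cf, ratings), mse(pred_content, ratings), mse(stacked, ratings))
```

Because the two learners err in different directions on most items, the blend partly cancels their errors. This is the sense in which diversity among base learners matters alongside their individual accuracy.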
This study aims to bolster transparency & awareness about the state of power infrastructure in data-poor countries. The project tries to identify & rank countries in Africa & Asia that suffer from serious power infrastructure shortages. Apart from World Bank data, the study focuses specifically on satellite imagery data collected by NOAA from 1992 to 2013. The intensity of light in nighttime satellite images of a country is treated as a proxy for the power infrastructure in that country. It is combined with demographic, economic and development indicators from World Bank data to cluster countries into categories based on their power infrastructure.
Ads don’t have any perceptible characteristic features that identify them uniquely. Ad-blockers classify online images as ads based on hard-coded filters. This project tried to automate this process using deep learning techniques. I crawled 10k+ images from 150+ websites & used EasyList filters to label online images as ads. Finally, I used AlexNet & Inception networks to classify the images, which yielded ~94% accuracy.
This project tries to understand the American graduate admissions process by analyzing MS Computer Science applications over the past 5 years. Edulix was used as the data source. After extensive data cleaning, I modeled the admissions data based on patterns extracted from the data and on domain knowledge. The key to analyzing graduate admissions data is to analyze it in buckets rather than treating all applications as one bucket. The project aims to help students choose the right universities by predicting whether a student will be admitted to a specific university.
Ad-blockers try to block all possible ads, and websites try to sneak their ads through or force users to disable ad-blockers. We need to find a middle ground where websites can sustain themselves and end users are exposed to less intrusive ads that provide a more positive user experience. In this project, we analyzed website content & gave each website a rating based on its ad content. The rating is assigned based on parameters like the number of ads, download time, download size & screen space occupied by ads on the website. Websites with a rating above a certain threshold are classified as white-listed sites and won’t be affected by ad-blockers. This novel technique will encourage ad servers & content providers to move towards acceptable ads, which will be a win-win situation for everyone.
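The rating-and-threshold idea can be sketched as follows. This is a minimal illustration only: the caps, the equal weights, the 0 to 1 rating scale and the threshold of 0.7 are all hypothetical values chosen for the example, not the ones used in the project.

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    """Ad-related measurements for one website."""
    num_ads: int
    download_time_s: float   # time spent downloading ads, in seconds
    download_size_kb: float  # total size of ad content, in kilobytes
    screen_share: float      # fraction of screen space occupied by ads (0..1)


# Hypothetical caps: values at or above the cap count as the maximum penalty.
CAPS = {"num_ads": 10, "download_time_s": 5.0,
        "download_size_kb": 2000.0, "screen_share": 0.5}
# Hypothetical equal weights for the four parameters (sum to 1).
WEIGHTS = {"num_ads": 0.25, "download_time_s": 0.25,
           "download_size_kb": 0.25, "screen_share": 0.25}
THRESHOLD = 0.7  # hypothetical cut-off for white-listing


def ad_rating(stats: AdStats) -> float:
    """Return a rating in [0, 1]; higher means less intrusive ads."""
    penalty = 0.0
    for name, cap in CAPS.items():
        value = getattr(stats, name)
        penalty += WEIGHTS[name] * min(value / cap, 1.0)
    return 1.0 - penalty


def is_whitelisted(stats: AdStats) -> bool:
    """A site is white-listed when its rating meets the threshold."""
    return ad_rating(stats) >= THRESHOLD


light = AdStats(num_ads=2, download_time_s=0.5,
                download_size_kb=200.0, screen_share=0.05)
heavy = AdStats(num_ads=12, download_time_s=6.0,
                download_size_kb=3000.0, screen_share=0.6)

print(ad_rating(light), is_whitelisted(light))
print(ad_rating(heavy), is_whitelisted(heavy))
```

Capping each parameter keeps a single extreme value (for example, one very large ad) from dominating the rating beyond its weight.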
This project presents two novel techniques to improve existing semantic role representations and enable better understanding of language. First, we retrofitted word vectors generated from an LSTM model with a scientific processes corpus to generate better word embeddings. The second technique uses a semi-supervised model that learns word embeddings using the role as context. On testing, we found that the first model outperforms existing role labeling models for scientific processes. The second model also performs well, even on small annotated datasets. We conclude by suggesting a few ideas for further optimizing this model.
This project aims to ensure power supply-demand equilibrium by predicting variable components like solar radiation with appreciable accuracy. I used R to analyze and model weather data for Delhi from 1979 to 2014. I concluded the project with a prediction accuracy of more than 80% when predicting solar radiation up to 6 months into the future.
Visual Analytics & Imaging Lab
Stony Brook University
Stony Brook, NY
Nokia Bell Labs
Murray Hill, NJ
Data Science Consultant
New York, NY
Data Science for Social Good Fellow
Georgia Institute of Technology
Stony Brook University
Stony Brook, NY
InfoEdge (India) Ltd.
Indian Institute of Technology, Delhi