
Bhavya Ghai

PhD Candidate

Department of Computer Science
Stony Brook University
bghai at cs dot stonybrook dot edu

Resume    LinkedIn    Twitter    Google Scholar


Hi! I am a PhD candidate working with Prof. Klaus Mueller in the Computer Science Department at Stony Brook University. I love crunching numbers and developing interactive systems to tackle high-impact social problems. Recently, I received the IACS Junior Researcher Award for my upcoming research on tackling algorithmic bias through human-centered AI. Last year, I was awarded the Bloomberg Immersion Fellowship. As part of the fellowship, I worked with a non-profit, Matriculate, and built a real-time dashboard to enhance their decision making. As a Data Science for Social Good fellow at Georgia Tech, I had the opportunity to work with Prof. Ellen Zegura. My team and I tried to counter housing injustice by analyzing Atlanta's anti-displacement policy and communicating its possible impacts. I also worked as a research intern at the Indian Institute of Technology, Delhi, where I tried to predict solar radiation to ensure power supply-demand equilibrium. Previously, I completed my Bachelor's and Master's in Information Technology at the Indian Institute of Information Technology, Gwalior. For my master's thesis, I worked on ensemble learning and recommender systems under the guidance of Prof. Anupam Shukla. I am also a recipient of the Chairman's Fellowship Award from Stony Brook University (2016).

Research Interests:

  • Algorithmic Bias
  • Human-Centered AI
  • Data for Social Good
  • Machine Learning
  • Data Visualization

Recent Updates



Visualization of Multivariate Data with Network Constraints using Multi-Objective Optimization.
Bhavya Ghai, Alok Mishra, Klaus Mueller
IEEE VIS Conference (extended abstract), Phoenix, AZ, Oct. 2017
Video Preview | Poster | PDF


Using data science as a community advocacy tool to promote equity in urban renewal programs: An analysis of Atlanta's Anti-Displacement Tax Fund.
Jeremy Auerbach, Hayley Barton, Takeria Blunt, Vishwamitra Chaganti, Bhavya Ghai, Amanda Meng, Christopher Blackburn, Ellen Zegura
Bloomberg Data for Good Exchange, New York, NY, Aug. 2017
link | PDF | Poster

Wave Front Method Based Path Planning Algorithm for Mobile Robots.
Bhavya Ghai, Anupam Shukla
ICTIS Springer, May 2016
link | PDF

Energy Efficient Dynamic Nearest Node Election for Localizations of Mobile Node in Wireless Sensor Networks.
Bhavya Ghai, Girish Pradeep Bindalkar, Sanjeev Sharma, Anupam Shukla
IEEE ICCIC, Dec 2015
link | PDF
  • Multi-level Ensemble Learning based Recommender System

    Developed a flexible, robust, and unbiased recommendation system. Trained multiple weak learners to solve the same problem and eliminated the uncorrelated errors and biases of individual learners via double stacking. Analyzed the transition from single-level to multi-level ensemble learning and its effects on overall accuracy and variance. I used the MovieLens dataset from the GroupLens project and a host of techniques such as collaborative filtering, PAM, content-based recommendation, random forests, SVM, and ANN. Each base learner solves a sub-problem. I found that the diversity and accuracy of the base learners together determine the effectiveness of ensemble learning models.

  • Analyzing Power Infrastructure Shortage in African & Asian Countries using Nighttime Satellite Imagery

    This study aims to bolster transparency and awareness about the state of power infrastructure in data-poor countries. It tries to identify and rank countries in Africa and Asia suffering from serious power infrastructure shortages. In addition to World Bank data, the study focuses on satellite imagery collected by NOAA from 1992 to 2013. The intensity of light in nighttime satellite images of a country is used as a proxy for that country's power infrastructure. It is combined with demographic, economic, and development indicators from World Bank data to cluster countries into categories by power infrastructure.

  • Towards Intelligent Ad-blockers: Exploring deep learning techniques to classify Ads

    Ads don’t have any perceptible characteristic features that identify them uniquely. Ad-blockers classify online images as ads based on hard-coded filters. This project tried to automate this process using deep learning techniques. I crawled 10k+ images from 150+ websites and used EasyList filters to label online images as ads. Lastly, I used AlexNet and Inception networks to classify the images, which yielded ~94% accuracy.

  • Analysis of the American Graduate Admissions Process

    This project tries to understand the American graduate admissions process by analyzing MS Computer Science applications over the past 5 years. Edulix was used as the data source. After extensive data cleaning, I modeled the admissions data based on patterns extracted from the data and domain knowledge. The key to analyzing graduate admissions data is to analyze it in buckets rather than treating all of it as one bucket. The project aims to help students choose the right universities by predicting whether a student will be admitted to a specific university.

  • Online Advertising: Impact Analysis & Future directions

    Ad-blockers try to block all possible ads, and websites try to sneak their ads through or force users to disable ad-blockers. We need to find a middle ground where websites can sustain themselves and end users are exposed to less intrusive ads that provide a more positive user experience. In this project, we analyzed website content and gave each website a rating based on its ad content. The rating is assigned based on parameters such as the number of ads, download time, download size, and screen space occupied by ads on the website. Websites with a rating above a certain threshold are classified as white-listed sites and won’t be affected by ad-blockers. This novel technique will encourage ad servers and content providers to move towards acceptable ads, which will be a win-win situation for everyone.

  • Process Knowledge Extraction

    This project presents two novel techniques to improve existing semantic role representations to enable better understanding of language. First, we retrofitted word vectors generated from an LSTM model with a scientific processes corpus to generate better word embeddings. The second technique uses a semi-supervised model that learns word embeddings using role as context. On testing, we found that the first model outperforms existing role labeling models for scientific processes. The second model also performs well, even on small annotated datasets. We conclude by suggesting a few ideas for further optimizing this model.

  • Prediction of Solar Radiation using Time Series Analysis

    This project aims to ensure power supply-demand equilibrium by predicting variable components such as solar radiation with appreciable accuracy. I used R to analyze and model weather data for Delhi from 1979 to 2014. I concluded the project with a prediction accuracy of more than 80% when predicting solar radiation up to 6 months into the future.
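
The website rating described under the Online Advertising project can be illustrated with a short Python sketch. This is only an illustration, not the project's actual implementation: the four parameters come from the description above, but the class and function names, the per-parameter limits, the equal weighting, and the threshold value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PageAdStats:
    """Ad-related measurements for one website (hypothetical structure)."""
    num_ads: int                # number of ads on the page
    ad_download_secs: float     # time spent downloading ad content
    ad_download_kb: float       # size of the downloaded ad content
    ad_screen_fraction: float   # share of screen space occupied by ads (0 to 1)


# Assumed "worst acceptable" value per parameter, used only to normalize.
LIMITS = PageAdStats(
    num_ads=20,
    ad_download_secs=5.0,
    ad_download_kb=2000.0,
    ad_screen_fraction=0.5,
)

# Assumed cut-off: sites rated above this value count as white-listed.
WHITELIST_THRESHOLD = 0.6


def _penalty(value, limit):
    """Scale a measurement to [0, 1], capping at the limit."""
    return min(value / limit, 1.0)


def ad_rating(stats):
    """Return a rating in [0, 1]; higher means less intrusive ads."""
    penalties = [
        _penalty(stats.num_ads, LIMITS.num_ads),
        _penalty(stats.ad_download_secs, LIMITS.ad_download_secs),
        _penalty(stats.ad_download_kb, LIMITS.ad_download_kb),
        _penalty(stats.ad_screen_fraction, LIMITS.ad_screen_fraction),
    ]
    # Equal weights are an assumption; the source does not state the weighting.
    return 1.0 - sum(penalties) / len(penalties)


def is_whitelisted(stats):
    """A site is white-listed when its rating exceeds the threshold."""
    return ad_rating(stats) > WHITELIST_THRESHOLD
```

For instance, a page with 2 ads, 0.5 s and 200 KB of ad downloads, and 5% of the screen taken by ads would be white-listed under these assumed limits, while a page that exceeds every limit would not.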


  • Ph.D. (pursuing), Computer Science, Stony Brook University
  • M.Tech, Information Technology, Indian Institute of Information Technology, Gwalior
  • B.Tech, Information Technology, Indian Institute of Information Technology, Gwalior


  • Best Research Talk Award, GRD, 2019
  • IACS Junior Researcher Award, IACS, Stony Brook University, 2018
  • Bloomberg Immersion Fellowship, Bloomberg, 2017
  • Data Science for Social Good fellowship, Georgia Tech, Atlanta, 2017
  • Chairman's Fellowship Award, CS Dept., Stony Brook University, 2016
  • Graduate Fellowship, AICTE, 2015


  • Research Assistant
    Visual Analytics & Imaging Lab
    Stony Brook University

    Stony Brook, NY

  • Research Intern
    Nokia Bell Labs

    Murray Hill, NJ

  • Data Science Consultant

    New York, NY

  • Data Science for Social Good Fellow
    Georgia Institute of Technology

    Atlanta, GA

  • Teaching Assistant
    Stony Brook University

    Stony Brook, NY

  • Software Developer
    InfoEdge (India) Ltd.

    Noida, India

  • Research Intern
    Indian Institute of Technology, Delhi

    Delhi, India


  • Programming: Python, Java, C++, R, SQL
  • Data Analytics: Jupyter Notebook, TensorFlow, Spark, MS-Excel, RStudio
  • Visualization: D3, Plotly, Leaflet, Matplotlib, Google Charts


  • Oracle Certified Java Professional, Java SE 6, 2014
  • Panel Speaker, Bloomberg Data for Good Exchange, 2017
  • Travel Grant Award, Data Science for Social Good Conference, Chicago, 2017
  • Full Silver Scholarship, ODSC Boston, 2018
  • Travel Grant Award, PyCon, Cleveland, 2018
  • Conference Scholarship, Fairness, Accountability & Transparency conference, New York, 2017
  • Served as a volunteer for ACM FAT* 2019, PyCon 2018, and DSSG Chicago 2017