Santosh Kumar Ghosh

Email: sghosh@cs.stonybrook.edu
Phone: +1-631-552-1613
Graduation: December 2015

About Me

I am a Master's student working under Prof. Samir Das at the Wireless Network and Systems Laboratory. My area of focus is performance analysis of wireless streaming devices. I was an intern at Cablevision Systems, where I worked on technologies like Server-Sent Events, MQTT and SNMP in the context of developing a cloud-based smart router network. I like working on problems of scale, and the technologies that address these problems (NoSQL, Node.js, Memcached, different virtualization models) interest me.

Projects

Investigation of large-scale brute-force attacks on network services using the Elasticsearch-Logstash-Kibana (ELK) stack


Customised honeypots were set up in VMs scattered across the world by modifying implementations of network services like SSH, POP3, FTP, WordPress and Joomla. Logs were collected automatically with Logstash into a central repository (Elasticsearch), and the results were analysed in Kibana. Some of the questions investigated were:
  • Does any pattern exist in the passwords used across services and geographic locations?
  • How similar are the usernames used across services and locations?
  • How are the locations of the attacking IP addresses distributed across services (was one particular country/city/neighbourhood involved in more attacks)?
Project Report and Some snapshots

Analysis of network behaviour of streaming devices (Roku, Chromecast, Amazon Fire) in response to various over-the-top (OTT) applications (Netflix, Hulu)

Ongoing (Advisor: Prof. Samir Das)

Prediction of review spam on Yelp.com using Machine Learning Algorithms


Analyzed review text features and reviewer behavior features to detect possible fake reviews. Some of the textual features used were: number of capitalized words, exclamation marks/words, overall sentiment of the review, number of entities mentioned in the text, etc.
Reviewer behavior features were also taken into account, for example: reviewer credibility (in terms of number of friends, number of reviews, etc.), time of review, and similarity with other reviews.
A semi-supervised (Co-Training) algorithm was used for classification in addition to standard techniques (SVM, Naive Bayes) to compare their performance. Feature selection algorithms were used to find the best set of features for the classification.

Github Code Repo: Project report and code

Development of a preemptive operating system with support for demand paging and process management


Developed an operating system in C, running on the QEMU emulator, with features like virtual memory management, demand paging, process scheduling, copy-on-write (COW) fork and a read-only file system (TARFS). A shell was developed to interact with the OS. The system was coded from scratch, including setting up the interrupt tables, keyboard driver, video driver and the driver for the programmable interval timer.

Github Code Repo: SBUnix

Implementation of adversarial search algorithms to solve the Pacman game

Implemented the Minimax and alpha-beta pruning algorithms to efficiently solve the Pacman game. Different heuristics were used to compare performance across game configurations of varying complexity.
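
For readers unfamiliar with the technique, here is a minimal sketch of minimax with alpha-beta pruning. It is not the project code: it assumes a toy game tree given as nested lists with numeric leaves, and the function name `alphabeta` is made up for illustration.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of a toy game tree (nested lists, numeric leaves)."""
    if not isinstance(node, list):
        return node  # leaf: its value is the evaluation score
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent will never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return best


# Hypothetical two-ply tree: the root maximizes, its children minimize.
tree = [[3, 5], [2, 9], [0, 7]]
root_value = alphabeta(tree, True)
```

In this toy tree the leaves 9 and 7 are never evaluated when the root maximizes, which is where the savings over plain minimax come from.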

Github Code Repo: Project report and code

Implementation of a solution for the TA assignment problem using Constraint Satisfaction Algorithms

This project deals with the problem of assigning teaching assistants to various courses, subject to time and eligibility constraints. We used a sequence of algorithms to incrementally improve the performance of our solution: first simple backtracking, then backtracking with forward checking, and lastly backtracking with forward checking and constraint propagation.
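
A rough sketch of backtracking with forward checking, under simplifying assumptions that are mine and not necessarily the project's: each course needs exactly one TA, `domains` maps a course to its eligible TAs, and `clashes` holds pairs of courses that overlap in time and so cannot share a TA. All names and data are hypothetical.

```python
def solve(domains, clashes, assignment=None):
    """Assign one TA per course; return a dict course -> TA, or None if impossible."""
    assignment = dict(assignment or {})
    unassigned = [c for c in domains if c not in assignment]
    if not unassigned:
        return assignment
    course = unassigned[0]
    for ta in domains[course]:
        # Forward checking: remove `ta` from every unassigned clashing course.
        new_domains = dict(domains)
        feasible = True
        for other in unassigned[1:]:
            if frozenset((course, other)) in clashes:
                new_domains[other] = [t for t in domains[other] if t != ta]
                if not new_domains[other]:
                    feasible = False  # a future course has no TA left
                    break
        if feasible:
            result = solve(new_domains, clashes, {**assignment, course: ta})
            if result is not None:
                return result
    return None  # every TA for this course failed: backtrack


# Hypothetical data for illustration only.
domains = {"CSE101": ["ana", "bo"], "CSE214": ["ana"], "CSE305": ["bo", "cy"]}
clashes = {frozenset(("CSE101", "CSE214")), frozenset(("CSE101", "CSE305"))}
schedule = solve(domains, clashes)
```

Here the choice of "ana" for CSE101 is rejected immediately because it empties the domain of CSE214, without first descending into the search. Constraint propagation would go one step further and also re-check the pruned domains against each other.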

Github Code Repo: Project report and code

Development of a spam filter using Naive Bayes

Developed a Naive Bayes spam filter with multiple classifier variants, such as a term-frequency classifier and a binary classifier. Various smoothing techniques, such as Laplace smoothing, were also applied to compare the results.
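
To illustrate the idea, here is a small sketch of a term-frequency (multinomial) Naive Bayes classifier with Laplace smoothing. It is an assumption-laden toy, not the project code: the function names, the `alpha` parameter and the four training messages are invented for this example.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (label, tokens). Returns per-class counts and the vocabulary."""
    class_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in class_counts}
    for label, tokens in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def classify(tokens, model, alpha=1.0):
    """Pick the class with the highest log-posterior; alpha=1 is Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Smoothed likelihood: unseen (class, word) pairs get a small nonzero mass.
            score += math.log(
                (word_counts[label][w] + alpha) / (total_words + alpha * len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages.
model = train([
    ("spam", ["win", "money", "now"]),
    ("spam", ["free", "money"]),
    ("ham", ["meeting", "at", "noon"]),
    ("ham", ["lunch", "now"]),
])
label = classify(["free", "money"], model)
```

Without the `alpha` term, a single word that never appeared in a class would drive that class's probability to zero. A binary variant would count each word at most once per message.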

Github Code Repo: Project report and code

Predicting duration of stay of a customer on a website using Decision Tree Learning

Predicted the duration of a visitor's stay on a website, given a set of page-view data, using the ID3 algorithm.
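
ID3 builds the tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that selection step, on made-up page-view rows; the attribute names (`landing`, `device`, `stay`) are hypothetical and not taken from the project's data set.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical rows: landing page predicts the stay, device does not.
rows = [
    {"landing": "home", "device": "mobile", "stay": "short"},
    {"landing": "home", "device": "desktop", "stay": "short"},
    {"landing": "product", "device": "mobile", "stay": "long"},
    {"landing": "product", "device": "desktop", "stay": "long"},
]
best = max(["landing", "device"], key=lambda a: information_gain(rows, a, "stay"))
```

ID3 would split on `best`, then recurse on each resulting subset with the remaining attributes.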

Github Code Repo: Project report and code

Development of a statistical model of university bus transport and proposal of an optimal schedule

Used statistical techniques such as queuing, sampling, simulation of discrete-time systems, model fitting and goodness-of-fit tests to model the average wait time of students at bus stops across the university. Proposed an optimal bus schedule that could reduce this wait time without increasing the operational cost. Technologies used: Python

Courses Taken

  • Data structures and Algorithms
  • Operating Systems
  • Artificial Intelligence
  • Data Mining
  • Network Security
  • Wireless Networks
  • Simulation and Modelling

Technical Skills

  • Languages: Java, Python, C, JavaScript, jQuery
  • Tools: NumPy, scikit-learn, MQTT, Logstash, Elasticsearch, Kibana, MongoDB, Bash scripting
  • Technologies: NoSQL, message queues (pub/sub paradigm), Bootstrap, JSON, Django, Bottle

Industry Exposure

I was a software engineering intern at Cablevision Systems, where I worked on projects ranging from internal tool automation to developing the architecture for a cloud-based smart router network. Specifically, I was involved in the following:

  • Developed modules for a network monitoring tool that proactively detects and troubleshoots problems in the voice network (landline phone).
  • Developed modules for a beta testing tool that senses the radio environment of the user premises and reports the data back to a central repository at regular intervals. Used technologies like Server-Sent Events and Google Charts.
  • Worked closely with teams to develop an architecture for a cloud-based router network. The aim was to find the optimal number and placement of MQTT brokers, and to map routers to brokers so that the load is well distributed. Different combinations of message periodicity, serialization format and amount of data sent northbound were tested.

Current interests

The Internet of Things greatly interests me. Orchestrating large-scale systems excites me: collecting data from numerous (millions, maybe) small devices (read: mobiles, smart routers, home sensors, etc.), transporting it to the cloud (with decisions like what to use: JSON, compression or XML?), and making sense of the data in near real time.
I am also greatly fascinated by business models like IaaS, PaaS and SaaS, and by the virtualization technologies that drive them.

Some of the things I am dabbling in right now are:

  • MQTT (a lightweight messaging protocol suited for mobile platforms. Geek-bit: Facebook uses this for its Messenger)
  • msgpack (a binary alternative to JSON)
  • Node-RED (a fast prototyping tool. Coz I am too lazy to configure everything!!)
  • MongoDB
  • Python Kivy (Build once, run everywhere... well, almost!!)