Stony Brook University, Computer Science

This course will cover concepts and standard tools used to analyze so-called Big Data. Specifically, we will cover algorithmic approaches to analyzing large datasets, including MapReduce, large-scale text and graph analytics, distributed deep learning, and streaming algorithms, on modern distributed analysis platforms (e.g., Hadoop and Spark). The course will have a large project component, incorporating analyses of large real-world datasets.

- Mining of Massive Datasets v2.1, by Leskovec, Rajaraman, and Ullman
- Advanced Analytics with Spark, by Ryza, Laserson, Owen, and Wills
- Suggested: Hands-On Machine Learning with Scikit-Learn & TensorFlow, by Géron