Stony Brook University
Data Science Fundamentals
CSE51d - Fall 2015

Syllabus

A list of topics by lecture is provided below. Readings for some lectures will be listed here as the
course progresses. A quick reference to the syllabus is available in this [pdf]. Note that the syllabus
is tentative and will be adjusted, if needed, as the semester proceeds.

Lectures and Readings

Lecture 1 Introduction

  • Introduction to Data Science
  • Big Data products
  • Scoping Projects, Asking good questions
  • The Data Mining process
  • Course introduction

Lecture 2 Data Preparation

  • Basic Data Types
  • Data Analytics Major Building Blocks: A Bird's Eye View
  • Data Collection, Storage, Cleaning, Integration

Lecture 3 Exploratory Data Analysis

  • Summary statistics
  • Online Analytical Processing (OLAP)
  • Visual analysis

Lecture 4-5 Statistics

  • Statistics basics, Experiment design, Pitfalls
  • Observational and Longitudinal studies
  • Probability Distributions
  • Hypothesis Testing
  • Significance

Lecture 6 Visualization

  • Visualization fundamentals
  • Charts, Graphs, Infographics
  • Interactive Visualization
  • Summarization

Lecture 7 Optimization

  • Convexity, Convex functions
  • Convex Optimization: unconstrained/constrained
  • Gradient Descent and variants

Lecture 8-10 Statistical Learning

  • Statistical Models and Likelihood, Likelihood Principle, MLE foundations
  • Machine Learning Concepts & Tasks:
    • Classification
    • Regression
    • Clustering
    • Dimensionality Reduction
  • Machine Learning: Specific algorithms

Lecture 11-13 Data Mining

  • Data Mining Concepts & Tasks:
    • Association Rules
    • Similarity Search
    • Cluster Analysis
    • Outlier Analysis
  • Data Mining: Specific algorithms

Lecture 14-17 Data: Unstructured vs. Structured

  • Mining Text Data & Information Retrieval
  • Web Search & Recommender Systems
  • Mining Graphs/Network Data
  • Mining Spatial Data
  • Mining Time-Series Data

Lecture 18-19 Matrix Methods

  • Matrix Factorization: Models and Algorithms
    • SVD/PCA, ICA, CUR, CMD, NMF
  • Matrix-Vector Product & Applications

Lecture 20-21 Computing at Scale

  • Memory, Parallelization, Map-Reduce
  • Hadoop, Pig, HBase, Hive
  • Spark, Spark SQL

Lecture 22 Data Science in the Real World

  • Data Journalism
  • Provenance, Privacy, Ethics, Governance

