CSE 544, Spring 2018: Probability & Statistics for Data Science

News:
03/07: Today's lecture is canceled, as per SBU weather advisory.
03/07: A3 is now due 3/21 instead of 3/19.
02/21: M1 will be in class on Monday, Feb 26.

When: Mon Wed, 2:30pm - 3:50pm
Where: Humanities 1003
Instructor: Anshul Gandhi
Instructor Office Hours: Mon Wed, 4-5pm, 347, New CS building
Course TA and Graders: Eugenia Soroka, Leena Shekhar, Sai Rachana Patel Siddam

Course Info

This grad-level course covers probability and statistics topics required for data scientists to analyze and interpret data. The course is also part of the Data Science and Engineering Specialization. The course is targeted primarily at PhD and Masters students in the Computer Science Department. Topics covered include Probability Theory, Random Variables, Stochastic Processes, Statistical Inference, Hypothesis Testing, Regression, and Time Series Analysis. For more details, refer to the syllabus below.

The class is expected to be interactive and students are encouraged to participate in class discussions.

Grading will be on a curve, and will tentatively be based on assignments, exams, a semester-long group project, and class participation. For more details, see the section on grading below.

Syllabus & Schedule

Date Topic Readings Notes
Jan 22 (Mon) [Lec 01] Course introduction, class logistics

Jan 24 (Wed) [Lec 02] Probability review - 1
  • Basics: sample space, outcomes, probability
  • Events: mutually exclusive, independent
  • Calculating probability: sets, counting, tree diagram
  Readings: AoS 1.1 - 1.6; MHB 3.1 - 3.5

Jan 29 (Mon) [Lec 03] Probability review - 2
  • Conditional probability
  • Law of total probability
  • Bayes' theorem
  Readings: AoS 1.7; MHB 3.6, 3.10 - 3.11

Jan 31 (Wed) [Lec 04] Random variables - 1: Overview and Discrete RVs
  • Discrete and Continuous RVs
  • Mean, Moments, Variance
  • pmf, pdf, cdf
  • Discrete RVs: Bernoulli, Binomial, Geometric, Indicator
  Readings: AoS 2.1 - 2.4; MHB 3.7 - 3.9, 3.14.1
  Notes: Assignment 1 out

Feb 05 (Mon) [Lec 05] Random variables - 2: Continuous RVs
  • Uniform(a, b)
  • Exponential(λ)
  • Normal(μ, σ²), and its several properties
  Readings: AoS 2.7; MHB 3.14.1, 3.10, 3.13
  Notes: Python scripts: draw_Bernoulli, draw_Binomial, draw_Geometric, sample_Bernoulli, sample_Binomial, sample_Geometric

Feb 07 (Wed) [Lec 06] Random variables - 3: Joint distributions & conditioning
  • Joint probability distribution
  • Linearity (and product) of expectation
  • Conditional expectation
  • Sum of a random number of RVs
  Readings: AoS 2.8; MHB 3.11 - 3.12, 3.15
  Notes: Python scripts: draw_Uniform, draw_Exponential, draw_Normal, sample_Uniform, sample_Exponential, sample_Normal

Feb 12 (Mon) [Lec 07] Probability inequalities
  • Markov's Inequality
  • Chebyshev's inequality
  • Weak Law of Large Numbers
  • Central Limit Theorem
  • Chernoff Bounds
  Readings: AoS 4.1 - 4.2, 23.1 - 23.3; MHB 3.14.2, 8.1 - 8.7
  Notes: Assignment 1 due. Assignment 2 out.

Feb 14 (Wed) [Lec 08] Markov chains
  • Stochastic processes
  • Setting up Markov chains
  • Balance equations
  Readings: AoS 4.1 - 4.2, 23.1 - 23.3; MHB 3.14.2, 8.1 - 8.7

Feb 19 (Mon) [Lec 09] Non-parametric inference - 1
  • Basics of inference
  • Simple examples
  • Empirical PMF
  • Sample mean
  • bias, se, MSE
  Readings: AoS 6.1 - 6.2, 6.3.1

Feb 21 (Wed) [Lec 10] Non-parametric inference - 2
  • Empirical Distribution Function (or eCDF)
  • Kernel Density Estimation (KDE)
  • Statistical Functionals
  • Plug-in estimator
  Readings: AoS 7.1 - 7.2
  Notes: Assignment 2 due. Assignment 3 out. Required data: q8.dat

Feb 26 (Mon) Mid-term 1

Feb 28 (Wed) [Lec 11] Confidence intervals
  • Percentiles, quantiles
  • Normal-based confidence intervals
  • DKW inequality
  Readings: AoS 6.3.2, 7.1

Mar 05 (Mon) Instructor traveling. No class.

Mar 07 (Wed) Snow day. No class.

Mar 19 (Mon) [Lec 12] Parametric inference - 1
  • Consistency, Asymptotic Normality
  • Basics of parametric inference
  • Method of Moments Estimator (MME)
  • Properties of MME
  • Basics of MLE
  Readings: AoS 6.3.1 - 6.3.2, 9.1 - 9.2

Mar 21 (Wed) Snow day. No class.

Mar 26 (Mon) [Lec 13] Parametric inference - 2 and Hypothesis testing - 1
  • Maximum Likelihood Estimator (MLE)
  • Properties of MLE
  • Basics of hypothesis testing
  • The Wald test
  • t-test
  Readings: AoS 9.3 - 9.4, 9.6; AoS 10 - 10.1, 10.10.2, 15.4; DSD 5.3.1 - 5.3.2
  Notes: Assignment 3 due. Assignment 4 out. Required data: q6_X.dat, q6_Y.dat, q9_sigma3.dat, q9_sigma100.dat

Mar 28 (Wed) [Lec 14] Hypothesis testing - 2
  • Kolmogorov-Smirnov test (KS test)
  • p-values
  • Permutation test
  Readings: AoS 15.4, 10.2, 10.5; DSD 5.3.3, 5.5

Apr 02 (Mon) [Lec 15] Bayesian inference
  • Bayesian reasoning
  • Bayesian inference
  Readings: AoS 11.1 - 11.2; DSD 5.6

Apr 04 (Wed) Project proposal
  Notes: Meet in NCS 347. Schedule of group meetings. Assignment 4 due on Apr 05 at 2:30pm (NCS 347).

Apr 09 (Mon) [Lec 16] Regression - 1
  • Basics of Regression
  • Simple Linear Regression
  Readings: AoS 13.1, 13.3 - 13.4
  Notes: Assignment 5 out. Required data: A5_q2.dat, A5_q6.dat

Apr 11 (Wed) Mid-term 2

Apr 16 (Mon) [Lec 17] Regression - 2
  • Multiple Linear Regression
  Readings: AoS 13.5

Apr 18 (Wed) [Lec 18] Time Series Analysis
  • EWMA Time Series modeling
  • AR Time Series modeling

Apr 23 (Mon) [Lec 19] Review

Apr 25 (Wed) Project discussion
  Notes: Assignment 5 due

Apr 30 (Mon) Project final ppts

May 02 (Wed) Project final ppts

Resources

Grading (tentative)

Group Project

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity. Please note that any incident of academic dishonesty will immediately result in an F grade for the student.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn.

Disability Support Services

If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential. http://studentaffairs.stonybrook.edu/dss.

Please report any errors to the Instructor.