CSE 544, Spring 2021: Probability & Statistics for Data Science

News:
02/02: No class on 2/2 owing to a snow day. Our first lecture will be on 2/4 via Zoom. The meeting link is on Blackboard, under the Zoom Meeting tab on the left.
01/12: Piazza course sign-up link



When: Tu Th, 1:15pm - 2:35pm
Where: Online, via Zoom (details below)

Instructor: Anshul Gandhi
Instructor Office Hours: Tu Fr, 11am - 12pm

Course TA and Graders: Supreeth Narasimhaswamy, Michael Yao, Srikar Pothumahanti
TA Office Hours: TBD

Course Info

This grad-level course covers probability and statistics topics required for data scientists to analyze and interpret data. The course is also part of the Data Science and Engineering Specialization. The course is targeted primarily at PhD and Masters students in the Computer Science Department. Topics covered include Probability Theory, Random Variables, Stochastic Processes, Statistical Inference, Hypothesis Testing, Regression, and Time Series Analysis. For more details, refer to the syllabus below.

The class is expected to be interactive and students are encouraged to participate in class discussions.

Grading will be on a curve, and will be based on assignments, exams, and a semester-end mini data analysis project. For more details, see the section on grading below.

Hybrid Instruction and Online Learning

The course will be primarily online, including lectures, as described below. The hybrid-component requirements below apply only to Section 544-02 (in-person students). Please email the instructor if you have any problems with remote instruction, such as a poor network connection, an unaccommodating environment, or time zone issues.

Syllabus & Schedule

Date Topic Readings Notes
Feb 02 (Tu) Snow day: no class
Feb 04 (Th) [Lec 01] Course introduction, class logistics
Feb 09 (Tu) [Lec 02] Probability review - 1
  • Basics: sample space, outcomes, probability
  • Events: mutually exclusive, independent
  • Calculating probability: sets, counting, tree diagram
  Readings: AoS 1.1 - 1.5; MHB 3.1 - 3.4
Feb 11 (Th) [Lec 03] Probability review - 2
  • Conditional probability
  • Law of total probability
  • Bayes' theorem
  Readings: AoS 1.6, 1.7; MHB 3.3 - 3.6
  Notes: assignment 1 out
Feb 16 (Tu) [Lec 04] Random variables - 1: Overview and Discrete RVs
  • Discrete and Continuous RVs
  • Mean, Moments, Variance
  • pmf, pdf, cdf
  • Discrete RVs: Bernoulli, Binomial, Geometric, Indicator
  Readings: AoS 2.1 - 2.3, 3.1 - 3.4; MHB 3.7 - 3.9
  Python scripts: draw_Bernoulli, draw_Binomial, draw_Geometric
Feb 18 (Th) [Lec 05] Random variables - 2: Continuous RVs
  • Uniform(a, b)
  • Exponential(λ)
  • Normal(μ, σ²), and its several properties
  Readings: AoS 2.4, 3.1 - 3.4; MHB 3.7 - 3.9, 3.14.1
  Python scripts: draw_Uniform, draw_Exponential, draw_Normal
Feb 23 (Tu) [Lec 06] Random variables - 3: Joint distributions & conditioning
  • Joint probability distribution
  • Linearity and product of expectation
  • Conditional expectation
  Readings: AoS 2.5 - 2.8; MHB 3.10 - 3.13, 3.15
  Notes: assignment 2 out; assignment 1 due
Feb 25 (Th) [Lec 07] Probability Inequalities
  • Weak Law of Large Numbers
  • Central Limit Theorem
  Readings: AoS 5.3, 5.4; MHB 3.14.2, 5.2
Mar 02 (Tu) [Lec 08] Markov chains
  • Stochastic processes
  • Setting up Markov chains
  • Balance equations
  Readings: AoS 23.1 - 23.3; MHB 8.1 - 8.7
Mar 04 (Th) [Lec 09] Non-parametric inference - 1
  • Basics of inference
  • Simple examples
  • Empirical PMF
  • Sample mean
  • bias, se, MSE
  Readings: AoS 6.1 - 6.2, 6.3.1
Mar 09 (Tu) [Lec 10] Non-parametric inference - 2
  • Empirical Distribution Function (or eCDF)
  • Kernel Density Estimation (KDE)
  • Statistical Functionals
  • Plug-in estimator
  Readings: AoS 7.1 - 7.2
  Notes: assignment 3 out (required data: a3_q3.csv, collisions.csv, a3_q7.csv); assignment 2 due
  Python scripts: sample_Bernoulli, sample_Binomial, sample_Geometric, sample_Uniform, sample_Exponential, sample_Normal, draw_eCDF
Mar 11 (Th) [Lec 11] Confidence intervals
  • Percentiles, quantiles
  • Normal-based confidence intervals
  • DKW inequality
  Readings: AoS 6.3.2, 7.1
Mar 16 (Tu) [Lec 12] Parametric inference - 1
  • Consistency, Asymptotic Normality
  • Basics of parametric inference
  • Method of Moments Estimator (MME)
  Readings: AoS 6.3.1 - 6.3.2, 9.1 - 9.2
Mar 18 (Th) [Lec 13] Parametric inference - 2
  • Properties of MME
  • Basics of MLE
  • Maximum Likelihood Estimator (MLE)
  • Properties of MLE
  Readings: AoS 9.3, 9.4, 9.6
  Notes: assignment 3 due
Mar 23 (Tu) Mid-term 1, via Blackboard
Mar 25 (Th) [Lec 14] Hypothesis testing - 1
  • Basics of hypothesis testing
  • Wald test
  Readings: AoS 10 - 10.1; DSD 5.3.1
  Notes: assignment 4 out (required data: acceleration, model, mpg, q8_a.csv, q8_b_X.csv, q8_b_Y.csv)
Mar 30 (Tu) [Lec 15] Hypothesis testing - 2
  • Type I and Type II errors
  • Wald test
  Readings: AoS 10 - 10.1; DSD 5.3.1
Apr 01 (Th) Spring Brook Staycation: no class
Apr 06 (Tu) [Lec 16] Hypothesis testing - 3
  • Z-test
  • t-test
  Readings: AoS 10.10.2; DSD 5.3.2
Apr 08 (Th) [Lec 17] Hypothesis testing - 4
  • Kolmogorov-Smirnov test (KS test)
  • p-values
  Readings: AoS 15.4, 10.2; DSD 5.3.3, 5.5
  Notes: assignment 5 out (required data: data_q4_1, data_q4_2, q6_X1, q6_Y1, q6_X2, q6_Y2); assignment 4 due
Apr 13 (Tu) [Lec 18] Hypothesis testing - 5
  • p-values
  • Permutation test
  Readings: AoS 10.2, 10.5; DSD 5.5
Apr 15 (Th) [Lec 19] Hypothesis testing - 6
  • Pearson correlation coefficient
  • Chi-square test for independence
  Readings: AoS 3.3, 10.3 - 10.4; DSD 2.3
Apr 20 (Tu) [Lec 20] Bayesian inference - 1
  • Bayesian reasoning
  • Bayesian inference
  Readings: AoS 11.1 - 11.2, 11.6; DSD 5.6
  Notes: assignment 5 due
Apr 22 (Th) [Lec 21] Bayesian inference - 2
  • Priors
  • Conjugate priors
  Readings: AoS 11.1 - 11.2, 11.6; DSD 5.6
  Notes: assignment 6 out (required data: q2_sigma3.dat, q2_sigma100.dat, q4.csv, q5.csv, q6.csv)
Apr 27 (Tu) [Lec 22] Regression - 1
  • Basics of Regression
  • Simple Linear Regression
  Readings: AoS 13.1, 13.3 - 13.4; DSD 9.1
Apr 29 (Th) [Lec 23] Regression - 2
  • Multiple Linear Regression
  Readings: AoS 13.5; DSD 9.1
May 04 (Tu) [Lec 24] Time Series Analysis
  • EWMA Time Series modeling
  • AR Time Series modeling
May 06 (Th) [Lec 25] Project discussion
  Notes: assignment 6 due
May 17 (Mon) Mid-term 2, 11:15am - 12:30pm, via Blackboard
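Several lectures in the schedule pair a distribution or estimator with a Python script (draw_*, sample_*, draw_eCDF). Those scripts are not reproduced on this page. What follows is only a hypothetical, standard-library-only sketch of the same ideas, not the course's code:

- drawing Bernoulli, Geometric, and Exponential samples, the last by inverse-transform sampling;
- building the empirical CDF F̂_n(x) = (number of samples ≤ x) / n;
- computing the DKW band half-width ε_n = sqrt(ln(2/α) / (2n)).

All function names here are illustrative assumptions.

```python
import math
import random
from bisect import bisect_right

# Hypothetical helpers for illustration only; NOT the course's
# sample_* / draw_eCDF scripts.

def sample_bernoulli(p, n, rng):
    """Draw n Bernoulli(p) values: 1 if U < p, else 0, with U ~ Uniform(0, 1)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sample_geometric(p, n, rng):
    """Draw n Geometric(p) values (trials up to and including the first
    success) by counting Bernoulli(p) trials; assumes 0 < p <= 1."""
    out = []
    for _ in range(n):
        k = 1
        while rng.random() >= p:  # failure with probability 1 - p
            k += 1
        out.append(k)
    return out

def sample_exponential(lam, n, rng):
    """Inverse-transform sampling: if U ~ Uniform(0, 1), then
    -ln(1 - U) / lam ~ Exponential(lam). Here 1 - U lies in (0, 1]."""
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

def ecdf(data):
    """Return F_hat with F_hat(x) = (# samples <= x) / n."""
    xs = sorted(data)
    n = len(xs)

    def F_hat(x):
        return bisect_right(xs, x) / n  # count of sorted samples <= x

    return F_hat

def dkw_epsilon(n, alpha):
    """Half-width of the DKW band: with probability >= 1 - alpha,
    sup_x |F_hat(x) - F(x)| <= epsilon."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

if __name__ == "__main__":
    rng = random.Random(544)  # fixed seed for reproducibility
    lam, n, alpha = 2.0, 10_000, 0.05
    F_hat = ecdf(sample_exponential(lam, n, rng))
    eps = dkw_epsilon(n, alpha)
    for x in (0.25, 0.5, 1.0):
        true_F = 1.0 - math.exp(-lam * x)  # Exponential(lam) CDF
        lo, hi = max(F_hat(x) - eps, 0.0), min(F_hat(x) + eps, 1.0)
        print(f"x={x}: eCDF={F_hat(x):.3f}, "
              f"{1 - alpha:.0%} DKW band=[{lo:.3f}, {hi:.3f}], "
              f"true CDF={true_F:.3f}")
```

The eCDF is evaluated by binary search on the sorted sample, so each query costs O(log n) after a one-time O(n log n) sort. With n = 10,000 and α = 0.05 the DKW half-width is about 0.014, so the printed band should usually contain the true CDF.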

Resources

Grading (tentative)

Important:

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity. Please note that any incident of academic dishonesty will immediately result in an F grade for the student.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn.

Student Accessibility Support Services

If you have a physical, psychological, medical, or learning disability that may impact your course work, please contact the Student Accessibility Support Center, 128 ECC Building, (631) 632-6748, or at sasc@stonybrook.edu. They will determine with you what accommodations are necessary and appropriate. All information and documentation is confidential. https://www.stonybrook.edu/sasc.

Please report any errors to the instructor.