CSE 544, Fall 2018: Probability & Statistics for Data Science

News:
09/10: Thursday OH poll is on Piazza.
09/05: Piazza sign-up link.

When: Mon Wed, 4:00pm - 5:20pm
Where: Javits 111
Instructor: Anshul Gandhi
Instructor Office Hours: Mon 5:30-6:30pm; Thu 4-5pm (347, New CS building)
Course TA and Graders: Abhinav Jain, Swethasri Kavuri, Parth Limbachiya, Ankit Sabharwal, Gagan Somashekar, Amogha Suresh

Course Info

This grad-level course covers probability and statistics topics required for data scientists to analyze and interpret data. The course is also part of the Data Science and Engineering Specialization. The course is targeted primarily at PhD and Masters students in the Computer Science Department. Topics covered include Probability Theory, Random Variables, Stochastic Processes, Statistical Inference, Hypothesis Testing, Regression, and Time Series Analysis. For more details, refer to the syllabus below.

The class is expected to be interactive and students are encouraged to participate in class discussions.

Grading will be on a curve, and will be based on assignments, exams, and a semester-end mini project. For more details, see the section on grading below.

Syllabus & Schedule

Aug 27 (Mon) [Lec 01] Course introduction, class logistics

Aug 29 (Wed) [Lec 02] Probability review - 1
  • Basics: sample space, outcomes, probability
  • Events: mutually exclusive, independent
  • Calculating probability: sets, counting, tree diagram
  Readings: AoS 1.1 - 1.6; MHB 3.1 - 3.5

Sep 03 (Mon) Labor Day observed. No class.

Sep 05 (Wed) [Lec 03] Probability review - 2
  • Conditional probability
  • Law of total probability
  • Bayes' theorem
  Readings: AoS 1.7; MHB 3.6, 3.10 - 3.11
  Notes: assignment 1 out

Sep 10 (Mon) [Lec 04] Random variables - 1: Overview and Discrete RVs
  • Discrete and Continuous RVs
  • Mean, Moments, Variance
  • pmf, pdf, cdf
  • Discrete RVs: Bernoulli, Binomial, Geometric, Indicator
  Readings: AoS 2.1 - 2.4; MHB 3.7 - 3.9, 3.14.1

Sep 12 (Wed) [Lec 05] Random variables - 2: Continuous RVs
  • Uniform(a, b)
  • Exponential(λ)
  Readings: AoS 2.7; MHB 3.14.1, 3.10, 3.13
  Notes: Python scripts: draw_Bernoulli, draw_Binomial, draw_Geometric

Sep 17 (Mon) [Lec 06] Random variables - 3: The Normal distribution
  • Normal(μ, σ²) and several of its properties
  Readings: AoS 2.7; MHB 3.14.1, 3.10, 3.13
  Notes: Python scripts: draw_Uniform, draw_Exponential, draw_Normal

Sep 19 (Wed) [Lec 07] Random variables - 4: Joint distributions & conditioning
  • Joint probability distribution
  • Linearity (and product) of expectation
  • Conditional expectation
  • Sum of a random number of RVs
  Readings: AoS 2.8; MHB 3.11 - 3.12, 3.15
  Notes: assignment 2 out; assignment 1 due

Sep 24 (Mon) [Lec 08] Probability inequalities
  • Markov's inequality
  • Chebyshev's inequality
  • Weak Law of Large Numbers
  • Central Limit Theorem
  Readings: AoS 4.1 - 4.2, 23.1 - 23.3; MHB 3.14.2, 8.1 - 8.7

Sep 26 (Wed) [Lec 09] Markov chains
  • Stochastic processes
  • Setting up Markov chains
  • Balance equations
  Readings: AoS 4.1 - 4.2, 23.1 - 23.3; MHB 3.14.2, 8.1 - 8.7

Oct 01 (Mon) [Lec 10] Non-parametric inference - 1
  • Basics of inference
  • Simple examples
  • Empirical PMF
  • Sample mean
  • Bias, se, MSE
  Readings: AoS 6.1 - 6.2, 6.3.1
  Notes: Python scripts: sample_Bernoulli, sample_Binomial, sample_Geometric, sample_Uniform, sample_Exponential, sample_Normal

Oct 03 (Wed) [Lec 11] Non-parametric inference - 2
  • Empirical Distribution Function (or eCDF)
  • Kernel Density Estimation (KDE)
  • Statistical Functionals
  • Plug-in estimator
  Readings: AoS 7.1 - 7.2
  Notes: assignment 2 due

Oct 08 (Mon) Fall break. No class.

Oct 10 (Wed) Mid-term 1
  Notes: This will be in-class, closed notes, closed book.

Oct 15 (Mon) [Lec 12] Confidence intervals
  • Percentiles, quantiles
  • Normal-based confidence intervals
  • DKW inequality
  Readings: AoS 6.3.2, 7.1
  Notes: assignment 3 out; required data: q8.dat

Oct 17 (Wed) [Lec 13] Parametric inference - 1
  • Consistency, Asymptotic Normality
  • Basics of parametric inference
  • Method of Moments Estimator (MME)
  Readings: AoS 6.3.1 - 6.3.2

Oct 22 (Mon) [Lec 14] Parametric inference - 2
  • Properties of MME
  • Basics of MLE
  • Maximum Likelihood Estimator (MLE)
  • Properties of MLE
  Readings: AoS 9.1 - 9.4, 9.6

Oct 24 (Wed) Instructor traveling. No class.

Oct 29 (Mon) [Lec 15] Hypothesis testing - 1
  • Basics of hypothesis testing
  • The Wald test
  Readings: AoS 10 - 10.1; DSD 5.3.1
  Notes: assignment 4 out; required data: q5_sigma3.dat, q5_sigma100.dat, q7_X.dat, q7_Y.dat

Oct 31 (Wed) [Lec 16] Hypothesis testing - 2
  • t-test
  • Kolmogorov-Smirnov test (KS test)
  Readings: AoS 10.10.2, 15.4; DSD 5.3.2
  Notes: assignment 3 due

Nov 05 (Mon) [Lec 17] Hypothesis testing - 3
  • p-values
  • Permutation test
  Readings: AoS 10.2, 10.5; DSD 5.3.3, 5.5

Nov 07 (Wed) [Lec 18] Bayesian inference
  • Bayesian reasoning
  • Bayesian inference
  • Priors
  • Conjugate priors
  Readings: AoS 11.1 - 11.2, 11.6; DSD 5.6

Nov 12 (Mon) [Lec 19] Regression - 1
  • Basics of Regression
  • Simple Linear Regression
  Readings: AoS 13.1, 13.3 - 13.4; DSD 9.1
  Notes: assignment 4 due; assignment 5 out; required data: A5_q2.dat, A5_q5.dat, A5_q6.dat

Nov 14 (Wed) [Lec 20] Regression - 2, Mini-project discussion
  • Multiple Linear Regression
  Readings: AoS 13.5; DSD 9.1
  Notes: assignment 6 out; due Dec 7th, 1pm (to Amogha, NCS 336)

Nov 19 (Mon) Mid-term 2
  Notes: This will be in-class, closed notes, closed book.

Nov 21 (Wed) Thanksgiving break. No class.

Nov 26 (Mon) [Lec 21] Time Series Analysis
  • EWMA Time Series modeling
  • AR Time Series modeling

Nov 28 (Wed) [Lec 22] Mini-project discussion (finalize hypothesis)
  Notes: assignment 5 due

Resources

Grading (tentative)

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity. Please note that any incident of academic dishonesty will immediately result in an F grade for the student.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn.

Disability Support Services

If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential. http://studentaffairs.stonybrook.edu/dss.

Please report any errors to the Instructor.