CSE 390, Fall 2017: Probability & Statistics for Data Science

News:
10/29: Mid-term 2 date has been added (11/30).
10/18: Syllabus & Schedule updated.
09/19: Lecture 6 slides and py scripts posted.
09/14: Lecture 5 slides and py scripts posted.
09/12: Lecture 4 slides posted.
08/29: Lecture 1 slides posted.
08/05: Our first lecture will be on Aug 29th (Tues) at 4pm in Frey 205.

When: Tue Thu, 4:00pm - 5:20pm
Where: Frey Hall 205
Instructor: Anshul Gandhi
Instructor Office Hours: Tue 3-4pm and Thu 5:30-6:30pm, in 347, New CS building
Course TAs: Caitao Zhan, Kunal Shah
TA Office Hours: By appointment (please email the TA(s) to schedule)

Course Info

This undergraduate-level special topics course covers the probability and statistics topics that data scientists need to analyze and interpret data. The course will involve theoretical topics and some programming assignments. It is targeted primarily at junior and senior undergraduate students who are comfortable with basic probability concepts and basic programming. Undergraduates from Computer Science, Applied Mathematics and Statistics, and Electrical and Computer Engineering are well suited to this class. Topics covered include Probability Theory, Random Variables, Stochastic Processes, Statistical Inference, Hypothesis Testing, Regression, Classification, and Clustering. For more details, refer to the syllabus below.

The class is expected to be interactive and students are encouraged to participate in class discussions.
Grading will be on a curve, and will tentatively be based on assignments, exams, and class participation. For more details, refer to the section on grading below.

Syllabus & Schedule

Date Topic Readings Notes
Aug 29 (Tue) [Lec 01] Course introduction, class logistics
Aug 31 (Thu) [Lec 02] Probability review - 1
  • Basics: sample space, outcomes, probability
  • Events: mutually exclusive, independent
  • Calculating probability: sets, counting, tree diagram
  Readings: AoS 1.1 - 1.6; MHB 3.1 - 3.5
Sep 05 (Tue) Labor Day: No class
Sep 07 (Thu) [Lec 03] Probability review - 2
  • Conditional probability
  • Law of total probability
  • Bayes' theorem
  Readings: AoS 1.7; MHB 3.6, 3.10 - 3.11
  Notes: assignment 1 out
Sep 12 (Tue) [Lec 04] Random variables - 1: Overview
  • Discrete and Continuous RVs
  • Mean, Moments, Variance
  • pmf, pdf, cdf
  Readings: AoS 2.1 - 2.3; MHB 3.7 - 3.9
Sep 14 (Thu) [Lec 05] Random variables - 2: Discrete RVs
  • Bernoulli(p)
  • Binomial(n, p)
  • Geometric(p)
  • Indicator RV
  Readings: AoS 2.4; MHB 3.7 - 3.9, 3.14.1
  Notes: Python scripts: draw_Bernoulli, draw_Binomial, draw_Geometric, sample_Bernoulli, sample_Binomial, sample_Geometric
Sep 19 (Tue) [Lec 06] Random variables - 3: Continuous RVs
  • Uniform(a, b)
  • Exponential(λ)
  • Normal(μ, σ²), and several of its properties
  Readings: AoS 2.7; MHB 3.14.1, 3.10, 3.13
  Notes: assignment 1 due; assignment 2 out
         Python scripts: draw_Uniform, draw_Exponential, draw_Normal, sample_Uniform, sample_Exponential, sample_Normal
Sep 21 (Thu) Instructor traveling: No class
Sep 26 (Tue) [Lec 07] Random variables - 4: Joint distributions & conditioning
  • Joint probability distribution
  • Linearity (and product) of expectation
  • Conditional expectation
  • Sum of a random number of RVs
  Readings: AoS 2.8; MHB 3.11 - 3.12, 3.15
Sep 28 (Thu) [Lec 08] Probability inequalities - 1
  • Markov's inequality
  • Chebyshev's inequality
  Readings: AoS 4.1 - 4.2, 23.1 - 23.3; MHB 3.14.2, 8.1 - 8.7
Oct 03 (Tue) [Lec 09] Probability inequalities - 2
  • Weak Law of Large Numbers
  • Central Limit Theorem
  • A2 solutions
  Readings: AoS 4.1 - 4.2, 23.1 - 23.3; MHB 3.14.2, 8.1 - 8.7
  Notes: assignment 2 due
Oct 05 (Thu) [Lec 10] Mid-term 1 review
Oct 10 (Tue) Mid-term 1
Oct 12 (Thu) [Lec 11] Non-parametric inference - 1
  • Basics of inference
  • Simple examples
  Readings: AoS 6.1 - 6.2
Oct 17 (Tue) [Lec 12] Non-parametric inference - 2
  • Empirical PMF
  • Sample mean
  • Bias, standard error (se), MSE
  Readings: AoS 6.3.1
Oct 19 (Thu) [Lec 13] Non-parametric inference - 3
  • Empirical Distribution Function (or eCDF)
  • Statistical Functionals
  • Plug-in estimator
  Readings: AoS 7.1 - 7.2
  Notes: assignment 3 out
         Required weather.dat dataset for A3.
         Python script used in class: eCDF
Oct 24 (Tue) [Lec 14] Confidence intervals
  • Percentiles, quantiles
  • Normal-based confidence intervals
  • DKW inequality
  Readings: AoS 6.3.2, 7.1
Oct 26 (Thu) [Lec 15] Parametric inference - 1
  • Basics of parametric inference
  • Method of Moments Estimator (MME)
  Readings: AoS 6.3.1 - 6.3.2, 9.1 - 9.2
Oct 31 (Tue) [Lec 16] Parametric inference - 2
  • Method of Moments Estimator (MME)
  • Properties of MME
  Readings: AoS 9.1 - 9.2
  Notes: assignment 3 due; assignment 4 out
         Required q4.dat dataset for A4.
Nov 02 (Thu) [Lec 17] Parametric inference - 3
  • Likelihood
  • Maximum Likelihood Estimator (MLE)
  • Properties of MLE
  Readings: AoS 9.3 - 9.4, 9.6
Nov 07 (Tue) [Lec 18] Hypothesis testing - 1
  • Basics of hypothesis testing
  • The Wald test
  Readings: AoS 10 - 10.1, 10.10.2; DSD 5.3.1 - 5.3.2
Nov 09 (Thu) [Lec 19] Hypothesis testing - 2
  • The Wald test
  • t-test
  • Kolmogorov-Smirnov test (KS test)
  Readings: AoS 15.4; DSD 5.3.3
  Notes: assignment 4 due; assignment 5 out
         Required q3_X.dat, q3_Y.dat, and gamma.dat datasets for A5.
Nov 14 (Tue) [Lec 20] Hypothesis testing - 3
  • p-values
  • Permutation test
  Readings: AoS 10.2, 10.5; DSD 5.5
Nov 16 (Thu) [Lec 21] Bayesian inference
  • Bayesian reasoning
  • Bayesian inference
  Readings: AoS 11.1 - 11.2; DSD 5.6
Nov 21 (Tue) [Lec 22] Regression - 1
  • Basics of Regression
  • Simple Linear Regression
  • A5 solutions
  Readings: AoS 13.1, 13.3 - 13.4
  Notes: assignment 5 due; assignment 6 out, due Dec 11, to Caitao Zhan.
         Required q2.dat dataset for A6.
Nov 23 (Thu) Thanksgiving Break: No class
Nov 28 (Tue) [Lec 23] Mid-term 2 review
Nov 30 (Thu) Mid-term 2
Dec 05 (Tue) [Lec 24] Regression - 2
  • Multiple Linear Regression
  Readings: AoS 13.5
Dec 07 (Thu) Instructor traveling

Resources

Grading (tentative)

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity. Please note that any incident of academic dishonesty will immediately result in an F grade for the student.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn.

Disability Support Services

If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential. http://studentaffairs.stonybrook.edu/dss.

Please report any errors to the Instructor.