CSE 391, Spring 2019: Probability & Statistics for Data Science

News:
01/29: Schedule updated.
01/01: Piazza sign-up link.
01/01: Our first lecture will be on Jan 28th (Mon) at 4pm in Frey 217.


When: Mon Wed, 4:00pm - 5:20pm
Where: Frey Hall 217

Instructor: Anshul Gandhi
Instructor Office Hours: Tue 2:30-3:30pm, Thurs 12-1pm; location: 347, New CS building

Course TAs: Vamsi and Naimul
TA Office Hours: By appointment (please email the TA(s) to schedule)

Course Info

This undergraduate-level special topics course covers the probability and statistics that data scientists need to analyze and interpret data. The course combines theoretical topics with some programming assignments. It is aimed primarily at junior and senior undergraduates who are comfortable with probability concepts and with basic programming. Undergraduates from Computer Science, Applied Mathematics and Statistics, and Electrical and Computer Engineering are well suited to this class. Topics covered include Probability Theory, Random Variables, Stochastic Processes, Statistical Inference, Hypothesis Testing, and Regression. The class is expected to be interactive, and students are encouraged to participate in class discussions.

Grading will be on a curve, and will tentatively be based on assignments, exams, and class participation. For more details, refer to the section on grading below.

Syllabus & Schedule

Jan 28 (Mon) [Lec 01]
Course introduction, class logistics

Jan 30 (Wed) [Lec 02]
Probability review - 1
  • Basics: sample space, outcomes, probability
  • Events: mutually exclusive, independent
  • Calculating probability: sets, counting, tree diagram
  Readings: AoS 1.1 - 1.5; MHB 3.1 - 3.4

Feb 04 (Mon) [Lec 03]
Probability review - 2
  • Conditional probability
  • Law of total probability
  • Bayes' theorem
  Readings: AoS 1.6, 1.7; MHB 3.3 - 3.6
  Notes: assignment 1 out

Feb 06 (Wed) [Lec 04]
Random variables - 1
  • Mean, Moments, Variance
  • pmf, pdf, cdf
  • Bernoulli(p)
  • Indicator RV
  • Binomial(n, p)
  • Geometric(p)
  Readings: AoS 2.1 - 2.3, 3.1 - 3.4; MHB 3.7 - 3.9
  Notes: Python scripts: draw_Bernoulli, draw_Binomial, draw_Geometric

Feb 11 (Mon) [Lec 05]
Random variables - 2
  • Uniform(a, b)
  • Exponential(λ)
  • Normal(μ, σ²), and its several properties
  Readings: AoS 2.4, 3.1 - 3.4; MHB 3.7 - 3.9, 3.14.1
  Notes: Python scripts: draw_Uniform, draw_Exponential, draw_Normal

Feb 13 (Wed) [Lec 06]
Random variables - 3
  • Joint probability distribution
  • Linearity and product of expectation
  Readings: AoS 2.5 - 2.7; MHB 3.10, 3.13
  Notes: assignment 1 due; assignment 2 out

Feb 18 (Mon)
No class (instructor traveling)

Feb 20 (Wed) [Lec 07]
Probability inequalities
  • Markov's inequality
  • Chebyshev's inequality
  • Weak Law of Large Numbers
  • Central Limit Theorem
  Readings: AoS 4.1 - 4.2, 5.3 - 5.4; MHB 3.14.2, 5.2

Feb 25 (Mon) [Lec 08]
Non-parametric inference - 1
  • Basics of inference
  • Empirical PMF
  • Sample mean
  • bias, se, MSE
  Readings: AoS 6.1, 6.2, 6.3.1
  Notes: Python scripts: sample_Bernoulli, sample_Binomial, sample_Geometric; assignment 2 due; assignment 3 out; required weather.dat dataset for A3

Feb 27 (Wed) [Lec 09]
Non-parametric inference - 2
  • Empirical Distribution Function (or eCDF)
  • Statistical Functionals
  • Plug-in estimator
  Readings: AoS 6.3.1, 7.1 - 7.2
  Notes: Python scripts: sample_Uniform, sample_Exponential, sample_Normal, eCDF

Mar 04 (Mon) [Lec 10]
Confidence intervals
  • Percentiles, quantiles
  • Normal-based confidence intervals
  • DKW inequality
  Readings: AoS 6.3.2, 7.1

Mar 06 (Wed) [Lec 11]
Parametric inference - 1
  • Basics of parametric inference
  • Method of Moments Estimator (MME)
  Readings: AoS 6.3.1 - 6.3.2, 9.1 - 9.2
  Notes: assignment 3 due

Mar 11 (Mon)
Mid-term 1 review

Mar 13 (Wed)
Mid-term 1

Mar 25 (Mon) [Lec 12]
Parametric inference - 2
  • Method of Moments Estimator (MME)
  • Properties of MME
  Readings: AoS 9.1 - 9.2

Mar 27 (Wed) [Lec 13]
Parametric inference - 3
  • Likelihood
  • Maximum Likelihood Estimator (MLE)
  • Properties of MLE
  Readings: AoS 9.3 - 9.4, 9.6
  Notes: assignment 4 out; required datasets: q3.dat, acceleration, model, mpg

Apr 01 (Mon) [Lec 14]
Hypothesis testing - 1
  • Basics of hypothesis testing
  • The Wald test
  Readings: AoS 10 - 10.1; DSD 5.3 - 5.3.1

Apr 03 (Wed) [Lec 15]
Hypothesis testing - 2
  • The Wald test
  • Type I and Type II errors
  Readings: AoS 10 - 10.1

Apr 08 (Mon) [Lec 16]
Hypothesis testing - 3
  • t-test
  • Kolmogorov-Smirnov test (KS test)
  Readings: AoS 10.10.2, 15.4; DSD 5.3.2 - 5.3.3

Apr 10 (Wed) [Lec 17]
Hypothesis testing - 4
  • p-values
  • Permutation test
  Readings: AoS 10.2, 10.5; DSD 5.5
  Notes: assignment 4 due; assignment 5 out

Apr 15 (Mon) [Lec 18]
Hypothesis testing - 5
  • Pearson correlation coefficient
  • Chi-square test for independence
  Readings: AoS 3.3, 10.3 - 10.4; DSD 2.3

Apr 17 (Wed) [Lec 19]
Bayesian inference - 1
  • Bayesian reasoning
  • Bayesian inference
  Readings: AoS 11.1 - 11.2; DSD 5.6

Apr 22 (Mon) [Lec 20]
Bayesian inference - 2
  • Bayesian inference
  • Conjugate priors
  Readings: AoS 11.1 - 11.2; DSD 5.6

Apr 24 (Wed) [Lec 21]
Regression - 1
  • Basics of Regression
  • Simple Linear Regression
  Readings: AoS 13.1, 13.3 - 13.4; DSD 9.1
  Notes: assignment 5 due; assignment 6 out; required datasets: q2_sigma3.dat, q2_sigma100.dat, q5.dat

Apr 29 (Mon) [Lec 22]
Regression - 2
  • Multiple Linear Regression
  Readings: AoS 13.5; DSD 9.1

May 01 (Wed) [Lec 23]
Time Series Analysis
  • Last Observed, Seasonal Last Observed
  • Simple Moving Average
  • EWMA, Holt-Winters
  • Autoregression

May 06 (Mon)
Mid-term 2 review
  Notes: assignment 6 due

May 08 (Wed)
Mid-term 2


Resources

Grading (tentative)

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity. Please note that any incident of academic dishonesty will immediately result in an F grade for the student.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn.

Disability Support Services

If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential. http://studentaffairs.stonybrook.edu/dss.

Please report any errors to the Instructor.