CSE 519 - Data Science

Fall 2024

Data Science is a rapidly emerging discipline at the intersection of statistics, machine learning, data visualization, and mathematical modeling. This course is designed to provide a hands-on introduction to Data Science by challenging student groups to build predictive models for upcoming events, and validating their models against the actual outcomes.

Textbook

We will use my book The Data Science Design Manual, Springer-Verlag, 2017. The associated website www.data-manual.com points to many resources, including lecture notes/videos, errata, a problem solution Wiki, and sample Python notebooks for generating figures from the book.

I welcome feedback on the book. Please keep track of any errata you find and send them to me, ideally in one batch at the end of the semester.

Homework Assignments

Lecture Notes

I will give about 25 formal lectures this semester. All classes will be recorded on Zoom and made available on Blackboard.

Ritika Nevatia has made the lecture notes she took in class one year available to all interested students. You may check them out if you wish.

Old lecture notes are available from the previous offering in Fall 2014.

Short Course on Computational Social Science

I taught a minicourse on machine learning and NLP for social scientists at the European University Institute (EUI) in Florence, Italy in November 2022. This course was largely (but not completely) based on my slides from CSE 519. I give my lecture slides from this course below.

Short Course on Word and Graph Embeddings

I taught a minicourse on Word and Graph Embeddings at BigDat 2023 on Gran Canaria in Spain's Canary Islands. I give the links to my lecture videos and slides below.