CSE 519 - Data Science
Fall 2018
Data Science is a rapidly emerging discipline at the intersection of statistics, machine learning, data visualization, and mathematical modeling. This course is designed to provide a hands-on introduction to Data Science by challenging student groups to build predictive models for upcoming events, and validating their models against the actual outcomes.

Course Time: 8:30-9:50AM Tuesday and Thursday
Place: 102 Frey Hall

Steven Skiena's office hours are 2:30PM-4PM Tuesday-Thursday,
in 251 New Computer Science, and by appointment.

The course teaching assistants will be:

Allen Kim.
His email address is
allen.kim@stonybrook.edu.
He will have office hours Monday and Wednesday 2:30PM-4PM
in room 340 New Computer Science.

Harsh Agarwal.
His email address is
hagarwal@cs.stonybrook.edu.
He will have office hours Tuesday and Thursday 10AM-11:30AM
in the TA rooms 2203 and 2217 in Old Computer Science.

Shouvik Roy.
His email address is
shroy@cs.stonybrook.edu.
He will have office hours Tuesday and Wednesday 4-5:30PM
in the TA room 2206 Old Computer Science.

Raveendra Soori.
His email address is
raveendra.soori@stonybrook.edu.
He will have office hours Tuesday and Thursday from 1:30-3PM
in the TA rooms 2203 and 2217 in Old Computer Science.

Sahil Sobti.
His email address is
ssobti@cs.stonybrook.edu.
He will have office hours XXXX
in the TA rooms 2203 and 2217 in Old Computer Science.

Videos and slides from my Fall 2016 lectures are available here.
The video from Fall 2017 appears
here, but the quality is not good.
The best stuff should always be available at www.datamanual.com.

Sign up for the Piazza class discussion board at https://piazza.com/stonybrook/fall2018/cse519.

Syllabus

Lecture Schedule
Textbook
We will use my new book The Data Science Design Manual, Springer-Verlag, 2017. The associated website www.datamanual.com points to many resources, including lecture notes/videos, errata, a problem solution Wiki, and sample Python notebooks for generating figures from the book.
I will welcome feedback on the book.
Please keep track of errata in the book and send them to me, ideally in one batch at the end of the semester.
Homework Assignments
Lecture Notes
I will give about 25 formal lectures this semester.
All classes will be filmed by Echo360 and made available on Blackboard.
Ritika Nevatia is graciously making the lecture notes she
takes in class available to all interested students. Check them out.
Old lecture notes are available from the previous offering in Fall 2014.
Semester Projects
Roughly half of the course grade will come from a course project.
Students will typically work in small groups (2-3 people) on independent research projects.
I will distribute a list of possible projects about six weeks into the semester.
You will be encouraged to develop your own project ideas, although I must approve them.
Recommended Readings
The field of data science is still emerging, but
several books will be useful to read and consult:

Python for Data Analysis, by Wes McKinney, O'Reilly Media, 2013 
This book is a nuts-and-bolts guide to data wrangling with Python,
including such tools/libraries as Pandas, NumPy, and IPython.
You will be expected to use these tools in doing your course project.

The Signal and the Noise: Why so many predictions fail but some don't, by Nate Silver, Penguin Press, 2012 
This popular, easy-to-read book focuses on how effectively data can
be used to make predictions in domains like sports, science, economics,
and politics.
This is exactly what we are trying to do in this course, and Silver's
book is an excellent model to build on.
Videos: The Quant Shop
The Quant Shop is a series of eight 30-minute
programs on Data Science, which are a product of the Fall 2014
offering of this course.
Watch them for inspiration at
the Quant Shop Vimeo channel.
Related Links
Professor
Steven S. Skiena
251 New Computer Science Building
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794-2424, USA
skiena@cs.stonybrook.edu
631-632-9026