Data Science is a rapidly emerging discipline at the intersection of statistics, machine learning, data visualization, and mathematical modeling. This course is designed to provide a hands-on introduction to Data Science by challenging student groups to build predictive models for upcoming events, and validating their models against the actual outcomes.
We will use my book
The Data Science Design Manual,
Springer-Verlag, 2017. The associated website www.data-manual.com points to many resources, including lecture notes/videos, errata, a problem solution Wiki, and sample Python notebooks for generating figures from the book.
I will welcome feedback on the book. Please keep track of errata in the book and send them to me, ideally in one batch at the end of the semester.
Here are three examples of good project proposals from a previous year: on music popularity, legal notice analysis, and fundraising. Pick one as a model appropriate for your project.
Here are three examples of good project progress reports from a previous year: on music popularity, academic paper ranking, and legal notice analysis. Pick one as a model appropriate for your project. That said, I really want you to limit progress reports to five pages at most.
Here are two examples of good final project reports from a previous year: on academic paper ranking and legal notice analysis. Pick one as a model appropriate for your project, but keep it to at most eight pages plus references!
I will give about 25 formal lectures this semester. All classes will be recorded on Zoom and made available on Blackboard.
Ritika Nevatia has made the lecture notes she took in class one year available to all interested students. You may check them out if you wish.
Old lecture notes are available from the previous offering in Fall 2014.
Roughly half of the course grade will come from a course project. Students will typically work in small groups (2-3 people) on independent research projects. I will distribute a list of possible projects about six weeks into the semester. You are encouraged to develop your own project ideas, although I must approve them.
The field of data science is still emerging, but several books are useful to read and consult:
The Quant Shop is a series of eight 30-minute programs on Data Science, which are a product of the Fall 2014 offering of this course. Watch them for inspiration at the Quant Shop Vimeo channel.
Steven S. Skiena
251 New Computer Science Building
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794-2424, USA
skiena@cs.stonybrook.edu
631-632-9026