ISE390: Introduction to Data Science

Spring, 2019
Mon & Wed: 7:00pm to 8:20pm.

Instructor: Martin Radfar
Email: radfar@cs.stonybrook.edu
Office hours: TUE 1pm-4pm, or by appointment. Office: CS 131
TA:
  • Name: TBD, Email: TBD, Time: TBD, Room: TBD

    Announcements

    Course Description

    This multidisciplinary course introduces both theoretical concepts and practical approaches to extract knowledge from data. Topics include linear algebra, probability, statistics, machine learning, and programming. Using large data sets collected from real-world problems in areas of science, technology, and medicine, we introduce how to preprocess data, identify the best model that describes the data, make predictions, evaluate the results, and finally report the results using proper visualization methods. This course also teaches state-of-the-art tools for data analysis, such as Python and its scientific libraries.

    Learning Objectives

    Students will learn what data science is and which skills a data scientist needs. These include data analysis techniques such as feature selection, dealing with heterogeneous data, handling missing values, dimensionality reduction, and data visualization. They will also gain an understanding of supervised and unsupervised learning methods, including clustering and linear regression. Students will be exposed to statistical techniques for hypothesis testing to identify statistically significant results. Finally, students will gain experience with state-of-the-art tools for data science, such as Python programming using Jupyter notebooks and associated libraries.

    Topics

    1. Introduction: what is data science?
    2. Mathematical preliminaries: probability and statistics review
    3. Tools: Python and scientific libraries
    4. Data munging
    5. Statistics I: Analysis
    6. Visualization
    7. Linear algebra and dimensionality reduction
    8. Linear regression
    9. Supervised learning
    10. Unsupervised learning
    11. Using IPython and Jupyter notebooks as interactive environments
    12. Implementing machine learning techniques (supervised learning, unsupervised learning, and dimensionality reduction) using scikit-learn
    13. Introduction to deep learning
    14. Natural language processing
    15. Time series analysis using Pandas
    16. Network analysis
    17. Bayesian inference and introduction to Bayesian networks

    Slides

  • Slides are posted on Blackboard under Documents. Slides are in ipynb format.

    Assignments

  • Assignments are posted on Blackboard. There will be three assignments in total.
  • Assignments should be submitted electronically.
  • Late assignment policy: 20% of the assignment's grade is deducted for each day late (to a minimum above 0%).

    Textbooks

    Main textbook:

    book1

    Other recommended textbooks:

    book1

    Grading Policy

    Numerical-to-letter grade conversion

    93-100 (A), 90-92 (A-), 87-89 (B+), 83-86 (B), 80-82 (B-), 77-79 (C+), 73-76 (C), 70-72 (C-), 67-69 (D+), 63-66 (D), 60-62 (D-), 0-59 (F)
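
    For reference, the conversion above can be written as a short Python lookup. This is only an illustrative sketch, not an official grading tool: the names GRADE_BANDS and to_letter are made up for this example, and it assumes scores are whole numbers from 0 to 100, since the table does not say how fractional scores are rounded.

```python
# Lower bound of each letter band, copied from the conversion table above.
# GRADE_BANDS and to_letter are hypothetical names used only for illustration.
GRADE_BANDS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
    (0, "F"),
]


def to_letter(score: int) -> str:
    """Map a whole-number score in 0..100 to its letter grade."""
    # Assumption: only integer scores are accepted, because the syllabus
    # does not specify a rounding rule for fractional scores.
    if not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError("score must be an integer between 0 and 100")
    # Bands are sorted from highest to lowest, so the first match wins.
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter
    raise AssertionError("unreachable: the last band starts at 0")


if __name__ == "__main__":
    for s in (100, 92, 79, 60, 59):
        print(s, to_letter(s))
```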

    Projects:

    There will be three projects. Students choose one and work on it in groups of three or fewer. The projects are drawn from different areas of science, technology, and health so that students with a wide range of interests can work on a project that interests them.

  • Project 1: Cancer biomarker identification using mass genomic data
  • Project 2: Speaker identification using supervised and unsupervised models
  • Project 3: Stock market prediction using time series analysis and neural networks

    Project evaluation:

    Projects are evaluated as follows. Students complete the project in a Jupyter notebook and upload it to Blackboard before the due date. The instructor or the TA evaluates the uploaded projects. On the day of the presentation, all members of a team must attend the presentation session and answer any questions about the project. Students present only to the instructor.

    Quizzes:

    There are four quizzes in total. The quizzes will be administered online. They will be posted on Blackboard (quiz times will be announced in advance), and students must upload their answers to Blackboard within the given time (usually 10-15 minutes).

    Examinations:

    There will be two exams: a midterm and a final. Each exam has two parts. Part I is closed book and Part II is open book. The exams will be held in person in the classroom.

    Regrading:

    For regrading of an assignment or exam, please meet with the person (instructor or teaching assistant) responsible for the grading. Please arrange a re-evaluation within one week of receiving the graded work. Requests made more than one week after the graded work is returned to the class will not be considered. To promote consistency of grading, questions and concerns about grading should be addressed first to the TA and then, if that does not resolve the issue, to the instructor. You are welcome to contact the TA by email or come to the TA's office hours. If you would like to speak with the TA in person and have a schedule conflict with the office hours, you are welcome to make an appointment to meet the TA at another time.

    Academic Integrity Statement

    Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Technology & Management, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the Academic Judiciary website.

    Disability Support Services

    If you have a physical, psychological, medical, or learning disability that may impact your course work, please contact Disability Support Services, 128 ECC Building, (631) 632-6748. They will determine with you what accommodations are necessary and appropriate. All information and documentation is confidential.

    Students who require assistance during emergency evacuation are encouraged to discuss their needs with their professors and Disability Support Services. For procedures and information, go to http://www.ehs.sunysb.edu and search for Fire Safety and Evacuation and Disabilities.