CSE392 (Fall 2014)
Data Mining

[ General Information | Schedule | Handouts | Resources | Requirements ]

General Information

Course description: Data mining is the process of automatic discovery of patterns, models, changes, associations and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including: algebraic and statistical foundations, pattern mining, clustering, and classification. Emphasis will be laid on the algorithmic approach. After taking this course students will be able to

Prerequisites: Discrete math, data structures and algorithms. Knowledge of linear algebra and of probability and statistics is also needed, though an attempt will be made to review the basic concepts. Assignments will require the use of the Python language, with the NumPy package for numeric computations; an overview of important features will be given so that you can start learning it on your own via web tutorials, etc. | Credits: 3.

Instructor: Annie Liu | Email: liu@cs.stonybrook.edu | Office: Computer Science 1433 | Phone: 631-632-8463 | Office hours: Tue 9:30-10AM, 11AM-12PM, 12:40-1PM, Thu 9:30-10AM, 12:30-1PM, 2:20-2:30PM, email for an appointment, or stop by any time I'm around.

Lectures: Tue Thu 1-2:20PM, in Computer Science 2129.

Textbook: Data Mining and Analysis by Mohammed J. Zaki and Wagner Meira, Jr. Cambridge University Press, 2014. Thanks to Prof. Zaki for providing many of the course materials!

Grading: Lecture critiques, in-class exercises, programming assignments, 2 exams, and a project are worth 5%, 10%, 30%, 2 x 20%, and 15% of the grade, respectively. Extra credit work will be given as appropriate. Partial credit will be given for partial work. Late assignments receive reduced credit: 20% off per day.
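As a rough illustration of how the weights above combine, here is a minimal Python sketch. The names WEIGHTS, late_score, and final_grade are hypothetical, chosen only for this example. It assumes each component is scored as a percentage from 0 to 100, and that the late penalty removes 20% of the earned score per day, floored at zero; the policy above does not say whether the penalty applies to the earned score or to the maximum score, so treat that as an assumption. Extra credit is not modeled.

```python
# Component weights taken from the grading policy (fractions of the final grade).
WEIGHTS = {
    "critiques": 0.05,
    "exercises": 0.10,
    "assignments": 0.30,
    "exam1": 0.20,
    "exam2": 0.20,
    "project": 0.15,
}


def late_score(score, days_late, penalty_per_day=0.20):
    """Apply the late penalty to one assignment score.

    Assumption: the penalty is a fraction of the earned score per day late,
    and the result never drops below zero.
    """
    factor = max(0.0, 1.0 - penalty_per_day * days_late)
    return score * factor


def final_grade(scores):
    """Combine per-component percentages (0-100) into a weighted final percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = {
        "critiques": 100,
        "exercises": 90,
        "assignments": late_score(85, days_late=1),
        "exam1": 80,
        "exam2": 75,
        "project": 95,
    }
    print(round(final_grade(example), 2))
```

Under these assumptions, an assignment handed in five or more days late earns no credit.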

Course homepage: http://www.cs.stonybrook.edu/~liu/cse392/


Schedule

Unit 1 (Aug 26,28): Overview. Ch.1. Assignment 0

No class: Sep 2, Labor Day

Unit 2 (Sep 4,9,11,16,18,23): Foundation. Ch.2,3,4,6,5,7. Assignment 1,2

Exam 1 (Sep 25): In-class exam. You can prepare one hand-written personal "crib sheet".

No class: Sep 30, Oct 2

Unit 3 (Oct 7,9,14,16,21): Classification. Ch.20,21,18,19,22. Assignment 3,4

No class: Oct 23

Unit 4 (Oct 28,30, Nov 4,6,11): Clustering. Ch.13,14,15,16,17. Assignment 5

Exam 2 (Nov 13): In-class exam. You can prepare one hand-written personal "crib sheet".

Unit 5 (Nov 18,20,25, Dec 2,4): Frequent pattern mining. Ch.8,9,10,11,12. Project

No class: Nov 27, Thanksgiving

Project due (Dec 5)



Handouts

Lecture Critiques

In-Class Exercises

Assignment 0: Data mining problem, programming in Python

Assignment 1: Numeric data analysis

Assignment 2: High dimensions, kernel method, principal components

Assignment 3: SVM training

Assignment 4: Paired t-test and Bayes classifier

Assignment 5: Expectation-maximization clustering

Exam 1

Exam 2



Resources

Interactive Site of This Course, for students in the class

Computer Science Department Windows Computing Facilities


Requirements

Learn all information on the course homepage. Check the homepage periodically for announcements and other dynamic content.

Attend all lectures and take good notes. This is the most efficient way to learn the course materials, because we will both distill and elaborate on textbook materials and discuss other important materials. We will start promptly, with quick reviews every time, followed by exercises or quizzes. We will have every student participate in solving problems and presenting solutions in class.

Do all course work. The readings are to help you preview and review the materials discussed in the lectures. The assignments are to provide concrete experiences with the basic concepts and methods covered in the lectures. The exercises and quizzes are to help check that you are keeping up with the lectures and the assignments. The exams will be comprehensive.

Your hand-ins, whether in electronic form or on paper, should include the following information at the top: your name, student ID, course number, assignment number, and due date. They should be submitted in a neat and organized fashion.

Your programming assignments should always be submitted with a README.txt file explaining where things are, what you did and found for the assignment (beyond what is described in the assignment handout), and how to run and test your code. This file is worth a non-trivial portion of the grade.

Your approach to solving problems is as important as your final solutions; you need to show how you arrived at your solutions and include appropriate explanations. Always include good explanations in your README file and good comments in your code.

If you feel your grade was assigned incorrectly, please bring it up no later than two weeks after the assignment was returned to the class.

Ask questions and get help. Ask questions in class, in office hours, and in the Q&A forum. Talk with your classmates, and share ideas (but nothing written or electronic).

Academic Integrity: All assignments, quizzes, and exams must be done individually, unless specified otherwise; you may discuss ideas with others and look up references, but you must write up your solutions independently and credit all sources that you used. Any plagiarism or other form of cheating discovered will have a permanent consequence on your university record.

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/

Americans with Disabilities Act: If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, Room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential.

Critical Incident Management: Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of University Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Further information about most academic matters can be found in the Undergraduate Bulletin, the Undergraduate Class Schedule, and the Faculty-Employee Handbook.