Course Information

Class Description

CSE 549 covers commonly used machine learning algorithms and their applications to computational biology. The class is structured so that problems motivate the application of the methods. Problems are divided into sections according to the type of data involved: sequences, matrices, graphs, and 3D structures. In each section, the problems are described first, and then example machine learning methods used to solve them are discussed. We will learn about entropy, relative entropy, and mutual information in the context of DNA-binding site identification; mixture models in the context of finding nucleosome positions; graph structure learning in the context of gene network construction; graph searching in the context of biomolecule searching; feature selection in the context of biomarker discovery; and feature extraction in the context of protein searching. The class will combine book chapters and slides, which describe the problems and machine learning methods, with paper readings that show how the methods are actually applied. There will be a midterm exam, a final exam, and a semester project of your choosing.


Assistant Professor Sael Lee
Office: Academic Bldg. B422
Email: sael at sunykorea dot ac dot kr
Phone: +82 (32) 626-1215

Meeting Time

[lecture] Mon/Wed 16:00~17:20 Academic Bldg. B204

Office Hours

Wed. 13:00-14:30 and Wed. 17:30-18:00 at B422 (or email for an appointment)




Required: N.A.

Pattern Recognition and Machine Learning, 2007, C.M. Bishop
Information Theory, Inference and Learning Algorithms, D. MacKay
Elements of Information Theory, T.M. Cover, J.A. Thomas
Bioinformatics: Sequence and Genome Analysis, David W. Mount
Introduction to Bioinformatics, 2008, A.M. Lesk


Final Exam will be worth 50% of your grade.
Project will be worth 50% of your grade.


The final project will not be limited to topics in computational biology, but will require you to apply methods and ideas that have been discussed in class. (proposal 10% + report 35% + presentation 5% = 50%)


Course Materials

1 8/31 Introduction: Whys and whats of computational biology Slide01
9/2 Defining the problem: Information Content in Biology and DNA Binding F. Fabris JIM 2009 Slide02
2 9/7 Method 1: Entropy, relative entropy and mutual information Ch.2 of Elements of Info. Theory Slide03
9/9 Method 1 Cont. : Entropy, relative entropy and mutual information Slide03
3 9/14 Example Solutions: DNA-binding site identification using information theory T.D. Schneider Nano Commun Netw. 2010; Erill and O'Neill BMC Bioinformatics 2009
9/16 Project description and QA (project doc); Defining the problem: Finding nucleosome positions Jiang C, Pugh BF. Nat Rev Genet. 2009 Slide05; Slide06
4 9/21 Method 2: Mixture models Slide07
9/23 Method 2: Mixture models Chapter 9 of PRML Slide08
5 9/28 NO CLASS: National Holiday
9/30 NO CLASS: Correction Day
6 10/5 Method 2 Cont.: Mixture models Slide09
10/7 Example Solutions: Finding nucleosome positions using mixture models Polishko et al. Bioinformatics. 2012 Slide10
7 10/12 Defining the problem: Biomarker discovery Slide11
10/14 Method 3: Feature selection Chapter 3&7 PRML Slide12-13
8 10/19 Method 3 Cont.: Feature selection
10/21 Example Solutions: Biomarker discovery by feature selection Abeel et al. Bioinformatics. 2010 I. Guyon et al. JML 2002 Slide14 Proposal Due
9 10/26 Defining the problem: Protein Structure and Dynamics Slide15
10/28 Method 4: Feature Extraction: PCA & Kernel PCA Chapter 12 of PRML Slide16
10 11/2 Method 4 Cont.: Feature Extraction Slide17
11/4 Example Solutions: Protein Dynamics with Feature Extraction Bakan A, Bahar I. PNAS 2009 Slide18
11 11/9 Problem/Method 5: Computational pharmacology and biomolecule searching N. Brown ACM Computing Surveys(CSUR)2009 Kashima et al. ICML2003 Mahe et al ICML2004 Ceroni et al. Bioinformatics 2007 Slide19 Slide20 Slide21 Slide22
11/11 Problem/Method 6: Bio-network construction Dr. Przulj's slides
12 11/16 P/M 6 cont.: Bio-network construction Dr. Ka Yee Yeung's Slides (paper); Dr. Carlo Cosentino's Slides
11/18 P/M 7: Traditional Sequence Alignment Slide25
13 11/23 P/M 7 cont.: Traditional Sequence Alignment Slide26
11/25 P/M 7 cont.: Traditional Sequence Alignment Slide27
14 11/30 Problem/Method 8: DNA sequencing Stanford CS262 Slides
12/2 Problem/Method 8: Genome-scale sequence alignment M. Schatz's slides 1; M. Schatz's slides 2
15 12/7 PROJECT PRESENTATION & Review Harley Jackson; Ayush Kumar Presentation
12/9 PROJECT PRESENTATION & Review Vasundhara Dehiya; Yongjin Park project report deadline
12/16 FINAL EXAM (15:30-17:30)

Course Policy

Attendance policy

Everyone is strongly urged to attend class regularly and participate actively. You are responsible for learning all the material covered in class. Notes and supplementary handouts will cover most of the material; however, in-class participation through discussion and asking questions is a valued learning activity.

Assignments grading policy

You will be required to propose and execute a final project based on the content we cover in class. The project grade is based on the proposal (10%), the final report (35%), and the project presentation (5%), which add up to 50% of your grade. The SUNY-SB Blackboard facility will be used for submissions and will record your time of submission. It is your responsibility to check that your uploads completed properly and that you received a proper grade. Grades will be e-mailed to you individually in a timely fashion.

Academic misconduct policy

There is no excuse for cheating. Cheating will be considered academic misconduct and handled according to the Stony Brook regulations. If cheating occurs during an exam or is evident in submitted assignments, you will receive a grade of F. Discussion of assignments is acceptable; however, submitted assignments must show originality. This means that assignments nearly identical to those of your peers, or that duplicate material found on the web, will be considered cheating. Everyone involved in cheating will be penalized.

University Policy

Americans with Disabilities Act

If you have a physical, psychological, medical, or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, Room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential.

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Technology & Management, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the Academic Judiciary website.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of University Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Faculty in the HSC Schools and the School of Medicine are required to follow their school-specific procedures. Further information about most academic matters can be found in the Undergraduate Bulletin, the Undergraduate Class Schedule, and the Faculty-Employee Handbook.