CSE 392 - Fall 2011

Computers playing Jeopardy!

http://www.cs.stonybrook.edu/~cse392



Course Information

Instructor: Dr. Paul Fodor
1437 Computer Science Building
Office hours: We 12:00-2:00PM, Th 1:00-2:00PM, and by appointment
Phone: (631) 632-9820
Email: pfodor (at) cs (dot) stonybrook (dot) edu

Course Description

This 3-credit class is about the IBM Watson project. IBM Watson is a computer system capable of answering rich natural language questions and estimating its confidence in those answers at the level of the best humans at the task. On Feb 14-16, 2011, in a televised event, Watson triumphed over the best human players of all time on the American quiz show Jeopardy!. In this course we will discuss the main principles of natural language processing and computer representation of knowledge, and examine how Watson arrived at some of its answers (right and wrong).

Course Syllabus


Week Date Lecture Topics/Notes Assignments/Readings HW/Labs Due
1 Tu 8/30 Administrative (course information) + Computerized Jeopardy! Read UIMA tutorial Due Th 9/1: Execute UIMA Tutorial
Th 9/1 Computerized Jeopardy! (cont.) UIMA cont. Develop UIMA Room AE 1
2 Tu 9/6 UIMA Introduction UIMA cont. n/a
Th 9/8 UIMA cont.: running the document analyzer; writing a first UIMA annotator UIMA cont. n/a
F 9/9 Add/Swap Deadline, Drop with a W n/a n/a
3 Tu 9/13 Prolog XSB manual Implement examples from the slides
Th 9/15 Prolog (cont.) Prolog list predicates: member, append, select Implement examples from the slides
4 Tu 9/20 Prolog WordNet and Prolog DCG grammars WordNet: http://wordnet.princeton.edu Implement recursive synset hierarchy
Th 9/22 DCG grammars UIMA Prolog Annotator Combine WordNet with DCG grammars
5 Tu 9/27 Probability theory, algorithms and NLP applications Experiment with Prolog Viterbi, WordNet Prolog interface Connect Prolog to UIMA Java: InterProlog (XSB), JPL (SWI)
Th 9/29 NO LECTURE - Rosh Hashanah n/a n/a
6 Tu 10/4 Text search and indexing Lucene: http://lucene.apache.org, Hadoop: http://hadoop.apache.org

http://www.lucenetutorial.com/lucene-in-5-minutes.html, http://developer.yahoo.com/hadoop/tutorial/module3.html

Th 10/6 Text search and indexing, cont. Discussion papers n/a
7 Tu 10/11 OpenNLP Blackboard n/a
Th 10/13 System presentation: ProNTo = Prolog Natural Language Tools, Tokenizer Blackboard n/a
8 Tu 10/18 System presentation: ProNTo = Prolog Natural Language Tools, Prolog statistical toolkit. Blackboard n/a
Th 10/20

Paper discussion: The lie detector: Explorations in the automatic recognition of deceptive language, Mihalcea and Strapparava, ACL-IJCNLP 2009.

System presentation: ProNTo = Prolog Natural Language Tools, A Free-Word-Order Dependency Parser in Prolog, Covington.

Blackboard n/a
9 Tu 10/25

Software presentation: Twitter API

System presentation: ProNTo = Prolog Natural Language Tools, SCP: A Simple Chunk Parser, Brooks.

Blackboard n/a
Th 10/27

Software presentation: Google+ API

System presentation: ProNTo = Prolog Natural Language Tools, ProNTo_Morph: Morphological Analysis Tool, Schlachter.

Blackboard n/a
10 Tu 11/1 Paper discussion: Improving a Statistical MT System with Automatically Learned Rewrite Patterns. Fei Xia, Michael McCord, Proceedings of the 20th International Conference on Computational Linguistics, 2004. Blackboard n/a
Th 11/3 Paper discussion: Detecting controversial events from Twitter, A. Popescu and M. Pennacchiotti, CIKM 2010. Blackboard n/a
11 Tu 11/8 Paper discussion: Semi-supervised recognition of sarcastic sentences in Twitter and Amazon, Davidov, Tsur and Rappoport, CoNLL 2010. Blackboard n/a
Th 11/10

Paper discussion: Twitter as a Corpus for Sentiment Analysis and Opinion Mining, Pak and Paroubek.

Paper presentation: Using Slot Grammar. Michael McCord. IBM Research Report, 2010.

Blackboard n/a
12 Tu 11/15 Paper discussion: Twitter power: Tweets as electronic word of mouth, Jansen, Zhang, Sobel and Chowdury, Journal of the American Society for Information Science and Technology, 2009. Blackboard n/a
Th 11/17

Paper discussion: Blogs Are Echo Chambers: Blogs Are Echo Chambers, Gilbert, Bergstrom, Karahalios, System Sciences 2009.

Paper discussion: Measuring user influence in Twitter: The million follower fallacy. Cha, Haddadi, Benevenuto and Gummadi. ICWSM 2010.

Blackboard n/a
13 Tu 11/22

Paper discussion: How opinions are received by online communities: A case study on Amazon.com helpfulness votes, Danescu-Niculescu-Mizil, Kossinets, Kleinberg and Lee, WWW 2009.

Paper discussion: What Is Twitter, a Social Network or a News Media? Kwak, Lee, Park and Moon. WWW 2010.

Blackboard n/a
Th 11/24 NO LECTURE - THANKSGIVING Blackboard n/a
14 Tu 11/29

System presentation: Weka

Paper discussion: Speak little and well: recommending conversations in online social streams, Chen, Nairn and Chi, CHI 2011.

Blackboard n/a
Th 12/1 Paper discussion: Web-Scale N-gram Models for Lexical Disambiguation. Shane Bergsma, Dekang Lin, Randy Goebel. In Proc. IJCAI 2009. Blackboard n/a
15 Tu 12/6 Paper discussion: A latent variable model for geographic lexical variation, Eisenstein, O'Connor, Smith, and Xing, EMNLP 2010. Blackboard n/a
Th 12/8

Paper discussion: How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Danescu-Niculescu-Mizil, Kossinets, Kleinberg and Lee. WWW 2009.

Blackboard n/a
Final Tu 12/20 NO FINAL EXAM n/a n/a

Grading Schema

Grades will be based on homework and lab work. The P/NC grading option is not available for this course.

The grades are posted on Blackboard: http://blackboard.stonybrook.edu.


Laboratory/Classroom

Information about the laboratory room (classroom) is available at the Computer Science Department Windows Computing Facilities website. Click on the FAQs link for information about accounts. Each student is encouraged to use an Integrated Development Environment (IDE) for software development and debugging. Eclipse and NetBeans are available in the laboratory and can be installed for free on your own computer.

Each student can use a version control system, such as CVS or Subversion, to manage his or her files. NetBeans and Eclipse both support CVS and Subversion, either natively or via a plug-in. Alternatively, you can use a stand-alone GUI client, such as TortoiseCVS or TortoiseSVN, or a command-line cvs or svn client. The laboratory has a Subversion server (click on Services > SVN for details). If you need a DBMS for the class assignments, you may run the DBMS yourself, or you may use the MySQL, Oracle, or DB2 server in the laboratory (click on Services for details).


Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Suspected instances of academic dishonesty will be reported to the CEAS Committee on Academic Standing and Appeals. More comprehensive information on academic integrity, including categories of academic dishonesty, can be found on the Academic Judiciary's web site.

Additional Information

If you have a physical, psychological, medical, or learning disability that may impact your course work, please contact Disability Support Services at (631) 632-6748 or http://studentaffairs.stonybrook.edu/dss. They will review your concerns and determine, with you, what accommodations are necessary and appropriate. All information and documentation is confidential.
Students who require assistance during emergency evacuation are encouraged to discuss their needs with their professors and Disability Support Services. For procedures and information go to the following web site: http://www.ehs.stonybrook.edu and search Fire Safety and Evacuation and Disabilities.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn.

Page maintained by Paul Fodor