Natural Language Processing (NLP) covers a broad range of techniques that aim to read, understand, and extract
information from natural language text. Web technology giants (Google, Microsoft, etc.), social media companies (Twitter, Facebook, etc.), retailers (Amazon, eBay, etc.),
and most companies with any kind of web presence rely on NLP capabilities. Some high-visibility AI applications, such as IBM Watson and Siri, also
depend on NLP. It is a great time to learn NLP!
Syllabus
This introductory course will cover some of the basic applications in NLP. We will look at
the tools and techniques used in these applications. Tentatively, the course will cover the following topics:
Language Modeling -- Guessing the next word in a sequence.
Text Categorization -- Determining the type of content (e.g., sports, politics) in a document.
Part-Of-Speech Tagging -- Determining the part-of-speech of each word in a sentence.
Named Entity Recognition -- Finding and classifying entities (e.g., Persons, Locations).
Information Extraction -- Extracting relationships between entities, e.g., president-of(Obama, US).
Information Retrieval -- Finding documents relevant to a query.
Question Answering -- Finding answers to questions.
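To give a flavor of the first topic, here is a minimal sketch of "guessing the next word" using bigram counts. It is only an illustration, not course material: the toy corpus and the function names (train_bigram_counts, guess_next_word) are made up for this example, and real language models add smoothing and use far more data.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is a hypothetical start-of-sentence marker.
        tokens = ["<s>"] + sentence.lower().split()
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def guess_next_word(counts, prev_word):
    """Return the most frequent follower of prev_word, or None if unseen."""
    followers = counts.get(prev_word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented purely for illustration.
toy_corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
bigram_counts = train_bigram_counts(toy_corpus)
print(guess_next_word(bigram_counts, "sat"))  # prints "on"
```

In this toy corpus "sat" is always followed by "on", so that is the guess; a word that never appears in the corpus yields None.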
Course Structure
Programming Assignments (25%)
Mid-term (25%)
Final Exam (25%)
Project (25%)
Pre-requisites
Programming -- This will be a programming-heavy course. The programming assignments and the project together account for 50% of the grade.
You will implement some NLP techniques yourself, as well as use off-the-shelf tools.
You must have basic programming skills in a high-level language such as C++, Java, or Scala AND
in a scripting language such as Python or Perl.
Data Structures and Algorithms -- You should be familiar with basic data structures such as linked lists, arrays, and hash maps.
You should have taken a basic course in algorithms that covered topics such as sorting, dynamic programming, and complexity analysis.
Probability and Statistics -- Familiarity with basic probability concepts is recommended but not required.
I will provide learning materials and notes that you can study before the start of the semester to pick up
these concepts if you don't have the necessary background.
Machine Learning -- Having taken a Machine Learning course is a plus but is not required. I will cover the basic ML concepts that are necessary to understand
the techniques we cover.
Texts
We will mostly follow material from the two books below. Occasionally we will also read papers!
[JM] Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, D. Jurafsky & J. H. Martin, Prentice Hall, Second Edition, 2009.
[MS] Foundations of Statistical Natural Language Processing, C. D. Manning & H. Schuetze, Cambridge: MIT Press, 1999.