Professor: Richard Wicentowski
Office: Science Center 290 (in the Chemistry hallway)
Phone: (610) 690-5643
Office hours: Tuesday 1:00pm–4:00pm; Monday, Wednesday, and Friday by appointment
Room: Science Center 199
Class Time: Tuesday, Thursday 9:55am–11:10am
Lab Room: Science Center 256
Lab Times: Thursday 1:05pm–2:35pm; Thursday 2:45pm–4:15pm
Textbooks:
WEEK | DAY | ANNOUNCEMENTS | TOPIC & READING | LABS & PROJECTS |
1 | Sep 02 | Introduction, Basic Probability, Regular Expressions. Choose one of the next two: • M&S Ch.2 (pp. 40-48, probability background), Ch.3 (linguistics background; skim as needed) • J&M Ch.1 (history of NLP) and §2.1 (regular expressions; see also Mertz below). Both of these: • BK&L Ch.1 (intro to NLTK; Python refresher) • Mertz, D., 2003. Text Processing in Python, Ch.3 (reference for regular expressions) | Lab 1 | |
Sep 04 | ||||
2 | Sep 09 | Words and N-Grams. Choose one of the next two: • J&M §3.9, Ch.4 • M&S Ch.6 | Lab 2 | |
Sep 11 | Drop/Add ends (Sep 12) | |||
3 | Sep 16 | Part of Speech Tagging. Pick one of the following two: • J&M §5.1-5.4, §5.6-5.7 • M&S §10.0-10.1, §10.2.1, §10.4. Each of these: • NLTK chapter 5 (especially §5.3-§8) • NLTK Reference on tag.brill • Brants, T. TnT: A Statistical Part-of-Speech Tagger (skip §2.5) | Lab 3 | |
Sep 18 | ||||
4 | Sep 23 | Morphological Analysis • J&M §4.10 (entropy) • J&M §11.5 • Harris (1955, 1967), Hafer and Weiss (1974) | Labs 4/5 | |
Sep 25 | ||||
5 | Sep 30 | |||
Oct 02 | Sequence Tagging Using Hidden Markov Models. Pick one of the following two: • J&M §2.2 (FSA), §3.4 (FST), §5.5 (HMM POS tagging), §6.1-6.5 (HMMs) • M&S Chapter 9 (Note that some variable names differ between the two descriptions; e.g., M&S uses X for the states, while J&M uses Q, as we did in class. Other than the variable names, the material is the same.) The ice cream HMM in Excel: • Jason Eisner's Excel spreadsheet demonstrating the forward-backward training algorithm. | |||
6 | Oct 07 | |||
Oct 09 | ||||
Oct 14 | Fall Break | |||
Oct 16 | ||||
7 | Oct 21 | |||
Oct 23 | Lexical Semantics • J&M §19.1-19.3, §20.1-20.6 • Naïve Bayes: MR&S Ch.13 up to §13.4 (skim §13.3, skip §13.4.1) | Lab 6 | ||
8 | Oct 28 | Guest lecture: Gideon Mann, Head of Data Science, Bloomberg LP | ||
Oct 30 | Lexical Semantics (continued) • McCarthy, D. et al., 2004. Finding Predominant Word Senses in Untagged Text | Lab 7 | ||
9 | Nov 04 | Office hours Wednesday (Nov 05) 2:30pm–5:30pm, this week only | Sentiment Classification • Pang, B. et al., 2002. Thumbs up? Sentiment classification using machine learning techniques • O'Connor, B. et al., 2010. From tweets to polls: Linking text sentiment to public opinion time series • Riloff, E., Wiebe, J., and Wilson, T., 2003. Learning subjective nouns using extraction pattern bootstrapping | |
Nov 06 | ||||
10 | Nov 11 | Parsing • CFGs and Treebanks: J&M §12.1& • Top-down, Bottom-up and CKY Parsing: J&M §13.1& • PCFGs and Statistical Parsing: J&M §14.1& | ||
Nov 13 | Lab 8 | |||
11 | Nov 18 | Clustering • MR&S Chapter 16: §16.3-§16.4 • MR&S Chapter 17: §17.1-§17.4, §17.7 • MR&S Chapter 14: §14.3 | ||
Nov 20 | Final Project | |||
12 | Nov 25 | Machine Translation • J&M Ch.25 • Knight, K., 1997. Automating Knowledge Acquisition for Machine Translation. AI Magazine, Volume 18, No. 4 (§1-2 only) • Knight, K., 1999. A Statistical MT Tutorial Workbook. Prepared for the 1999 JHU Summer Workshop | ||
Nov 27 | Thanksgiving | |||
13 | Dec 02 | Machine Translation (continued) | ||
Dec 04 | ||||
14 | Dec 08 | |||
Dec 09 | Speech Recognition and Generation. Donuts and our very own Adam Lammert! | |||
Dec 18 | Final project due at NOON |