Many students have asked for a "study guide". Below I lay out the
topics you should be familiar with for the first exam. There will be
no programming questions on the exam. All of the questions will be
conceptual, covering mostly the lecture material but also the topics
from the labs.
I will not provide sample questions for you to answer. You can expect
short-answer questions as well as extended questions requiring one or
two paragraphs. Some questions will require you to know the
mathematical formulas that we've used to describe various NLP
algorithms.
This list of topics is a superset of what will be covered on the
exam. Since the exam takes place in a 90-minute class period, not
everything can be covered.
Exam 1 Topics
There's really nothing special about this list; it's just a slightly
expanded version of the syllabus.
- Regular expressions (a short example follows this list)
- Types, tokens, Zipf's law, precision, recall, F-measure (formulas sketched below)
- Basic probability: chain rule, Bayes' rule, Markov assumption, maximum likelihood estimation (worked formulas below)
- Language modeling: n-gram models (see the bigram formulas below)
- Smoothing: Laplace, discounting, interpolation, backoff, stupid backoff, Good-Turing, Kneser-Ney (the Laplace formula is sketched below)
- Data sets: training, test, development
- Part-of-speech tagging: n-gram POS taggers, Markov taggers, transformation-based learning (see the Viterbi sketch below)
- Morphology: segmentation, trie data structure, successor/predecessor frequency/entropy, orthographic similarity (Levenshtein distance; a sketch follows below)
- Hidden Markov models: formulation, computing likelihood (forward algorithm), decoding (Viterbi algorithm), training (forward-backward algorithm); the Viterbi sketch below illustrates decoding
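A few of the topics above are easiest to review with a formula or a
small sketch in front of you, so some illustrations follow. They are
review aids only, not exam material. First, regular expressions: a
minimal Python example (the pattern and the sample sentence are my own
toy illustration):

```python
import re

# Find all words ending in "ing" -- a toy pattern for illustration.
text = "The students were studying and reviewing smoothing formulas."
matches = re.findall(r"\b\w+ing\b", text)
print(matches)  # ['studying', 'reviewing', 'smoothing']
```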
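For the evaluation metrics and Zipf's law, the standard formulations,
written in terms of true positives (tp), false positives (fp), and
false negatives (fn):

```latex
% Zipf's law: the frequency of the r-th most frequent type is
% roughly inversely proportional to its rank r.
f(r) \propto \frac{1}{r}

% Precision, recall, and balanced F-measure:
P = \frac{tp}{tp + fp}, \qquad
R = \frac{tp}{tp + fn}, \qquad
F_1 = \frac{2PR}{P + R}
```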
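For basic probability and n-gram language modeling, the standard
formulas are the chain rule, Bayes' rule, the bigram (first-order
Markov) approximation, and the maximum likelihood estimate from
counts C(·):

```latex
% Chain rule:
P(w_1 \dots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \dots w_{i-1})

% Bayes' rule:
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

% Bigram (first-order Markov) approximation:
P(w_1 \dots w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})

% Maximum likelihood estimate for a bigram:
P_{\mathrm{MLE}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i)}{C(w_{i-1})}
```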
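For smoothing, Laplace (add-one) is the simplest case to memorize;
here V is the vocabulary size:

```latex
% Laplace (add-one) smoothing for a bigram model:
P_{\mathrm{Laplace}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i) + 1}{C(w_{i-1}) + V}
```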
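For orthographic similarity, a minimal dynamic-programming sketch of
Levenshtein distance with unit insertion, deletion, and substitution
costs (the function name and test strings are mine):

```python
def levenshtein(s, t):
    """Edit distance between strings s and t with unit costs."""
    m, n = len(s), len(t)
    # dist[i][j] = edit distance between s[:i] and t[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i          # delete all of s[:i]
    for j in range(n + 1):
        dist[0][j] = j          # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution/match
    return dist[m][n]

print(levenshtein("kitten", "sitting"))  # 3
```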
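Finally, for Markov taggers and HMM decoding, a compact Viterbi
sketch. The tag set and the probability tables below are invented toy
numbers, purely for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for obs under an HMM (plain probabilities)."""
    # V[t][s] = probability of the best path that ends in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy tag set and probabilities, invented for illustration only.
states = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.6, "swim": 0.4}, "V": {"fish": 0.3, "swim": 0.7}}
print(viterbi(["fish", "swim"], states, start_p, trans_p, emit_p))  # ['N', 'V']
```

In practice you would work in log probabilities to avoid underflow on
longer sequences, but the plain form above is easier to check by hand.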