(A work in progress…)
Everything from Exam 1, plus…
Ethics
- in-class discussion; see your own notes/readings…
Text Classification
- text categorization
- sentiment analysis
- decision list classifier
- naive Bayes classifier
- bag-of-words
- Bayesian inference
- prior probability
- likelihood
- naive Bayes assumption
- linear classifiers
- unknown words
- stop words
- sentiment lexicons
- feature selection
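To tie together the prior, likelihood, bag-of-words, and unknown-word terms above, here is a minimal sketch of a naive Bayes classifier. The function names (train_nb, classify_nb), the toy reviews, and the use of add-one smoothing are assumptions made for illustration, not necessarily the version covered in class.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs is a list of (words, label) pairs; each document is a bag of words."""
    label_counts = Counter(label for _, label in docs)   # used for the priors
    word_counts = defaultdict(Counter)                   # used for the likelihoods
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def classify_nb(words, model):
    """Return the label with the highest log prior plus summed log likelihoods."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)       # log prior
        denom = sum(word_counts[label].values()) + len(vocab)     # add-one smoothing (assumed)
        for w in words:
            if w not in vocab:
                continue  # unknown words are skipped in this sketch
            # naive Bayes assumption: words are independent given the label
            score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training data for sentiment analysis
toy_docs = [
    (["great", "fun", "film"], "pos"),
    (["boring", "slow", "film"], "neg"),
    (["great", "acting"], "pos"),
    (["slow", "plot"], "neg"),
]
toy_model = train_nb(toy_docs)
```

Working in log space avoids underflow when many small probabilities are multiplied, and it makes the score a sum of per-word weights, which is why naive Bayes counts as a linear classifier.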
Evaluation
- gold labels (aka ground truth)
- accuracy
- precision, recall, F-measure
- confidence
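As a quick check on the evaluation terms above, the following sketch computes accuracy, precision, recall, and the balanced F-measure (F1) from gold labels and predictions. The function names and the toy label lists are hypothetical.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 with respect to one label treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)  # true positives
    fp = sum(1 for g, p in pairs if p == positive and g != positive)  # false positives
    fn = sum(1 for g, p in pairs if p != positive and g == positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical gold labels and system output
toy_gold = ["pos", "pos", "neg", "neg", "pos"]
toy_pred = ["pos", "neg", "neg", "pos", "pos"]
```

Precision asks how many of the items labeled positive were correct; recall asks how many of the truly positive items were found.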
POS Tagging
- part of speech (POS)
- closed class vs open class words
- lots of specific parts of speech
- Penn Treebank tag set
- disambiguation
- sequence model
- Markov chain
- Markov assumption
- hidden Markov model (HMM)
- decoding
- Viterbi algorithm
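For review, decoding an HMM with the Viterbi algorithm can be sketched as below. The two-tag model and all of its probabilities are made up for illustration, and the sketch assumes a non-empty sentence and plain probabilities rather than log probabilities.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for words and its probability."""
    # best[i][t]: probability of the best tag sequence for words[:i+1] ending in tag t
    best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]  # back[i][t]: the previous tag on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Markov assumption: only the immediately preceding tag matters
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Follow the backpointers from the best final tag
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy HMM with two tags
toy_tags = ["N", "V"]
toy_start = {"N": 0.8, "V": 0.2}
toy_trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
toy_emit = {"N": {"fish": 0.5, "sleep": 0.1}, "V": {"fish": 0.3, "sleep": 0.7}}
```

Each cell reuses the best scores from the previous word, so the running time grows with the sentence length times the square of the number of tags instead of with the number of possible tag sequences.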
Grammars
- context-free grammar (CFG)
- lexicon
- terminal symbols
- non-terminal symbols
- parse tree
- start symbol
- (un)grammatical
- treebanks
- Chomsky normal form
- unit productions
Parsing
- CKY parsing
- parsing vs. recognizing
- probabilistic context-free grammar (PCFG)
- probabilistic CKY
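To make the parsing vs. recognizing distinction concrete, here is a rough sketch of a CKY recognizer, which only answers whether a sentence is grammatical and builds no parse tree. It assumes a grammar already in Chomsky normal form; the rule tables and the tiny grammar are hypothetical.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if the CNF grammar can derive words from the start symbol."""
    n = len(words)
    # table[i][j] holds the non-terminals that can derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))       # rules A -> word
    for span in range(2, n + 1):                        # shorter spans first
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # every split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= binary.get((b, c), set())  # rules A -> B C
    return start in table[0][n]

# Hypothetical toy grammar in Chomsky normal form
toy_lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
toy_binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
```

Turning this into a parser means storing backpointers in each cell; probabilistic CKY additionally keeps the best probability for each non-terminal in each cell.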
Machine Translation
- fertility
- permutation
- spurious words
- word alignments
- fractional counts
- alignment probabilities
- distortion
- expectation-maximization (EM) algorithm
Other Concepts From Labs
- word sense
- polysemous (vs monosemous)
- most frequent sense
- feature vector
- collocation
- sparse matrix