Introduction
This course will introduce you to a broad range of topics in the area of natural language processing including language modeling, part of speech tagging, spelling correction, morphology, syntactic parsing, semantics and machine translation. If time permits, we may also cover speech recognition, natural language generation or discourse systems.
Course Goals
In this course you will:
- learn the algorithms and data structures central to Natural Language Processing
- learn how these algorithms and data structures are used to access large text corpora
- learn how to use text corpora as the basis for training probabilistic machine learning algorithms
- build components of large NLP systems such as language models, part of speech taggers, morphological analyzers, parsers and text classifiers
- develop experiments, analyze the results, and report on the results
- learn to use LaTeX to write papers that can be submitted to a conference
- read and analyze primary literature in NLP, implementing some of the algorithms described in order to better understand the work and/or make suggestions for improvement upon the work
Class Information
Professor: Richard Wicentowski
Office: Science Center 251
Phone: (610) 690-5643
Office Hours: W 10:00am-12:00pm; Th 1:15-2:45pm; by appointment
Class time: Tuesday and Thursday 9:55am-11:10am
Class room: Science Center 181
Lab time: Monday 1:15pm-2:45pm and 3:00pm-4:30pm
Lab room: Science Center 240
Textbooks
You should not purchase any textbooks this semester.
- Jurafsky and Martin, Speech and Language Processing, 3rd edition, 2019 – draft edition online
- Bird, Klein and Loper, Natural Language Processing with Python – available online
- Manning and Schütze, Foundations of Statistical Natural Language Processing, 1999 – available in Cornell Library
Tentative schedule
WEEK | DAY | ANNOUNCEMENTS | TOPIC & READING | LABS & PROJECTS |
---|---|---|---|---|
1 | Sep 04 | | Introduction | |
 | Sep 06 | | | |
2 | Sep 11 | | Regular Expressions; Segmentation Vector Semantics | |
 | Sep 13 | Drop/Add ends (Sep 14) | | |
3 | Sep 18 | | Smoothing Edit Distance | |
 | Sep 20 | | | |
4 | Sep 25 | | Smoothing | |
 | Sep 27 | | Lexical Semantics | |
5 | Oct 02 | | | |
 | Oct 04 | | | |
6 | Oct 09 | | | |
 | Oct 11 | First exam | | |
 | Oct 16 | Fall Break | | |
 | Oct 18 | | | |
7 | Oct 23 | | Text Classification | |
 | Oct 25 | | | |
8 | Oct 30 | | POS Tagging and HMMs | |
 | Nov 01 | | | |
9 | Nov 06 | Roger Bock '99 (BBN) | | |
 | Nov 08 | | | |
10 | Nov 13 | | Parsing | |
 | Nov 15 | | | |
11 | Nov 20 | | | |
 | Nov 22 | Thanksgiving | | |
12 | Nov 27 | | Machine Translation | |
 | Nov 29 | | | |
13 | Dec 04 | | | |
 | Dec 06 | | | |
 | Dec 11 | Second exam | | |
 | Dec 18 | Final project due at NOON | | |
Grading
Your overall grade in the course will be determined as follows:
45% | Labs and projects |
25% | First in-class exam |
25% | Second in-class exam |
5% | Class Participation and Attendance |
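As a rough illustration of how these weights combine, the sketch below computes a weighted average. Only the weights come from this syllabus; the component scores, the 0-100 scale, and the function name are hypothetical, and the mapping from the resulting number to a letter grade is not specified here.

```python
# Weights are taken from the grading table; everything else is illustrative.
WEIGHTS = {
    "labs_and_projects": 0.45,
    "first_exam": 0.25,
    "second_exam": 0.25,
    "participation": 0.05,
}


def overall_score(scores):
    """Return the weighted average of the component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores on an assumed 0-100 scale (not real data).
example_scores = {
    "labs_and_projects": 90,
    "first_exam": 80,
    "second_exam": 84,
    "participation": 100,
}
print(round(overall_score(example_scores), 2))  # 86.5
```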
Lab Policy
This course features regular lab assignments, which account for the largest component of your course grade. Lab attendance is required of all students, unless you have already completed and submitted the lab assignment for the week. The CS labs are open 24 hours a day, 7 days a week for you to use to complete your lab assignments.
Lab assignments will typically be assigned during the lab sections on Monday and will generally be due by noon on Monday of the following week. You are strongly encouraged to start early!
Even if you do not fully complete an assignment, you should submit what you have done to receive partial credit.
Programming language
Assignments will presuppose knowledge of python3. You will almost certainly end up learning some Perl and bash scripting, but you are not expected to know either in advance.
Please make sure that each program you turn in has:
- A comment at the top of the program that includes:
  - Program authors
  - A brief description of what the program does
- Concise comments that summarize major sections of your code
- Meaningful variable and function names
- Well-organized code
- White space to improve legibility
- Lines less than 80 characters wide (whenever possible)
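One way a program meeting the checklist above might look is sketched below. The author names, the task (counting word frequencies), and all function names are placeholders chosen for illustration; this is not a required template.

```python
# Program authors: A. Student and B. Student (placeholder names)
# Description: Counts how often each word occurs in a piece of text
#              and prints the most frequent words.

from collections import Counter
import re


# ---- Tokenization ----
def tokenize(text):
    """Lowercase the text and return its word tokens as a list."""
    return re.findall(r"[a-z']+", text.lower())


# ---- Counting ----
def most_common_words(text, how_many=3):
    """Return (word, count) pairs for the most frequent words."""
    word_counts = Counter(tokenize(text))
    return word_counts.most_common(how_many)


# ---- Main program ----
def main():
    sample_text = "the cat saw the dog and the dog saw the cat"
    for word, count in most_common_words(sample_text):
        print(word, count)


if __name__ == "__main__":
    main()
```

The header comment names the authors and describes the program, each major section carries a one-line comment, and every line stays under 80 characters.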
I expect that you will be using python3 for all of your lab assignments, but if you would like to use something different, you are welcome to come talk to me about your plan.
Accessing the CS labs after hours
You can use your ID to gain access to the computer labs at night and on weekends. Just wave your ID over the OneCard reader next to the lab doors. When the green light goes on, push on the door handle to get in (the door knob will not turn). If you have issues with the door locks, send an email to local-staff@cs.swarthmore.edu. If the building is locked, you can use your ID to enter through the door between Martin and Cornell Library. For this class, your ID will give you access to the labs in rooms SCI 238, 240, 256, and Clothier basement.
Policies
Assignment Extension Policy
You must submit your assignments electronically by pushing to your assigned git repository. You may push your assignment multiple times, and a history of previous submissions will be saved. You are encouraged to push your work regularly.
To help with cases of minor illnesses, athletic conflicts, or other short-term time limitations, all students start the course with two “late assignment days” to be used at your discretion, with no questions asked. To use your extra time, you must email your professor after you have completed the lab and pushed to your repository. You do not need to inform anyone ahead of time. When you use late time, you should still expect to work on the newly-released lab during the following lab section meeting. The professor and ninjas will always prioritize answering questions related to the current lab assignment.
Your late days will be counted at the granularity of full days and will be tracked on a per-student (NOT per-partnership) basis. That is, if you turn in an assignment five minutes after the deadline, it counts as using one day. For partnered labs, using a late day counts towards the late days for each partner. In the rare cases in which only one partner has unused late days, that partner’s late days may be used, barring a consistent pattern of abuse.
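The full-day granularity can be sketched as a ceiling computation. This is only an illustration of the rule as worded above: the function name and the example deadline are hypothetical, and treating a delay of more than 24 hours as two days is an assumption based on the "full days" wording.

```python
from datetime import datetime, timedelta
import math

LATE_DAYS_PER_STUDENT = 2  # each student starts with two late days


def late_days_used(deadline, submitted):
    """Return whole late days consumed; any lateness rounds up."""
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0  # on time: no late days used
    return math.ceil(delay / timedelta(days=1))


# Hypothetical deadline at noon; the date is a placeholder.
deadline = datetime(2030, 1, 7, 12, 0)
print(late_days_used(deadline, deadline + timedelta(minutes=5)))  # 1
print(late_days_used(deadline, deadline + timedelta(hours=25)))   # 2
```

For a partnered lab, the same count would be charged to each partner's own balance.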
If you feel that you need an extension on an assignment or that you are unable to attend class for two or more meetings due to a medical condition (e.g., extended illness, concussion, hospitalization) or other emergency, you must contact the dean’s office and your instructors. Faculty will coordinate with the deans to determine and provide the appropriate accommodations. Note that for illnesses, the College’s medical excuse policy states that you must be seen and diagnosed by the Worth Health Center if you would like them to contact your class dean with corroborating medical information.
Academic Integrity
Academic honesty is required in all your work. Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.
Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else’s code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on, and what your sources were.
Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook: “Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion.”
The spirit of this policy applies to all course work, including code, homework solutions (e.g., proofs, analysis, written reports), and exams. Please contact me if you have any questions about what is permissible in this course.
Academic Integrity: Exam Policy
Students must strictly adhere to the following policy, which applies to all exams taken in a Computer Science course at Swarthmore.
Exam takers must place all non-essential items at the front of the room (or other designated area). Unless otherwise permitted, students may not have any electronic devices or course materials in their possession during the entirety of the exam. This includes cell phones, tablets, laptops, smart watches, course notes, articles and books, among others. These items should be placed at the front of the room near the proctor. If you need to leave the room during the exam, you must obtain permission from an instructor first. Any non-permitted discussion or aid regarding exam material will result in immediate forfeiture of the exam and a report to the College Judiciary Committee. Please discuss any concerns or accommodations with your instructor prior to starting the exam.
Academic Accommodations
If you believe you need accommodations for a disability or a chronic medical condition, please contact Student Disability Services (Parrish 113W, 123W) via e-mail at studentdisabilityservices@swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the office will issue students with documented disabilities or medical conditions a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact Student Disability Services as soon as possible. For details about the accommodations process, visit the Student Disability Services website. You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged, in advance, through Student Disability Services.