Please be aware that many elements on this page will change throughout the semester, including the course schedule. It is your responsibility to review this page periodically for updates.
We value any and all student feedback. If you would like to provide anonymous course feedback, please use this submission form. Please be constructive in any comments so that we can adjust the course as best we can.
Quick Links
- Zoom for Lecture and Lab
- EdStem for course discussion/questions
- Slack for Lab and office hours
- Guide to Remote Tools that you’ll need for this course
- Lecture Notes Folder
- Panopto for Lecture and Lab Videos
- Gradescope for Homeworks and Exams
Course Basics
Instructors: Ben Mitchell, Ameet Soni
Lecture: MWF 10:40am-11:30am, hosted on Zoom.
Lab:
| Section | Instructor | Time | Where |
|---|---|---|---|
| Lab Section A | Soni | Monday 9:00am-10:30am | Start on Zoom, move to Slack |
| Lab Section B | Mitchell | Monday 2:00pm-3:30pm | Start on Zoom, move to Slack |
| Lab Section C | Mitchell | Monday 3:45pm-5:15pm | Start on Zoom, move to Slack |
Office hours: in the CS66 Slack workspace (in the #office-hours channel).
All times are Eastern time. These are tentative; please do reach out to us if you cannot make these times and we can make arrangements.

| Instructor | Day | Time |
|---|---|---|
| Soni | Wednesday | 4:00pm-5:00pm |
| Mitchell | Thursday | 2:00pm-4:00pm |
| Soni | Friday | 3:00pm-4:00pm |
Waitlist Procedure
For Spring 2021, please use the following waitlist procedure to stay active in the course and to be considered for open positions:
- If you haven’t already, be sure to fill out the Department waitlist request form.
- Stay current with the course lectures and labs by watching all lecture and lab videos for week 1 on Panopto by the end of the day each video is posted.
- You will receive an email asking basic concept questions for each lecture. Answer and submit the Google Form, and we will consider you active in the class and eligible for any seats that open up.
- Please do not attend lab or lecture on Zoom. There are too many students on the waitlist for us to manage all current enrollees plus the waitlist.
- The instructors will stay in contact to provide updates. There are no guarantees that following the waitlist procedure will result in obtaining a seat; when students drop, we will fill their seats based on the department lottery priorities, choosing from students who follow the above steps and who can fit the open lab slot into their schedule.
Required Course Textbook
We will use A Course in Machine Learning by Hal Daumé III, a free online ebook. We will also include additional reading material on the course schedule.
Additional References
These are all excellent books that we have read. However, they are geared more towards graduate students and researchers, so we did not choose them as our course textbook. If you are looking to go deeper into the material, here are some suggestions:
- Machine Learning by Tom Mitchell. This is the gold standard; however, it is too expensive to be the required textbook. You may be able to find used copies for a reasonable price; there is also a reserved copy in the Cornell Library.
- Introduction to Machine Learning by Ethem Alpaydin. If you are looking to purchase a hard copy, this is the one I recommend, as it is reasonably priced and a good textbook. It is also available as a free ebook through the library’s ProQuest account.
- Pattern Recognition and Machine Learning by Christopher Bishop (Ameet’s favorite book)
- Machine Learning: A Probabilistic Perspective by Kevin Murphy
- The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (available free online at https://www.deeplearningbook.org/)
Course Description
Welcome to CPSC 66. Machine learning is the study of algorithms that learn through experience. This course will introduce you to various frameworks (e.g., supervised learning) and associated algorithms for these frameworks (e.g., support vector machines). The major aim of this course, however, is to develop an understanding of the entire machine learning pipeline rather than to focus on the algorithm du jour. We will also spend a significant amount of time inspecting core concepts (e.g., generalization) from statistical and theoretical perspectives. With each topic, we will consider both the practical and open research questions at the heart of the field. You will be expected both to implement solutions in lab assignments and to digest and discuss readings that build on lecture topics.
To enroll in this course, you must have completed CPSC 35. There are no other requirements, though linear algebra and familiarity with probability will be useful. The course will also cover a good deal of probability theory, but much of this can be picked up from the provided readings. This course is designated as a natural sciences and engineering practicum (NSEP) and qualifies as a Group 3: Applications course for the CS major/minor requirements.
Course Learning Goals
By the end of the course, you will understand:
- several machine learning frameworks, including supervised learning, unsupervised learning, and hybrid approaches
- various algorithms for the frameworks we explore, including variations in data representation
- how to choose and apply an appropriate framework and algorithm for a new problem
- practical considerations for data, including data preprocessing, feature engineering, and resource constraints
- the core concept of generalization, and the associated theoretical tools for inspecting both our data and models
- theoretical and empirical evaluation of performance
Student Responsibilities
We have outlined the skills and objectives this course promises to provide you. For these promises to be upheld, you will need to commit to the policies outlined below. To succeed you should:
- Attend class and lab. The primary introduction to course material is through class lecture. Additionally, we often do learning exercises during class, which give you immediate experience with the material we are covering. Class and lab attendance is mandatory. While we are more than happy to help with any material in office hours, priority will be given to students who attend and participate in lecture. Office hours are not a substitute for missed lectures.
- Participate actively in the learning process. Showing up is necessary but not sufficient for success in the course. To fully develop your analytical skills, you are expected to participate in class discussion. This includes asking questions during lecture portions and engaging your peers during short class exercises. Studies show active involvement is the number one determinant of student success.
- Prepare for lecture. You are expected to have done the pre-reading before each lecture and to have reviewed your notes from the prior meeting. If you have not done so, you will be unprepared for the daily concept quizzes that begin lecture as well as for the subsequent group discussions.
- Start the lab assignments early. If you get in the habit of doing this, you will be much better off. As the labs get longer and more difficult, starting early will give you plenty of time to mull over the lab problems even when you aren’t actively writing your solution.
- Practice, practice, practice. The only effective way to learn the material and pass the exams is to consistently do the labs and to practice the example problems presented in class and in the book. Forming study groups to go over practice problems and to review lecture and reading notes is a great way to prepare for exams.
- Seek help early and often. Because course material builds on previous material, it is essential to your success in this class that you keep up with the course material. There are many sources of help: ask questions during lecture; ask your classmates (make sure you have read the Academic Integrity section for restrictions); get help during lab sessions; and come to office hours.
Course Work
Assessment (Grading)
This breakdown is a rough estimate and is subject to change:
- 30% Labs (3 to 4 total)
- 5% Homework assignments (4, lightly graded)
- 25% Final project
- 10% Pre-reading quizzes and class participation
- 30% Exams (2 midterms, no final)
Class Participation
As discussed under Student Responsibilities, your participation involves:
- Daily concept quizzes and/or class prompts
- Required attendance at lecture and lab
- Active participation in lecture
- Active engagement in the class discussion group
Working With Partners
For partnered lab assignments, you should follow these guidelines:
- The expectation is that you and your partner are working together side by side in the lab for most, if not all, of the time you work on partnered lab assignments.
- You and your partner should work on all aspects of the project together: initial top-down design, incremental testing and debugging, and final testing and code review.
- If you are pair programming, where one of you types and one of you watches and assists, then you should swap roles periodically, taking turns doing each part.
- There may be short periods of time where you each go off and implement some small part independently. However, you should frequently come back together, talk through your changes, push and pull each other’s code from the git repository, and test your merged code together.
- You should not delete or significantly alter code written by your partner when he or she is not present. If there is a problem in the code, then meet together to resolve it.
- If there is any issue with the partnership, contact the professor.
Taking time to design a plan for your solution together and to do incremental implementation and testing together may seem like a waste of time, but in the long run it will save you a lot of time: it makes design or logic errors in your solution less likely, and it gives you a partner to help track down bugs and come up with solutions to problems.
Partnerships where partners work mostly independently rarely work out well and rarely result in complete, correct and robust solutions. Partnerships where partners work side-by-side for all or most of the time tend to work out very well.
You and your partner are equally responsible for initiating the scheduling of times when you can meet to work together, and for making time available in your schedules for working together.
Policies
Absence / Late Work
All homework and lab deadlines will have a 24-hour grace period. This means that assignments submitted less than 24 hours late will be accepted and graded normally. In addition, for lab assignments, each student may use up to a total of 7 late days across the entire semester, no questions asked. This is more than in a normal semester because of the added difficulties of working remotely. Late days do not apply to homework, quizzes, or other forms of assessment. A late day is considered a full 24 hours (i.e., submitting 15 minutes past the grace period costs the same as submitting 23 hours past it), and late days are counted against all lab partners. The grace period does not count against late days; as an example, if an assignment is due at 1pm on Tuesday and you submit it at 9am on Wednesday (<24 hours late), you do not need to use a late day, but if you submit it at 7pm on Wednesday (>24 hours, <48 hours late), you would use one late day.
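To make the grace-period arithmetic concrete, here is a minimal Python sketch of how the late days charged for one lab could be computed. It is an unofficial illustration, not a course tool: the function name `late_days_used` is hypothetical, and the sketch assumes that a submission exactly 24 hours late still falls inside the grace period and that every started 24-hour block after the grace period costs one full late day.

```python
import math
from datetime import datetime, timedelta

DAY = timedelta(hours=24)
GRACE_PERIOD = DAY  # the first 24 hours after the deadline never cost a late day


def late_days_used(deadline: datetime, submitted: datetime) -> int:
    """Return the number of late days charged for one lab submission.

    Hypothetical helper. Assumes a submission exactly 24 hours late is
    still inside the grace period.
    """
    lateness = submitted - deadline
    if lateness <= GRACE_PERIOD:
        return 0  # on time, or late but within the grace period
    # Every started 24-hour block past the grace period costs a full late day,
    # so 15 minutes past the grace period costs the same as 23 hours past it.
    return math.ceil((lateness - GRACE_PERIOD) / DAY)


if __name__ == "__main__":
    deadline = datetime(2021, 3, 2, 13, 0)  # Tuesday, 1pm
    print(late_days_used(deadline, datetime(2021, 3, 3, 9, 0)))   # 9am Wed -> 0
    print(late_days_used(deadline, datetime(2021, 3, 3, 19, 0)))  # 7pm Wed -> 1
```

Run as a script, it prints 0 and then 1, matching the Tuesday 1pm example in the policy. Because late days are counted against all lab partners, the same charge would be subtracted from each partner's 7-day budget.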
To use your late days, you must email your professor after you have completed the lab and pushed to your repository. You do not need to inform anyone ahead of time, and you do not need to provide a reason. When you use late days or a grace period, you should still expect to work on the newly-released lab during the following lab section meeting. The professor will always prioritize answering questions related to the current lab assignment during lab meetings and office hours. In the rare case in which only one partner has unused late days, the partnership can use the late days, barring a consistent pattern of abuse.
If you feel that you need an extension on an assignment or that you are unable to attend class for two or more meetings due to a medical condition (e.g., extended illness, concussion, hospitalization) or other emergency, you must contact the dean’s office and your instructors. Faculty will coordinate with the deans to determine and provide the appropriate accommodations. Note that for illnesses, the College’s medical excuse policy states that you must be seen and diagnosed by the Worth Health Center if you would like them to contact your class dean with corroborating medical information.
Academic Accommodations
If you believe you need accommodations for a disability or a chronic medical condition, please contact Student Disability Services via email at studentdisabilityservices@swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the office will issue students with documented disabilities or medical conditions a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact Student Disability Services as soon as possible. For details about the accommodations process, visit the Student Disability Services website.
You are also welcome to contact us privately to discuss your academic needs. However, all disability-related accommodations must be arranged, in advance, through Student Disability Services.
To receive an accommodation for a course activity you must have an official Accommodations Letter and you need to meet with us to work out the details of your accommodation at least two weeks prior to any activity requiring accommodations.
Academic Integrity
Academic honesty is required in all your work. Under no circumstances may you hand in work done with or by someone else under your own name. Discussing ideas and approaches to problems with others on a general level is encouraged, but you should never share your solutions with anyone else nor allow others to share solutions with you. You may not examine solutions belonging to someone else, nor may you let anyone else look at or make a copy of your solutions. This includes, but is not limited to, obtaining solutions from students who previously took the course or solutions that can be found online. You may not share information about your solution in such a manner that a student could reconstruct your solution in a meaningful way (such as by dictation, providing a detailed outline, or discussing specific aspects of the solution). You may not share your solutions even after the due date of the assignment.
In your solutions, you are permitted to include material which was distributed in class, material which is found in the course textbook, and material developed by or with an assigned partner. In these cases, you should always include detailed comments indicating on which parts of the assignment you received help and what your sources were.
When working on tests, exams, or similar assessments, you are not permitted to communicate with anyone about the exam during the entire examination period (even if you have already submitted your work). You are not permitted to use any resources to complete the exam other than those explicitly permitted by course policy. (For instance, you may not look at the course website during the exam unless explicitly permitted by the instructor when the exam is distributed.)
Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook:
Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion.
This policy applies to all course work, including but not limited to code, written solutions (e.g. proofs, analyses, reports, etc.), exams, and so on. This is not meant to be an enumeration of all possible violations; students are responsible for seeking clarification if there is any doubt about the level of permissible communication.
The general ethos of this policy is that actions which shortcut the learning process are forbidden while actions which promote learning are encouraged. Studying lecture materials together, for example, provides an additional avenue for learning and is encouraged. Using a classmate’s solution, however, is prohibited because it avoids the learning process entirely. If you have any questions about what is or is not permissible, please contact your instructor.
Regret clause for Spring 2021: If you commit some act that is a violation of the integrity policy (or, if you are unsure if it violates the policy) but bring it to the attention of the course’s instructor within 48 hours, the course may impose local sanctions that may include an unsatisfactory or failing grade for work submitted, but the course will not refer the matter for further disciplinary action except in cases of repeated acts.
Class Schedule
This is a tentative schedule. It will be updated as we go. We recommend that you review Tips for reading CS textbooks to help you determine what to focus on and how to get the most out of required readings.
| WEEK | DAY | ANNOUNCEMENTS | TOPIC | ASSIGNMENT |
|---|---|---|---|---|
| 1 | Feb 10 | | Introduction to Machine Learning | |
| | Feb 12 | | Nearest-Neighbor Classifiers | |
| 2 | Feb 15 | | Decision Trees | Lab 1 - Decision Trees and KNN (due 3/6); Homework 1 - Intro, KNN, DTrees (due 3/3) |
| | Feb 17 | Drop/add ends | | |
| | Feb 19 | | | |
| 3 | Feb 22 | | Evaluation Methodology & Practical Considerations | |
| | Feb 24 | | | |
| | Feb 26 | | | |
| 4 | Mar 01 | | Linear and Logistic Regression (Linear Algebra Primer; Linear Regression; Logistic Regression) | |
| | Mar 03 | | | |
| | Mar 05 | | Regularization; Bias-Variance Tradeoff | |
| 5 | Mar 08 | | Support Vector Machines and Kernels | |
| | Mar 10 | | | |
| | Mar 12 | | | |
| 6 | Mar 15 | | Ensemble Learning Methods | |
| | Mar 17 | | | |
| | Mar 19 | | | |
| 7 | Mar 22 | Midterm 1 | Catch-up and review | |
| | Mar 24 | Spring Break | | |
| | Mar 26 | | | |
| 8 | Mar 29 | | Real-world Data and Applications | Lab 3 - SVMs, Ensembles, and Evaluation Methodology; Final Project Proposals (post pre-proposal by 4/4) |
| | Mar 31 | | | |
| | Apr 02 | | | |
| 9 | Apr 05 | | Unsupervised Learning; Dimensionality Reduction | |
| | Apr 07 | | | |
| | Apr 09 | | | |
| 10 | Apr 12 | | Probabilistic Models | Homework 3 - SVMs, Ensembles, Unsupervised Learning, Applied Topic, Naive Bayes |
| | Apr 14 | | | |
| | Apr 16 | CR/NC and Withdraw deadline | | |
| 11 | Apr 19 | | Special Topics | |
| | Apr 21 | | | |
| | Apr 23 | | | |
| 12 | Apr 26 | | | |
| | Apr 28 | Midterm 2 | | |
| | Apr 30 | | | |
| 13 | May 03 | | Ethics and Course Wrap-up | |
| | May 05 | | | |
| | May 07 | | | |
| | May 17 | Project Presentations 2-10pm | | |
Resources
Class Resources
- Help using git for lab assignments
- CS and Unix Help Pages and Links (make, tar, git, debugging tools, editors, programming guides, screen and tmux, …)
- Tools for remote lab assignments (ssh, vim, tmux, VS Code, …)