Use Teammaker to form your team. You can log in to this site to indicate your partner preference. Once you and your partner have specified each other, a GitHub repository will be created for your team.
The goal of the project is to give you several weeks to explore an AI topic of your choice in more depth. In the next section are some suggestions, but feel free to consider other ideas (just be sure to discuss them with me first).
Please keep in mind the amount of time you have to do this project; you should plan to do about the same amount of work on the project each week that you would on a normal lab. In other words, don't pick a project you can finish in two days, but don't pick one that would take two months either.
Here are some potential project ideas:
Or come up with your own idea (again, be sure to run it by me before you get started)! All the algorithms we've used this semester have a multitude of variants you can explore, and many of them can be combined with each other in interesting ways (e.g. combine a CNN with RL to play an Atari game, use a GA to evolve a board evaluation heuristic for MiniMax, etc.).
You are encouraged to make use of existing libraries (e.g. keras, scikit-learn, etc.), as well as other resources you may find on the web. However, keep in mind the standard ethics policy: outside resources are fine, so long as 1) you use proper attribution, 2) it's clear what work you personally did, and 3) you're not trivializing the assignment (i.e. no taking shortcuts to avoid learning).
In recent years, a large amount of work in machine learning has been motivated by various contests and challenges. One of the earliest and best known was the Netflix prize, which offered $1M to the team that could improve the accuracy of the site's recommendation system by 10%. The Netflix prize was claimed in 2009; since then machine learning contests have become commonplace.
Find a machine learning challenge of your choice from Kaggle. Some of these contests are currently active, with prizes available. Others are inactive, but are still interesting challenges to attempt for a project.
Kaggle competitions vary widely in what sort of data and instructions are provided. You should therefore think carefully about the competition you choose: not just "is it a cool problem?" but also "how hard will this data be to work with?" and "how clearly are the expectations of the competition defined?". Please check with me and describe your plan of attack before you get too involved in a particular contest.
In order to download data, you will need to sign up for a free account. Kaggle also has a discussion forum, which may have useful suggestions, especially if you are working on an active contest.
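One quick way to judge how hard a data set will be to work with is to count rows and empty cells per column before committing to a contest. The following is only a minimal sketch, assuming the data arrives as a CSV file with a header row; the function name summarize_csv and the max_rows limit are made up for illustration and are not part of Kaggle or any library.

```python
import csv


def summarize_csv(path, max_rows=1000):
    """Count rows and empty cells per column in the first max_rows rows.

    Hypothetical helper: assumes a comma-separated file with a header row.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        missing = {col: 0 for col in columns}
        rows = 0
        for row in reader:
            if rows >= max_rows:
                break
            rows += 1
            for col in columns:
                value = row.get(col)
                # Short rows yield None; blank cells yield an empty string.
                if value is None or value.strip() == "":
                    missing[col] += 1
    return rows, missing
```

Calling summarize_csv("train.csv") (the file name is a placeholder) returns the number of rows examined and a dictionary mapping each column name to its count of empty cells. Columns with many empty cells are a hint that the data will need cleaning before you can train on it.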
Some of the project ideas will likely involve large data sets that could quickly blow through your disk quota. To avoid this, you can save them to /scratch (instructions), which is unlimited, but isn't backed up.
As a general rule, /scratch is a good place for things that are large but can be re-created if they're lost (e.g. data files or bulky program output). You should still keep your source code in your home directory (and in git). Definitely don't add giant data files to your git repo, though.
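Before committing, it can help to scan your project directory for files big enough that they belong in /scratch rather than in git. This is a rough sketch, not a department tool: the function name find_large_files and the 50 MB default threshold are arbitrary choices made for illustration.

```python
import os


def find_large_files(root, max_bytes=50 * 1024 * 1024):
    """Return (path, size) pairs for files under root larger than max_bytes.

    Hypothetical helper; the 50 MB default is an arbitrary cutoff.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own bookkeeping directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symlinks, e.g. links pointing into /scratch
            size = os.path.getsize(path)
            if size > max_bytes:
                large.append((path, size))
    # Largest files first.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Running find_large_files(".") from the top of your repo lists candidates to move to /scratch before you run git add.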
Also, take a look at the department's suggestions for long-running jobs. As that page suggests, the screen program is very helpful, but remember that your screen sessions will last until you manually end them, so try to avoid leaving dozens of abandoned instances of screen on a server.
By the end of the first week, you need to have turned in a project checkpoint writeup, which will be a brief description of your project. We will then take time during lab that day (11/20) for each group to briefly describe their project idea to the class.
In the LaTeX file, project.tex, you will describe your project. This file already contains a basic structure that you should follow. Feel free to change the section headings, or insert additional sections. Recall that you use pdflatex to convert the LaTeX into a pdf file.
Please note that Swarthmore has a number of resources to help with writing and presentations; the WA/SPA programs are there for you, and I encourage you to make use of them for your final project.
In addition to the written report, you'll also be presenting your project to the class. We'll do this using Zoom, just like we've done all semester.
The presentations will take place during our final exam slot, which is December 11th, 9:00AM-Noon (EST). Each group will have a 10-minute presentation slot, plus several minutes for questions from the audience; presentations will be timed, so you'll want to practice to make sure you've got the timing down. Here's a copy of the rubric I'll be using: presentation rubric.
Before the deadlines, you need to submit the following things through git:
As your project develops and you create more files, be sure to use git to add, commit, and push them. Run git status to check that all of the necessary files are being tracked in your git repo.
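If you'd like to automate that check, the sketch below lists the files git reports as untracked. It relies on the machine-readable output of git status --porcelain, where untracked paths are prefixed with "?? "; the function names are invented for illustration, and paths containing unusual characters (which git wraps in quotes) are not unquoted here.

```python
import subprocess


def parse_untracked(porcelain_output):
    """Return paths marked untracked ('??') in `git status --porcelain` output."""
    untracked = []
    for line in porcelain_output.splitlines():
        if line.startswith("?? "):
            untracked.append(line[3:])
    return untracked


def untracked_files(repo_dir="."):
    """Run git in repo_dir and return its untracked paths (hypothetical helper)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when repo_dir is not a repo
    )
    return parse_untracked(result.stdout)
```

Anything untracked_files() returns that your code or report depends on still needs a git add, commit, and push.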
Don't forget to update the README so that I know how to test your code.