The focus for the last few weeks of the semester will be to propose and complete a final project of your choosing. You will collaborate in groups of 2 or 3 individuals, preferably within the same lab section (working across lab sections is okay as long as all group members can attend one of the labs together).

Deliverables

The deliverables are as follows:

  • A per-individual pre-proposal (short post on Ed with 1 or more project ideas) is due October 26. You will use these ideas to identify shared interests for group projects. Form groups by October 29 and notify your professors of your group members.

  • A per-group proposal (~2-3 pages), including a timeline and full details, is due November 4. You should have a draft to discuss with your instructor at the November 2 lab meeting. (10%)

  • Weekly checkpoint demonstrations in lab (to your lab instructor and/or classmates). (10%)

  • A conference-style short paper (~4-6 pages) due December 8 (with 3-day grace period). (50%)

  • A final presentation during the Final Exam time slot December 15, 9am-Noon. (30%)

  • All project-related material (e.g., code and data) so that we can verify/replicate all experiments.

Project Ideas

Your project must be related to the field of machine learning, and it should go beyond what we have covered in the course in terms of assignments or core lecture materials. More detailed topics, data/paper resources, and advice are available here. Post your project idea(s) on the Ed post here. You can post up to three ideas, but each student must post at least one.

You have two audiences: the instructors (who will review projects for germaneness and feasibility) and your classmates, whom you are recruiting to work with you. Your project idea should include a topic/title and references to provide context for your idea. This can include links to datasets, papers, demonstrations, etc. It only needs to be a few sentences, but it should be clear you have thought through many of the high-level steps. Also include open questions you still need to figure out to make the project work. Include the lab you are registered for, and whether you are willing to attend a different lab session if a potential partner is in that lab.

After October 26, reach out to individuals whose ideas interest you. We are relying on you to form groups based on interest. We will intervene where we can to help push this forward, but the earlier you get started the better! You should start forming groups immediately and notify the professors by October 29.

When formulating your project and proposal (below), think about how you can implement phased development: if your project is all or nothing, you will be left with very few backup options in the last weeks if you discover the problem is more difficult than anticipated. Think of incremental subgoals (e.g., a proof of concept on a smaller problem) that will get you to the final goal.

Proposal

10% of your grade

By midnight on November 4, you will submit a full proposal. Your full proposal, which should all be placed in the planning/ directory, includes:

  • A 2-3 page description of your project (see below for details on expected content). This should be submitted as a PDF named proposal.pdf.

  • A detailed timeline of goals for each week, added to planning/README.md.

A draft will be useful for the November 2 lab meeting so that we can help steer your proposal in a productive direction.

In addition to the timeline in the README.md file, your full proposal will include the following:

  • The title and group member names

  • Central hypothesis: What is the main question you would like to answer (i.e., your goals)?

  • Problem Description: What is the problem the project will seek to address? What would a solution look like? Who would stand to benefit from a solution to this problem? How will you know if you’ve solved it?

  • Algorithms: What is/are the central algorithm(s) for your project? Do you plan to implement them or to use libraries? If so, which ones?

  • Data: What data are you using, and where are you getting it? Are you creating a novel data set for a real-world problem, or are you using standard repository data sets? Be specific in either case (e.g. the URL for a source to download the data, etc.).

  • Experiments: What experiments and what type of analysis do you plan to execute? What do you expect the results to look like? Be sure you have plans for experimental design, hyper-parameter tuning, statistical validation, etc. as appropriate to your project.

  • Impacts: What do you plan to explore in terms of the potential societal impacts of this work? In your final paper, you will describe who stands to benefit from this work. Are there potential negative impacts on populations or other ethical/social concerns with respect to this work? You do not need to flesh out all of the details for the proposal; instead, provide ideas for the topics you will explore in your final paper.

  • References: provide references for the work cited, sites you got ideas or data from, etc. in case we want to drill into some of the details.
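As one illustration of what "experimental design" and "statistical validation" in the Experiments item might look like, here is a minimal Python sketch using only the standard library. It splits example indices into k folds and computes a paired t statistic over per-fold scores for two models. The function names and the per-fold accuracies are hypothetical and only for illustration; your project may call for a different validation scheme.

```python
import random
import statistics
from math import sqrt


def k_fold_indices(n_examples, k, seed=0):
    """Shuffle example indices once and split them into k disjoint folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the per-fold score differences between two models."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff / (std_diff / sqrt(len(diffs)))


folds = k_fold_indices(n_examples=10, k=5)

# Hypothetical per-fold accuracies for two models evaluated on the same folds.
model_a = [0.80, 0.82, 0.78, 0.85, 0.81]
model_b = [0.76, 0.80, 0.77, 0.80, 0.79]
t_stat = paired_t_statistic(model_a, model_b)
print(f"paired t statistic over {len(folds)} folds: {t_stat:.2f}")
```

The statistic would be compared against a t distribution with k-1 degrees of freedom. Keep in mind that cross-validation folds share training data, so this test is only an approximation; say so in your analysis if you rely on it.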

In the planning/README.md, the goal is to specify your phased development plan:

  • Include a set of intermediary goals, each of which should give you something specific to talk about (the idea here is that however many goals you achieve, you should be able to use what you’ve got to complete your paper/presentation). In other words, structure your project as an anytime algorithm.

  • Your development plan should also include a week-by-week timeline. What are your concrete progress goals? Be very specific and realistic: write a couple of sentences for each week (November 9, 16, 23, 30, December 7). In particular, make sure you plan to finish your coding and experiments by November 30 so you have time to work on your paper and presentation. Each week, as part of your checkpoint submission, you will update us on which goals you have completed and on any changes you are making to the remaining plan as a result.
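For those unfamiliar with the term, an anytime algorithm can be stopped at any point and still return a usable answer, and that answer improves the longer it runs. The sketch below shows the pattern with a random search that always keeps its best result so far. The toy objective, the learning_rate parameter, and the function names are assumptions made up for illustration, not part of any required design.

```python
import random


def anytime_random_search(evaluate, sample_config, budget, seed=0):
    """Yield the best (score, config) found so far after every evaluation."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(budget):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
        # Stopping after any iteration still leaves a valid best result.
        yield best_score, best_config


def toy_evaluate(config):
    """Toy objective (hypothetical): highest score at learning_rate = 0.1."""
    return -(config["learning_rate"] - 0.1) ** 2


def toy_sample(rng):
    """Draw one hypothetical configuration uniformly at random."""
    return {"learning_rate": rng.uniform(0.0, 1.0)}


history = list(anytime_random_search(toy_evaluate, toy_sample, budget=20))
best_scores = [score for score, _ in history]
```

A phased project plan works the same way: each intermediary goal corresponds to a point where you could stop and still have a result worth writing up.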

You are responsible for having a complete proposal by this deadline. You should meet with your professors (e.g., during office hours) to help refine your plan.

Checkpoint

10% of your grade

Each week in lab, you will provide an update to your instructor on your progress (via your git repo checkpoints/README.md file). Your grade will be determined by your ability to:

  • Make sufficient progress each week and/or be proactive about seeking assistance in the case of major roadblocks.

  • Sufficiently document updated accomplishments and goals (via your checkpoints/README.md)

  • Demonstrate your progress (e.g., through code review or analysis of graphs)

  • Present a mid-project review to the lab on November 23. This short presentation will also serve as practice for your final presentation. You will be expected to a) motivate/introduce the goals of your project, b) give an overview of your approach, and c) provide preliminary results. Each group will have about 10 minutes.

Paper

50% of your grade

Your 4-6 page final paper is due in the paper/ directory of your Git repository by midnight on December 8. We are providing a 3-day grace period to provide flexibility (i.e. so long as you turn in your paper by the end of the day on December 11th, you will not be penalized for a late submission).

All relevant figures and tex files should be present in your directory. If you use an online editor (e.g., Overleaf), recompile your final source files on our systems to ensure compatibility. You have been provided a sample report document in your lab directories. You must use the provided LaTeX style files to produce your report. There are many resources online for learning LaTeX; see Prof. Mitchell’s page of references, or Prof. Newhall’s page, which tells you where to get some example tex files on our systems. We are more than happy to answer questions, but Google, Ed, and Stack Overflow are where we look for answers, so you should try them first. Details about the paper requirements are available here. Note that your grade is not based on how novel your results are, but rather on your ability to convey your understanding of the problem and analyze your results. The final grade is based on (a) the design and execution of the experiment and (b) the thoroughness and readability of the paper.

Here are some sample papers from CS68 Bioinformatics Project (each of these was machine learning related but there is more of a focus on the applications than some of your projects):

Presentation

30% of your grade

Your group will present during the exam period on December 15. Each group will have 12-15 minutes to present plus 2-3 minutes for questions (note that if your presentation is outside the 12-15 minute range, you will lose points for timing). The grade for this portion is based entirely on your delivery, not the difficulty of the project or the impressiveness of results. Having a great project but failing to communicate your design, results, and analysis will result in a poor grade. Please work on your slides throughout the project, and practice in front of another group at least once. All members of the group are expected to present equally. Please follow Prof. Newhall’s Presentation Guidelines for tips. A few general comments:

  • Your presentation should use figures and diagrams wherever possible. In particular, you will probably have to make new figures in addition to what you plan on putting in your paper. A visual aid is always better than words on a slide.

  • Slides should not be cluttered; provide a concise outline of the main points, not a transcript of what you are going to say. You don’t want the audience reading your slide; you want them paying attention to you. When in doubt, use figures and illustrations.

  • Practice. Practice. Practice. The easiest way to handle nerves is to be comfortable with what you plan to say, and to have given a talk to an audience beforehand.

  • The presentations will take place on Zoom; you will need to decide who will screen share, but all the presenters should be involved (i.e. all group members should talk a reasonably balanced amount during the presentation).

Submitting your Project

You should hand in:

  • In the subdirectory paper, place all files required to compile this document, including your images, bib file, and source tex files. We must be able to compile the document from source.

  • In the subdirectory presentation, place a PDF of your presentation slides and any other visual aids you used in your final presentation.

  • In the code directory, place all of the scripts and main programs you developed for the project. Include a README describing the purpose of each file and how to use it to produce the results. Your code should be well designed and commented. If you used downloaded software, state as much in the README file.

  • Submit data used for evaluation in the repository if it is small in size. If it is large, please contact us ahead of the deadline to make arrangements. Include a README file with the data as well, describing the source of the data (paper, website) and any pre-processing steps you used (e.g., throwing out incomplete data). The idea is to make it easy for us or someone else to re-use your data and replicate your results.
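Before the deadline, you may find it useful to check your repository layout automatically. The following is a minimal Python sketch, not an official grading script; it checks only for the paths named in this handout. The function name is hypothetical, the README in code/ is matched by any file name starting with README (the exact name is not specified here), and the data location is left out for the same reason.

```python
from pathlib import Path

# Files and directories named in this handout.
REQUIRED_FILES = [
    "planning/proposal.pdf",
    "planning/README.md",
    "checkpoints/README.md",
]
REQUIRED_DIRS = ["paper", "presentation", "code"]


def missing_items(repo_root):
    """Return the expected files/directories that are absent from repo_root."""
    root = Path(repo_root)
    missing = [f for f in REQUIRED_FILES if not (root / f).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    code_dir = root / "code"
    # Accept README, README.md, README.txt, etc. (assumption: any such name is fine).
    if not (code_dir.is_dir() and any(code_dir.glob("README*"))):
        missing.append("code/README*")
    return missing


if __name__ == "__main__":
    # Run from the top level of your repository.
    for item in missing_items("."):
        print("missing:", item)
```

A clean run only confirms that the expected paths exist; it does not check that the paper compiles from source or that the experiments can be replicated.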