Due: by 1:15pm, Thursday Nov. 9

This part counts towards 5% of your total grade.

I will not accept late proposals; a late submission receives a zero for this part of the project. However, if you don’t submit on time, you should still submit your proposal and annotated bibliography so that you get my feedback on your project and your plan for carrying it out. Both the proposal document and the annotated bibliography will be useful for your final project report.

To submit: push your .tex and .bib files to your Project repo, and turn in a hardcopy of your proposal and annotated bibliography to me. Staplers are available by the printers in the lab, or in the CS main office.

Overview

The goal of the course project is to give you a taste of what it is like to do research. The Proposal part of your project involves:

  • First, finding and refining a project topic organized around a general problem to solve. This will require some examination of related work (resulting in an annotated bibliography).

  • Second, developing a plan for implementing your project, including weekly milestones. Your plan should include both implementation and testing/experiments along the way.

If you have not already done so, first read the page about the course project that gives a high-level overview of the project and lists the specific parts with tentative due dates for later parts: CS87 Fall 2023 Course Project

Project Repo

I will give you a starting point repo for your course project. In this repo you will add all the parts of your project. It includes a subdirectory with latex and bibtex starting point files for your project proposal and annotated bibliography.

As soon as you have your project group, email me the names of your project group members, and I will create a Project repo for your group.

The contents of the Project repo are:

FinalReport/  MidwayReport/  ProjectProposal/  README.adoc  source/

Look at the README.adoc file for information about the repo contents.

ProjectProposal subdir

You will write your proposal and annotated bibliography in the ProjectProposal subdirectory that contains the following:

Makefile  README.adoc  proposal.bib  proposal.tex

Running make in this directory builds, from proposal.tex, a .pdf of a single document that includes your proposal and annotated bibliography.

The proposal.bib file is the start of a bibtex file of references that are used to build the References section of your proposal document. These references should be cited in the annotated bibliography part of proposal.tex (and perhaps in parts of your proposal too).

See the examples given in the starting point and follow them to add citations and bibtex entries for work related to your project.

Make sure to build the .pdf of your proposal and annotated bibliography on the CS system: it needs to compile with the latex installed on our system, as this is how I will build it. If you are using Overleaf to edit, you need to explicitly check that your latex and bibtex files build on our system. Overleaf often still creates a .pdf when there are latex errors, and our system may not have the same version of latex or the same style files installed.

Look over the .pdf to make sure it is correct before submitting it.
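Before running make clean, it can also help to scan the LaTeX log for problems that still let a .pdf be produced (for example, citations that show up as [?] in the .pdf). The shell function below is a minimal sketch, not part of the starting point repo: the name check_latex_log is hypothetical, and it assumes your Makefile leaves pdflatex's log in proposal.log (pdflatex names the log after the .tex file). It flags log lines that start with "!" (LaTeX errors) and any line containing "undefined" (unresolved citations or references).

```shell
# Hypothetical helper (not part of the starting point repo): scan a LaTeX
# log for errors and unresolved citations/references. Run it after make
# and before make clean, since make clean is expected to remove the log.
check_latex_log() {
  # Default to proposal.log: pdflatex names its log after the .tex file.
  local log="${1:-proposal.log}"
  if [ ! -f "$log" ]; then
    echo "no log file $log (run make first)" >&2
    return 2
  fi
  # LaTeX error lines start with "!"; undefined citations and references
  # are reported in warnings containing the word "undefined".
  if grep -E '^!|undefined' "$log"; then
    return 1
  fi
  echo "no errors or undefined references found in $log"
  return 0
}
```

For example, run make and then check_latex_log in the ProjectProposal directory; a nonzero return status means some line in the log deserves a closer look before you submit.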

Then add the .tex and .bib files to your repo, and make sure you DO NOT add compiled files to your repo. Add only the .tex and .bib files, not the .pdf, .log, .aux, etc. files that are created by running make (which calls pdflatex and bibtex to create the .pdf file).

Always run make clean before doing a git add, as this helps you avoid adding built files to your repo. Also, NEVER DO git add *. Instead, add the individual files to be committed, like this:

make clean
git add proposal.tex
git add proposal.bib
git add *.tex          # add all files with a .tex suffix
...
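After staging files and before committing, you can double-check that no build products slipped in. The helper below is a hypothetical sketch: the name staged_build_files and its list of file extensions are assumptions, so extend the list to match whatever your Makefile generates. It relies on git diff --cached --name-only, which lists the files staged for the next commit, and prints any whose names end in a typical pdflatex or bibtex output extension.

```shell
# Hypothetical helper: list staged files that look like LaTeX build
# products. Prints nothing and returns 0 when none are staged; otherwise
# prints the offending file names and returns 1.
staged_build_files() {
  # "git diff --cached --name-only" lists files staged for the next commit.
  # The extension list is an assumption; extend it to match your Makefile.
  if git diff --cached --name-only | grep -E '\.(pdf|log|aux|bbl|blg|out|toc)$'; then
    return 1
  fi
  return 0
}
```

If it prints nothing, commit and push as usual (for example, git commit -m "proposal draft" followed by git push). If it lists a file, unstage it with git rm --cached FILE, which removes the file from the staging area but leaves your local copy in place.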

Look at the README.adoc for more information about this subdirectory.

Annotated Bibliography

You should create an annotated bibliography of at least 3 papers closely related to your project (more is fine).

You must use my latex template for your annotated bibliography. It should be added as a section at the end of the proposal.tex document in the ProjectProposal subdirectory of your Project repo.

References should be added to the proposal.bib file in bibtex format.

As you read each paper, add an entry to your annotated bibliography. An annotated bibliography lists each paper as it would appear in the references section of a research paper. In addition, each listing is annotated with a couple of paragraphs that describe the work. Use the course reaction notes as a guide for writing the annotation part. Each paper should have 1 paragraph that summarizes the work, lists its strengths and weaknesses, and discusses its main contribution, and 1 paragraph that analyzes the work in the context of how it relates to your project (think about asking and answering some questions about this work as it relates to the problem your project addresses).

For your own purposes, I encourage you to add a second part that lists additional related papers/work and includes a very brief summary of each (you do not need to include the full required annotation given to the 3 most related papers). I will not grade these additional entries, but they will be helpful as you refine your project over the course of the semester, they will come in handy when you are writing your final project report, and they may help me point out other work related to your project.

Look at the annotated bibliography section of my CS Research and Writing Guide, and see the "Getting Started" section below for links to places to look for related work papers.

Written Proposal

Your project proposal should be 4-5 pages long (in addition to your annotated bibliography, which may be 2-4 pages long). It should clearly state the problem you are solving and how you are solving it, and it should explicitly list your plan for completing your proposed project.

You must use my latex template for your written report; it is in the ProjectProposal subdirectory of your Project repo.

Specifically, your proposal should contain the following:

  1. Title and Authors

  2. An Introduction: 1-2 paragraph summary of the problem you are solving, why it is interesting, how you are solving it, and what conclusions you expect to draw from your work.

  3. Related Work: 1-2 paragraphs describing similar approaches to the one you propose. This need not be an exhaustive summary of related literature, but should be used to put your solution in context and/or to support your solution. This is also a good way to motivate your work. This can be a summary taken from your longer annotated bibliography.

  4. Your Solution: 3-4 paragraphs describing what you plan to do, how you plan to do it, how it solves the problem, and what types of conclusions you expect to draw from your work.

  5. Experiments: 1-3 paragraphs describing how you plan to evaluate your work. List the experiments you will perform. For each experiment, explain how you will perform it and what the results will show (explain why you are performing a particular test).

  6. Equipment Needed: 1 paragraph listing any software tools that you will need to implement and/or test your work. If you need to have software installed to implement your project, you should check with Jeff Knerr to see if it is something that can be installed on the CS lab machines.

  7. Schedule: list the specific steps that you will take to complete your project, including dates and milestones. This is particularly important to help keep you on track, and to ensure that if you run into difficulties completing your entire project, you have at least implemented some steps along the way. It is also a great way to get specific feedback from me about what you plan to do and how you plan to do it.

  8. Conclusions: 1 paragraph summary of what you are doing, why, how, and what you hope to demonstrate through your work.

Getting Started, Ideas

You are strongly encouraged to come up with your own idea for a course project. However, I have some suggestions for project ideas, available here (please note that these are not for public distribution):

evince ~newhall/public/cs87/project_ideas_f23.pdf

You are welcome to use one of these as a starting point for developing a project, or to come up with your own idea. Note that there are a lot of ideas here, and some are older than others. For any of them, you will need to investigate the general idea I provide, look for recent related work, and formulate a detailed plan.

Once you have one or two general ideas for projects, you will want to more completely define exactly what you plan to do and how you plan to do it. A good way to start with this step is to take a look at related work; you want to have an understanding of what has been done, and how what you want to do fits into the field. It will help you modify and more completely define your project, and get some ideas of how to implement and test your project.

Take a look at the "Getting Started" section of my CS Research and Writing Guide for links to places to look for related work papers. There are links to USENIX conference proceedings and to ACM and IEEE digital libraries. I have links to some other (older) papers here: Additional Cluster and Distributed Computing Papers

Another place to start getting ideas is to look at recent conference proceedings from SC, IPDPS, and other parallel and distributed computing conferences:

  • SC conferences (look for Technical Session to find papers):

  • IPDPS conferences

  • Other ACM, IEEE, and USENIX sponsored conferences. Here are a few suggestions:

    • PODC (Principles of Distributed Computing) www.podc.org

    • HPDC www.hpdc.org

    • ICDCS, PPoPP, IEEE Cluster, …

    • USENIX conference proceedings: USENIX has a lot of top systems conferences. Some that may be more applicable to course projects include OSDI, IPTPS, NSDI, HotPar, HotCloud, HotStorage, USENIX ATC, and FAST. Papers from their conference proceedings are freely accessible on-line.

  • All ACM conference and journal publications: www.acm.org. Looking at specific SIG pages may be useful, for example SIGOPS sponsored conferences (sigops confs). Search for parallel and distributed conferences under the "Conferences" menu, or search for topics in their digital library.

  • All IEEE conference and journal publications are available off their main website: www.computer.org. Search for parallel and distributed conferences under the "Conferences" menu, or search for topics in their digital library.

  • All USENIX conferences are at www.usenix.org, and the USENIX proceedings page links to all USENIX conferences and papers, available free on-line.

Machines and Software Available:

About platforms and languages for your project

Your project should compile and run on our system. You may use other systems too, like ACCESS and Strelka, but it also should work on our system. If you use Amazon AWS, then Spark code parts of your project may not run on our system. Other than this, the expectation is that you can compile, run and demo your project to me on the CS lab machines.

If you need software installed on our system, contact Jeff Knerr.

You are welcome to use any language you’d like to implement your project. However, I strongly recommend that you use C or C++ to implement the bulk of it. Python may seem like an attractive language option, and it is great for writing scripts; however, C and C++ are the languages for parallel and distributed computing (think MPI, CUDA, OpenMP, pthreads, …). Also, Python (the standard CPython interpreter) does not support parallel execution of its threads because of its global interpreter lock. Using C and C++ to implement the bulk of your project will make things easier for you in the long run. If you don’t believe me, here is a quote about this from a student who took CS87 in 2018:

"We used python in our CS87 project, and we definitely should have used C or C++. Python often has libraries that wrap around the parallelization methods you have been using, but they are usually not super well maintained. For instance, mpi4py works great until you try to spawn some relatively low number of processes (I think around 1000 or so), which would be allowed in native MPI. We couldn’t figure out why, and the library isn’t actively maintained, so we spent a lot of time working around this issue that could have been used elsewhere. Also, we got to a point where we wanted to combine MPI and multithreading for shared memory on node machines. Python doesn’t support this feature in an efficient way like C or C++ would, and this limited the scope of our ideas. I highly recommend doing your final project in C or C++."