CS91R: Final Project
-
April 23 (by class): Project Proposal (proposal.md)
-
May 1 (by lab): Work in Progress
-
May 8 (by midnight): Final Write-up and Code (writeup.md)
Goals
The goals for this assignment are:
-
Build a text-based user interface
-
Employ techniques for natural language processing
-
Write about software
-
Compute with text
Computing with Text: Final Project
The final project (the last two labs) is your chance to apply what we have learned this semester about computing with text to a problem that interests you!
Your project should have elements of text-based user interfaces and natural language processing, but the exact mix is up to you. You can also expand upon any of the labs we did this semester. The scope of the project should be at least that of a two-week lab, but feel free to go beyond.
In addition to your code and documentation, you will also submit a write-up describing your project as a markdown document. You should address the what, why, how and who of the project. You can work alone or with a partner. You must cite any resources you used to develop your project.
Assessment
Your project will be judged on the following criteria:
-
Proposal (4): Description of what and why.
-
WIP (4): Work-in-Progress demonstration.
-
Ambition (4): Ambition of the project idea and implementation.
-
Creativity (4): Creativity of the project idea and implementation.
-
Execution (8): How much of your vision was accomplished successfully.
-
Style (4): Coding design, implementation, and documentation.
-
Write-up (8): Completeness & clarity.
Project Types (ideas, not exhaustive)
-
Natural Language Processing
-
information retrieval
-
predictive text
-
authorship detection
-
text summarization
-
sentiment analysis
-
question answering
-
topic modeling
-
chat bots
-
clustering
-
visualization
-
-
Text-Based User Interface
-
build your own CLI tool with a fleshed-out interface and documentation
-
build your own editor using ncurses
-
build a dataset viewer (think vd, but not that advanced)
-
build your own shell
-
text-based game
-
process the asciinema cast files
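As a starting point for the cast-file idea, here is a minimal sketch that assumes the recordings use the asciicast v2 layout: a JSON header on the first line, followed by one JSON array per event of the form [time, type, data]. The function names `parse_cast` and `output_text` and the inlined sample recording are hypothetical, chosen only for illustration.

```python
import json


def parse_cast(lines):
    """Parse asciicast v2 text: a JSON header line, then one JSON event per line."""
    lines = [ln for ln in lines if ln.strip()]  # skip blank lines
    header = json.loads(lines[0])
    # Each event is a JSON array: [seconds since start, event type, data].
    events = [tuple(json.loads(ln)) for ln in lines[1:]]
    return header, events


def output_text(events):
    """Concatenate everything the terminal printed (events of type "o")."""
    return "".join(data for _time, kind, data in events if kind == "o")


if __name__ == "__main__":
    # Hypothetical two-event recording, inlined so the sketch runs without a file.
    sample = [
        '{"version": 2, "width": 80, "height": 24}',
        '[0.5, "o", "hello "]',
        '[1.2, "o", "world\\r\\n"]',
    ]
    header, events = parse_cast(sample)
    print(header["width"], header["height"])
    print(repr(output_text(events)))
    print(f"last event at {events[-1][0]:.1f}s")
```

To try it on a real recording, pass an open file instead of the list, for example `parse_cast(open("demo.cast"))`, where `demo.cast` is a placeholder name. The recovered text still contains terminal escape sequences, so a real project would need to decide whether to strip or interpret them.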
-
Some Interesting Data Sources (ideas, not exhaustive)
-
-
subsets can be found in /data/cs91r-s25/misc/movies.tsv and /data/cs91r-s25/misc/episodes.tsv.
-
-
a version can be found here: /data/cs91r-s25/misc/wiki and a program, query.py, can be used to navigate it.
-
-
use datasets we’ve downloaded (/data/cs91r-s25/ngrams)
-
Foreign Relations from State Department
-
dataset presented by Joe Wicentowski
-
"KG-FRUS: a Novel Graph-based Dataset of 127 Years of US Diplomatic Relations"
-
-
Sentiment Evaluation XML (/data/cs91r-s25/semeval-19-04/)
-
Project Gutenberg (/data/cs91r-s25/gutenberg)
-
Brown Corpus (/data/cs91r-s25/brown)
-
Enron Emails (/data/cs91r-s25/corpora/enron/)
-
Names (/data/cs941-s25/misc/lastnames.csv & /data/cs941-s25/misc/babyames.csv)
-
-
There are other sites too
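For a first look at one of the tabular sources above, such as movies.tsv, a short loader may help. This is a sketch that assumes the file is tab-separated with a header row (suggested by the extension, not confirmed here). The helper name `load_tsv` and the sample columns are hypothetical, and the real column names may differ.

```python
import csv
import io


def load_tsv(handle):
    """Read tab-separated text with a header row into a list of dicts."""
    # QUOTE_NONE treats quote characters as ordinary text, a common choice
    # for TSV dumps that do not escape stray quotes (an assumption here).
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


if __name__ == "__main__":
    # Hypothetical stand-in data so the sketch runs without the course files.
    sample = io.StringIO("title\tyear\nExample Movie\t1999\nAnother Movie\t2005\n")
    rows = load_tsv(sample)
    print(len(rows), "rows; columns:", list(rows[0].keys()))
```

On the lab machines the same helper could be called with `open("/data/cs91r-s25/misc/movies.tsv", newline="", encoding="utf-8")`. Inspecting the column list first is a cheap way to decide which fields your project needs.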
-