Ooops. I gave you a bad HashTableDictionary.inl file. If you ran update35 in lab on Friday, please remove this file using rm HashTableDictionary.inl and run update35 again to get a good copy
You encouraged to work with a partner on this project. The Full lab is described on a separate page. This page describes some initial steps to prepare you for the final lab. You do not need to submit this lab. Submit only the solution to the Full lab.
The Oracles of Bacon and Baseball
The
Oracle of Bacon searches a movie database to link two Actors/Actresses by a path of movies and co-stars that appeared in those movies. For example, if I want to link Kevin Bacon to Marilyn Monroe, the oracle of Bacon would report:
The Oracle says: Marilyn Monroe has a Kevin Bacon number of 2.
Marilyn Monroe was in Misfits, The (1961) with Eli Wallach
Eli Wallach was in Mystic River (2003) with Kevin Bacon
Try playing this game yourself. Then check the Oracle. The Oracle usually finds a shorter answer. The number of movies you need to report is the link number. Can you find two movie stars with a link number greater than 3? Greater than 5? It is harder than it sounds. A similar concept is the Oracle of Baseball which links baseball players through common teams. For example, linking Babe Ruth to David Ortiz reports:
Babe Ruth played with Ben Chapman for the 1930 New York Yankees
Ben Chapman played with Early Wynn for the 1941 Washington Senators
Early Wynn played with Tommy John for the 1963 Cleveland Indians
Tommy John played with Roberto Kelly for the 1988 New York Yankees
Roberto Kelly played with David Ortiz for the 1997 Minnesota Twins
Give this Oracle a try too if you enjoy baseball. The terms Bacon Number and Erdős number are based on similar linking numbers in movie and math publication contexts. For more insanity, read about the Erdős-Bacon Number.
Your final lab will ask you to implement a basic version of the Oracle of Bacon or the Oracle of Baseball. This week, you will be asked to start thinking about this project in an even more limited context.
Link Queries
For this week's lab, you will design one or more classes such that you can parse a data file and answer the following queries on movie data or baseball data (movie data shown below).
- Given an Actor/Actress, report all movies that he/she appeared in.
- Given a movie, year, report all actors/actresses in that movie.
The basic approach is to use your
Graph class and possibly a few auxillary
Dictionaries. The vertices in your graph should store the names of actresses, actors, or movies. Edges in the graph link performers to movies but otherwise carry no additional info. To simplify the implementation you can treat both types of vertex info types as Strings and simply concatenate the movie/year or first name/last name pairs.
For example, Fargo (1996) can be represented as the key "Fargo 1996".
Data Files
For the baseball data, there is only one file in
/usr/local/doc/BaseballLinks.txt. Each line represents one record with
the following tab delimited fields: player ID, first name, last name, year,
team name, team abbreviation. A completely random ;) example is shown below:
vanslan01 Andy Van Slyke 1991 Pittsburgh Pirates PIT
For the movie data, there are a number of files, each with the same format, but a different number of records. Each line represents one record with the following tab delimited fields: first name, last name, movie title, year. A
sample is shown below:
Diedrich Bader Miss Congeniality 2: Armed & Fabulous 2005
Because the full movie data file is over 100 MB, I have made some shorter versions. All are in /usr/local/doc directory
- Bacon0Links.txt - All records for only Kevin Bacon (i.e., movies he appeared in). (68 records, 2.2KB)
- Bacon1Links.txt - All records for Kevin Bacon and his co-stars (i.e., all records that feature someone that acted with Kevin Bacon on some movie). (38315 records, 1.4MB)
- Bacon2Links.txt - All records for Kevin Bacon, his co-stars, and his co-stars co-stars (1,693,497 records, 61MB)
- BaconModernPopLinks.txt - All records for actors that have appeared in at least 10 movies since 1970 (995,888 records, 36MB)
- BaconAllLinks.txt - All records except for TV show appearances or direct release to video (4,074,801 records, 146MB)
Requirements
Your code should be able to answer the following queries for the data of your choice (baseball or movies, you do not need to support both). In the example below, I will refer to the movie data.
- Build an appropriate Graph object to store all the data.
- Given an Actor/Actress, report all movies that he/she appeared in.
- Given a movie, year, report all actors/actresses in that movie.
- Write a short main program that tests your implementation.
- Allow an interactive option where users can repeatedly enter names of Actors/Actresses and see a list of movies in which they appeared. You should gracefully handle the case when a user types in an actor that is not in the database, but you do not need to handle the case where a user misspells a name or disambiguate names when more than one actor has the same name. Note that if you read in the user's menu choice as
an integer, then the newline that follows their choice will still be
sitting in the input stream. In order to dispose of this you can do
cin.get() before you try to read in the next string.
Remember that you can use cin.getLine(...) to read in a
string containing spaces. You can read more about the istream interface using the online documentation.
Classes
A number of classes are provided for you in
update35 including a start to the
Graph class. Most of your work can be done in
oracle.cpp, though you may need to modify the
Graph and
Vertex classes to add functionality. Please do not remove functionality. You can use all the other classes provided, but you should not need to modify them.
About the Data
The data for the baseball player info is used in the site
baseball-reference and is available in an alternate (more verbose) format at
baseball-databank.org.
The movie data comes from The Internet Movie Database and is available in an alternate (way more huge) format as part of their alternate interfaces. The data has been cleaned up a bit to remove TV shows and a number of unsavory direct to video releases (e.g., "The Land Before Time XIII: The Wisdom of Friends").