You are strongly encouraged to work with a partner on this project.
Link Graphs
You will be creating graphs that either link movie actors/actresses between
movies they co-starred in or link baseball players that were teammates at some
time.
For the baseball data, there is only one file in
/usr/local/doc/BaseballLinks.txt. Each line represents one record with
the following tab delimited fields: player ID, first name, last name, year,
team name, team abbreviation. A completely random ;) example is shown below:
vanslan01 Andy Van Slyke 1991 Pittsburgh Pirates PIT
For the movie data, there are a number of files, each with the same format, but a different number of records. Each line represents one record with the following tab delimited fields: first name, last name, movie title, year. A
sample is shown below:
Diedrich Bader Miss Congeniality 2: Armed & Fabulous 2005
Because the full movie data file is over 100 MB, I have made some shorter versions. All are in /usr/local/doc directory
- Bacon0Links.txt - All records for only Kevin Bacon (i.e., movies he appeared in). (57 records, 1.9KB)
- Bacon1Links.txt - All records for Kevin Bacon and his co-stars (i.e., all records that feature someone that acted with Kevin Bacon on some movie). (25324 records, 898KB)
- Bacon2Links.txt - All records for Kevin Bacon, his co-stars, and his co-stars co-stars (424616 records, 16MB)
- BaconModernPopLinks.txt - All records for actors that have appeared in at least 10 movies since 1970 (747497 records, 27MB)
- BaconAllLinks.txt - All records except for TV show appearances or direct release to video (3,146,321 records, 113MB)
Link graph
Design and implement a graph interface that can support the following queries for the data of your choice (baseball or movies, you do not need to support both). In the example below, I will refer to the movie data.
- Given an Actor/Actress, report all movies that he/she appeared in.
- Given a movie, year, report all actors/actresses in that movie.
- Given two Actors/Actresses, determine if they ever co-starred in a movie.
- Given two Actors/Actresses, display (if one or more exist) one of the shortest chains from one Actor to another. If no chain exists, inform the user
- For a given Actor/Actress, report an Actor or Actress whose shortest path is finite and maximal. In other words, given "Kevin Bacon" as a query, who is the least linkable to "Kevin Bacon" and how long is his/her path? Note the solution may not be unique. You just have to report one candidate.
- Compute the average linkability of an Actor/Actress. This is done by accumulating the shortest distance from each Actor/Actress to a given source and then dividing by the total number of people in the connected component.
Requirements
- You must be able to build a graph on either the baseball data or Bacon2Links.txt. You are welcome to conduct initial tests on smaller data or work with the full data set.
- Write a sample test class that tests the main features of your code
- Write an interactive class that prompts the user to enter two Actors/Actresses and displays a shortest path between them if such a path exists, or successfully handles people that cannot be connected or do not exist in the database.
Submitting
Along with your Java source code, you should hand in a README file. These files will be
imported automatically via handin35. Your README should summarize the main components of your graph interface and its implementation (don't get too technical, that's what the code is for). Also include run-time analysis of the key methods in your interface.