A skeleton version of the programs will appear in your
cs35/lab/10 directory when you run update35. The
program handin35 will only submit files in this
directory.
You are encouraged to work with a partner on this lab.
Introduction
Expand your previous lab to find paths between two actors/actresses.
The Oracle of Bacon searches a
movie database to link Kevin Bacon to other actors by a
path of movies and co-stars that appeared in those movies. For
example, if you wanted to link Kurt Russell with Kevin Bacon, the
Oracle might report:
Kurt Russell was in Mean Season, The (1985) with Andy Garcia
Andy Garcia was in Air I Breathe, The (2007) with Kevin Bacon
Kurt Russell has a Kevin Bacon number of 2
This linking concept can be extended to any two actors. For example,
if you wanted to link Reese Witherspoon with John Wayne, the Oracle
might report:
Reese Witherspoon was in Rendition (2007) with Alan Arkin
Alan Arkin was in America at the Movies (1976) with John Wayne
Reese Witherspoon has a John Wayne number of 2
To implement the search for links between actors in the
movie database you will use a number of the data structures and
techniques that we've explored throughout this course, including
lists, queues, dictionaries, and breadth-first search.
Requirements
Your program should provide an interactive menu that allows the user
to request the following information from the database:
- Given an actor's name, report all the movies that he
or she appeared in.
- Given a movie title and year, report all the actors
that appeared in the movie.
- Given two actor's names, report the shortest chain that links
them. If no chain exists, then inform the user.
- Compute the average linkability of an Actor/Actress. This is done by accumulating the shortest distance from each Actor/Actress to a given source and then dividing by the total number of people in the connected component.
Your program should handle incorrect information gracefully such as
when the actor name or movie title entered is not in the database.
Data Files
The data for this lab can be found in the directory
/usr/local/doc/. The complete set of data is over 100MB. A
number of trimmed down versions of the data are available. I strongly
urge you to initially test your program on the smaller versions before
attempting the larger ones. At a minimum your program should be able
to handle the two links version of the data.
- Bacon0Links.txt
All records for only the movies that
Kevin Bacon appeared in (57 records, 1.9KB)
- Bacon1Links.txt
All records for Kevin Bacon and his
co-stars (25324 records, 898KB)
- Bacon2Links.txt
All records for Kevin Bacon, his
co-stars, and his co-stars co-stars (424616 records, 16MB)
- BaconModernPopLinks.txt
All records for actors that
have appeared in at least 10 movies since 1970 (747497 records,
27MB)
- BaconAllLinks.txt
All records except for TV show
appearances or direct release to video (3,146,321 records, 113MB)
Getting Started
Use your files form
lab 10a as a starting point.
- Reading/parsing the text file and building the initial graph takes time. Only do this once and pass pointers to the constructed graph object if you need to use the graph in multiple places. Use resetAll to prepare the graph for additional searches.
- The BFS code initially identified (1) which vertices were visited from a source and (2) the distance to the visited vertices. To report a path between two performers, you will need to retrace a path in the BFS. One possible way is to add an additional field to the Vertex class that maintains for each vertex v the ID of the vertex that discovered v first in the BFS. By tracing these IDs back from the destination vertex, you should be able to report the path. Be sure to reset this parent before each new BFS. Note by following the path from destination to source, you are visiting the path backwards. Use an appropriate data structure (perhaps a built-in one) to reverse the path for printing.
- If you update a class instance variable when doing the BFS, you can easily support the average linkability query in the Graph class.
Optional feature
If you'd like an additional challenge, try this optional
feature. For a given actor, report another actor whose path is finite
and maximal. In other words, given Kevin Bacon as a query, who is the
least linkable to Kevin Bacon and how long is his/her path? Note the
solution may not be unique. You just have to report one candidate.
Submit
Once you are satisfied with your code, hand it in by typing
handin35. This will copy the code from your
cs35/lab/10 to my grading directory. You may run
handin35 as many times as you like, and only the most recent
submission will be recorded. If you worked with a partner, only one of
you needs to handin the code.