Note the due date is a Monday. I encourage you to work with a partner as the concepts can be quite difficult to tackle on this lab.
The Oracle of Bacon searches a movie database to find a path between Kevin Bacon and other actors using a graph defined by the co-star relationship: two actors are connected by an edge if those two actors co-starred in a movie together. An actor's Bacon number is the length of the shortest path between that actor and Kevin Bacon. For example, the Oracle of Bacon might report that Kevin Bacon is connected to Kurt Russell with a length-2 path:
Kurt Russell was in Mean Season, The (1985) with Andy Garcia Andy Garcia was in Air I Breathe, The (2007) with Kevin Bacon Kurt Russell has a Kevin Bacon number of 2
Are you a movie buff? Try guess the Bacon Number yourself and test it out. The number of movies you need to report is the link number. Can you find two movie stars with a link number greater than 3? Greater than 5? It is harder than it sounds.
This game extends to other actors. For example, we could ask what the Giant number is for Andre the Giant and any other actor, say Justin Bieber.
Justin Bieber was in School Gyrls (2009) with Fred Willard Fred Willard was in I Could Never Be Your Woman (2007) with Wallace Shawn Wallace Shawn was in Princess Bride, The (1987) with Andre the Giant Justin Bieber has a Andre the Giant number of 3
In this lab you will first implement several graph algorithms and then use the Graph and other data structures from this course to respond to queries about movies and actors.
As usual, you can get the initial files for this lab by running update35. The new files are listed here, and the files that you will need to edit are highlighted in blue:
You should start by reviewing our implementation of the Graph class. Our Graph uses an edge-list representation: for each vertex we store a list (an STL vector, actually) of the outgoing edges from that vertex. The adjacency-list representation for all vertices - that is, edges are stored as part of a vertex object so that we can access all outgoing edges for a single vertex efficiently. Our implementation uses a HashTable, with vertices as the keys and their edge-lists as the values. If the details are difficult to follow, you should at least be able to follow the data structure interface. You will not need to implement any Graph methods, but you will need to understand how to use them. Before moving on, create a Graph object in your testGraph.cpp file.
Please complete each function marked with a TODO comment in graph-algorithms.inl. These standard graph search algorithms will help you implement your Oracle of Bacon program later on. Before moving forward, you must test each function you implement using the testGraph program.
Feel free to add additional methods, although you will need to ensure you add their declaration in the header file. At a minimum, your functions are:
As an example, you could construct a graph to represent our search problem of trying to get from Parrish Halle to The Ville (way back in week 5). If you do this correctly, your test may look like this:
$ ./testGraph Let's build a map of Swarthmore! The graph: Softball: {Ville, Soccer, Tunnel, Train} Bridge: {Tennis, Wharton} Sharples: {Clothier, Tunnel} Soccer: {Softball, Tennis} Wharton: {Bridge, Clothier} Magill: {Parrish, Train} Tennis: {Soccer, Bridge, Tunnel} Parrish: {Magill, Clothier} Train: {Magill, Ville, Softball} Ville: {Train, Softball} Tunnel: {Softball, Sharples, Tennis} Clothier: {Wharton, Parrish, Sharples} How do I get from Parrish to the Ville? Using DFS: Parrish Clothier Sharples Tunnel Softball Ville Using BFS: Parrish Magill Train Ville The shortest path length is: 3 The average distance from Parrish to all other locations is: 2.54545
Your main work for this lab is to implement oracle.cpp. This program should first read a movie database file, then provide an interactive menu that allows the user to request information from your movie database. Specifically, your program should support the following queries:
Your program should allow the user to repeatedly select a menu option until they select a "Quit" option or type CTRL-D. When possible, you should gracefully handle incorrect input such as when the user enters an actor or movie name not in the database.
The data files for this lab are in the /usr/local/doc directory, accessible from the CS lab computers. Some of these files are moderately large. We strongly urge you to test your program using the smaller files , and only use the larger files when you are confident that your program works. The files are:
Kevin Bacon A Few Good Men 1992where each field (name, title, year), is separated by a tab.
You might find the following hints more or less helpful:
totalDistance = 0.0 numVertices = 0.0 component = Use BFS to find all vertices in v's component For each vertex u in component: totalDistance += distance(v, u) numVertices += 1 return totalDistance / numVerticesThis is needlessly inefficient because many computations of distance(v,u) might each search O(n) other vertices. (Given that, what is the total running time of this pseudocode?) Instead, you should directly compute the distance between v and each other vertex during a single breadth-first search of the graph.
Graph <string, string, int>* graph = new Graph <string, string, int> graph->insertVertex("Parrish"); //do some more inserts, code is looking good! //now let's do something interest bfs("Parrish","Ville", graph); //Compiler errorC++ interprets "Parrish" as a c-string and doesn't like that you are trying to send that in for a string. Simply modify your call by casting it as a string explicitely.
bfs(string("Parrish"), string("Ville"), graph);
As an extension to the Oracle game, implement Djikstra's algorithm as a
W getDistance(V src, V dest, Graph
A similar concept is the Oracle of Baseball which links baseball players through common teams. For example, linking Babe Ruth to David Ortiz reports:
Babe Ruth played with Ben Chapman for the 1930 New York Yankees Ben Chapman played with Early Wynn for the 1941 Washington Senators Early Wynn played with Tommy John for the 1963 Cleveland Indians Tommy John played with Roberto Kelly for the 1988 New York Yankees Roberto Kelly played with David Ortiz for the 1997 Minnesota Twins
Give this Oracle a try too if you enjoy baseball. The terms Bacon Number and Erdős number are based on similar linking numbers in movie and math publication contexts. For more insanity, read about the Erdős-Bacon Number.
Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt. You and your partner should handin just one copy of your lab.
You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.
The movie data comes from The Internet Movie Database and is available in an alternate (way more huge) format as part of their alternate interfaces. The data has been cleaned up a bit to remove TV shows and a number of unsavory direct to video releases (e.g., "The Land Before Time XIII: The Wisdom of Friends").