CS35 Project: Part 4


Part4: Adding a hyperlink graph to the Web browser

CONTENTS


PROBLEM DESCRIPTION

For this program you will add a graph of URL links to your Web browser. You will create the graph by parsing a starting url's file and finding href links of other local webpages, and parsing them, and so on. The graph will contain a vertex for each url, and an edge (u, v) if there is a link from url u to url v. The graph can be added as a data member to your ProcessQueries or WebBrowser class depending on how you implemented your solution to homework #9.

Your Webbrowser will start by taking two or three command line arguments: the first is a start_url, the second is link_depth (the depth you will follow any link from the start_url), and the third is an optional html_ignore file. For example, you may run your program like this:

	% java WebBrowser www.cs.swarthmore.edu/~usrname 5 html_ignore_file
The list of vertices in your hyperlink graph, will then be used to create all the data structures needed by your homework#9 solution. For each url, you will create a URLContent object by parsing its file and create its WordFrequency tree. You should be able to just make a few modifications to your ProcessQueries class to get this to work.

You will use the graph in two ways:

  1. You will create a "Graph Window" button and add it to your Web browser. When this button is selected it will bring up a new window with the following options:
    • A display window to print results of button actions
    • A "Reachable From" button and text window: when a url is entered in the text window, and the button is selected, it calls the reachableFrom method of your graph and prints out the results to the display window.
    • A "Shortest Path" button and text window: when a url is entered in the text window, and the button is selected, it calls the shorestPath method of your graph and prints out the results of the shortest path from the url to all other urls reachable from it. You are required to implement the shortestPath method of the Graph class.
    • A "Print Graph" button: when selected, the graph is printed to the display window.

    We implemented the Graph Window GUI for you, you just need to add a button to your WebBrowser to pop-up the Graph Window.

  2. You will add linkage information to compute a good search result: If a page that matches a query string is linked-to by many other pages its priority should be increased some amount based on its linked-to degree. You should use this new criterion combined with word frequency count information to order query results.

We will give you an HREFScanner class that will take care of all the ugly parsing of url links for you. It works by first initializing it with a url, then it will open the url's file, scan its webpage for href tags and return the next valid url link from the scanned page when its getNextToken method is called.

Create a web page describing your browser/search engine

Finally, you are required to create a webpage in ~your_username/public_html/index.html that contains:
  1. At least three links to other cs35 students homepages.
  2. A write-up summarizing the features of your web browser and search engine that describes the following:
    • how each feature works by providing an example
    • how you used linkage information in prioritizing urls in your search engine
    • some ways in which your search engine could be made more efficient (in space and/or time)
    • some ways in which you could make the search engine smarter (choose "good matches" in a better way)
If you work with a partner, then both of you should create webpages with links to three other students. The write-up should appear on only one page. The other page should link to it.

GETTING STARTED

First, create a webpage in a file ~usrname/public_html/index.html that contains links to at least 3 other cs35 students so that you can easily test your link graph code. A sample webpage that has both absolute and relative links is available here:
	cp ~newhall/public/cs35/hw10/index.html .
	cp ~newhall/public/cs35/hw10/temp.html .
A list of user names of all CS35 students:
chamberl  herriot   mates     rivero    turetsky  yost
chavez    kerensky  messing   sosin     woodwort  zhiyuan
feng      klock     metrick   szafran   yeelin
gibbard   mason     osheim    yoo
In addition, here is a breakdown of the specific changes you will need to make to your existing code (again, the exact changes you may need to make will vary based on your specific implementation):
Change ProcessQueries
---------------------
(*) add a graph data member
(*) add a constructor that takes a start url and link_depth 
    limit (and optionally an html_ignore_list) as input, and 
    (1) creates a link graph (following links only link_depth deep from the 
	start url)
    (2) creates URLContent list and cache as before
    (3) incorporates linked-to information into determining a URL's priority
        when ordering query results

Graph
-----
(*) implement the shortestPath method
    you can test your shortestPath method before you have other parts of the
    program working.   Just create a weighted directed graph in the main 
    method of TryGraph.java and call your shortestPath method on different
    start vertices.  Even though all the edges in the link-graph will have a 
    weight value of one, your implemenation should be more general and should
    work correctly on any weighted directed graph.

WebBrowser
---------
(*) add Graphics button that pops up a graphics window
    (makes a GraphGui object visible)  

WebPage
-------
(*) add links to three other class member's homepages
(*) add final write up information

Java Classes

You should start with your solution to homework #9. In addition we provide the following classes (available from ~newhall/public/cs35/hw10/classes/):
  1. HREFScanner.java: a scanner class whose constructor takes a url, and parses the url's file returning href links to possibly valid local webpages as the next token.

  2. GraphGui.java and RunGraphGui.java to test it: a Swing Gui for displaying information about your link graph. RunGraphGui, shows you how you can add a button to your Web browser to pop-up the GraphGui window.

  3. Graph.java, Vertex.java, Edge.java, Weighted.java, WeightedInteger.java, Locator.java, HeapLocatorPriorityQueue.java, LocatorPriorityQueue.java, InvalidVertexLabelException.java: Classes necessary to run the incomplete implementation of the Graph class. You will need to implement the shortestPath method.

Extra Credit

Do not start on extra credit parts until you have completed the regular part of the assignment
You can receive up to 15 points worth of extra credit on this assignment by implementing one or more of the following extensions: Extra credit is graded on an all or nothing basis: either the feature completely works or it doesn't, I will not give partial credit for partially implemented extra credit features.

If you implement the 10 point Extra Credit part, then please submit your extra credit part as a separate solution (i.e. submit one regular solution, and one extra credit solution) so that we can test your link-graph on local webpages only.

If you implement one or more extra credit features, be sure to include a description of the feature and how to test it in your webpage write-up.


TURNING IT IN

For this assignment you will submit your web browser as if you are releasing it as software for others to download and use; you are just releasing the .class files, but keeping the source (.java files) private.

Your webpage will be similar to documentation that you would write to tell users how to user your software (how to run it and how to use all its features). Make sure that your webpage also includes the answers to the specific questions I asked above.

For this assignment you do not need to use cs35handin. Instead I will look at your webpage and you will put your solution .class files in a directory that I can access and run.

First you should re-compile all your .class files without the -g option. Next, create a directory called finalproject in your home directory and copy all your .class files (NOT .java files) into it. Include your html_ignore file and a README file telling me how to run your program. Do not modify these .class files after the due date of the assignment.

Set the permissions on this directory by doing:

	chmod 755 finalproject
Set the permissions on all the files in your finalproject directory by doing (from your finalproject directory):
	chmod 644 *
Finally, make sure that your webpage is complete and is readable (try loading it in netscape to make sure that the permissions are set correctly).