CS35: Homework #10
You may work with a partner on this assignment.
Your WebBrowser will start by taking two or three command line arguments: the first is a start_url, the second is link_depth (the depth to which you will follow links from the start_url), and the third is an optional html_ignore file. For example, you may run your program like this:
% java WebBrowser www.cs.swarthmore.edu/~usrname 5 html_ignore_file
The list of vertices in your hyperlink graph will then be used to create all the data structures needed by your homework #9 solution. For each url, you will create a URLContent object by parsing its file and creating its WordFrequency tree. You should be able to get this to work with just a few modifications to your ProcessQueries class.
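As a rough illustration of handling the two-or-three argument command line described above, here is a minimal sketch. The BrowserArgs class and its field names are hypothetical (they are not part of any code we give you), and rejecting a negative link_depth is an assumption rather than a requirement of the assignment.

```java
import java.util.Optional;

/** Hypothetical holder for the parsed command line; not part of the provided code. */
final class BrowserArgs {
    final String startUrl;
    final int linkDepth;
    final Optional<String> htmlIgnoreFile;

    private BrowserArgs(String startUrl, int linkDepth, Optional<String> htmlIgnoreFile) {
        this.startUrl = startUrl;
        this.linkDepth = linkDepth;
        this.htmlIgnoreFile = htmlIgnoreFile;
    }

    /** Parses: start_url link_depth [html_ignore_file]. */
    static BrowserArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                "usage: java WebBrowser start_url link_depth [html_ignore_file]");
        }
        int depth;
        try {
            depth = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("link_depth must be an integer: " + args[1], e);
        }
        if (depth < 0) {
            // Assumption: a negative depth is treated as an error.
            throw new IllegalArgumentException("link_depth must be non-negative");
        }
        // The third argument is optional.
        Optional<String> ignore =
            args.length == 3 ? Optional.of(args[2]) : Optional.empty();
        return new BrowserArgs(args[0], depth, ignore);
    }
}
```

Your WebBrowser's main method could call BrowserArgs.parse(args) and print the usage message if it throws.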
You will use the graph in two ways:
We implemented the Graph Window GUI for you; you just need to add a button to your WebBrowser that pops up the Graph Window.
We will give you an HREFScanner class that will take care of all the ugly parsing of url links for you. You first initialize it with a url; it then opens the url's file, scans the webpage for href tags, and returns the next valid url link from the scanned page each time its getNextToken method is called.
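To show how links scanned from each page might be turned into a link graph that stops link_depth links away from the start url, here is a minimal breadth-first sketch. It does not use the real HREFScanner: the linksOf function is a hypothetical stand-in for a loop that calls getNextToken on a scanner for that url, and the map-of-sets graph is only an illustration, not your Graph class. It also assumes that pages exactly link_depth away are added as vertices but are not themselves scanned.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class LinkGraphSketch {
    /**
     * Breadth-first crawl from startUrl. Returns url -> set of urls it links to.
     * linksOf is a hypothetical stand-in for scanning one page's href links.
     */
    static Map<String, Set<String>> build(String startUrl, int linkDepth,
                                          Function<String, List<String>> linksOf) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();

        graph.put(startUrl, new LinkedHashSet<>());
        depth.put(startUrl, 0);
        queue.add(startUrl);

        while (!queue.isEmpty()) {
            String url = queue.remove();
            int d = depth.get(url);
            if (d == linkDepth) {
                continue; // at the depth limit: keep the vertex, do not scan its links
            }
            for (String target : linksOf.apply(url)) {
                if (!depth.containsKey(target)) {
                    // First time this url is seen: add it as a new vertex.
                    depth.put(target, d + 1);
                    graph.put(target, new LinkedHashSet<>());
                    queue.add(target);
                }
                graph.get(url).add(target); // record the directed edge url -> target
            }
        }
        return graph;
    }
}
```

Because the traversal is breadth-first, each url is first reached by a shortest chain of links, so the depth recorded for it is the smallest possible one.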
Finally, you are required to create a webpage in
~your_username/public_html/index.html
that contains:
links to at least 3 other cs35 students so that you can easily test your link
graph code.
A sample webpage that has both absolute and relative links is available here:
cp ~newhall/public/cs35/hw10/index.html .
cp ~newhall/public/cs35/hw10/temp.html .
A list of user names of all CS35 students:
budish heckel jbeaure1 lum ray ryan emily immonje kara rlewis stober tynan gerald irie laurel perini rodgers thomas ward
In addition, here is a breakdown of the specific changes you will need to make to your existing code (again, the exact changes you may need to make will vary based on your specific implementation):
Change ProcessQueries
---------------------
(*) add a graph data member
(*) add a constructor that takes a start url and link_depth limit (and
    optionally an html_ignore_list) as input, and
    (1) creates a link graph (following links only link_depth deep from
        the start url)
    (2) creates URLContent list and cache as before
    (3) incorporates linked-to information into determining a URL's
        priority when ordering query results

Graph
-----
(*) implement the shortestPath method
    You can test your shortestPath method before you have other parts
    of the program working. Just create a weighted directed graph in
    the main method of TryGraph.java and call your shortestPath method
    on different start vertices. Even though all the edges in the
    link-graph will have a weight value of one, your implementation
    should be more general and should work correctly on any weighted
    directed graph.

WebBrowser
----------
(*) add Graphics button that pops up a graphics window (makes a
    GraphGui object visible)

WebPage
-------
(*) add links to three other class members' homepages
(*) add final write up information
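The shortestPath item above asks for a method that works on any weighted directed graph. One common way to do that is Dijkstra's algorithm, sketched below under some stated assumptions: vertices are numbered 0..n-1, the graph is a list of adjacency lists, and all edge weights are non-negative. The Edge class and the method signature are hypothetical and will not match your Graph class exactly; treat this as an illustration of the technique, not as the required solution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class ShortestPathSketch {
    /** Hypothetical directed edge to vertex 'to' with a non-negative weight. */
    static final class Edge {
        final int to;
        final int weight;
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    /**
     * Dijkstra's algorithm. Returns dist, where dist[v] is the minimum total
     * weight of a path from start to v, or Integer.MAX_VALUE if v is unreachable.
     */
    static int[] shortestPath(List<List<Edge>> adj, int start) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[start] = 0;
        // Queue entries are {vertex, distance when enqueued}, smallest distance first.
        PriorityQueue<int[]> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[] {start, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) {
                continue; // stale entry: a shorter path to u was already found
            }
            for (Edge e : adj.get(u)) {
                int candidate = dist[u] + e.weight;
                if (candidate < dist[e.to]) {
                    dist[e.to] = candidate;
                    pq.add(new int[] {e.to, candidate});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // A small test graph of the kind you might build in TryGraph.java.
        List<List<Edge>> adj = new ArrayList<>();
        for (int i = 0; i < 5; i++) adj.add(new ArrayList<>());
        adj.get(0).add(new Edge(1, 4));
        adj.get(0).add(new Edge(2, 1));
        adj.get(2).add(new Edge(1, 2));
        adj.get(1).add(new Edge(3, 5));
        // Vertex 4 has no incoming edges, so it stays unreachable.
        System.out.println(Arrays.toString(shortestPath(adj, 0)));
        // prints [0, 3, 1, 8, 2147483647]
    }
}
```

When every edge has weight one, as in the link graph, the distances are simply the number of links followed, but the same code handles other non-negative weights unchanged.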
If you implement the 10 point Extra Credit part, then please submit your extra credit part as a separate solution (i.e. submit one regular solution, and one extra credit solution) so that we can test your link-graph on local webpages only.
If you implement one or more extra credit features, be sure to include a description of the feature and how to test it in your webpage write-up.
Your webpage will be similar to documentation that you would write to tell users how to use your software (how to run it and how to use all its features). Make sure that your webpage also includes the answers to the specific questions I asked above.
For this assignment you do not need to use cs35handin. Instead, I will look at your webpage, and you will put your solution .class files in a directory that I can access and run.
First you should re-compile all your .class files without the -g option. Next, create a directory called finalproject in your home directory and copy all your .class files (NOT .java files) into it. Include your html_ignore file and a README file telling me how to run your program. Do not modify these .class files after the due date of the assignment.
Set the permissions on this directory by doing:
chmod 755 finalproject
Set the permissions on all the files in your finalproject directory by doing (from your finalproject directory):
chmod 644 *
Finally, make sure that your webpage is complete and is readable (try loading it in netscape to make sure that the permissions are set correctly).