PROBLEM DESCRIPTION
For this program you will add a graph of URL links to your Web browser.
You will create the graph by
parsing a starting url's file and finding href links of other local webpages,
and parsing them, and so on. The graph will contain a vertex for each url,
and an edge (u, v) if there is a link from url u to url v.
The graph can be added as a data member to your ProcessQueries or WebBrowser
class depending on how you implemented your solution to homework #9.
Your Webbrowser will start by taking two or three command line arguments:
the first is a start_url, the second is link_depth (the depth you will follow
any link from the start_url), and the third is an optional html_ignore file.
For example, you may run your program like this:
% java WebBrowser www.cs.swarthmore.edu/~usrname 5 html_ignore_file
The list of vertices in your hyperlink graph, will then be used to create all the
data structures needed by your homework#9 solution. For each url, you will create
a URLContent object by parsing its file and create its WordFrequency tree.
You should be able to just make a few modifications to your ProcessQueries class
to get this to work.
You will use the graph in two ways:
- You will create a "Graph Window" button and add it to your Web browser.
When this button is selected it will bring up a new window with the following
options:
- A display window to print results of button actions
- A "Reachable From" button and text window: when a url is entered in
the text window, and the button is selected, it calls the reachableFrom
method of your graph and prints out the results to the display window.
- A "Shortest Path" button and text window: when a url is entered in the
text window, and the button is selected, it calls the shorestPath method
of your graph and prints out the results of the shortest path from the url
to all other urls reachable from it. You are required to implement the
shortestPath method of the Graph class.
- A "Print Graph" button: when selected, the graph is printed to the
display window.
We implemented the Graph Window GUI for you, you just need to add a button to
your WebBrowser to pop-up the Graph Window.
- You will add linkage information to compute a good search result:
If a page that matches a query string is linked-to by many other pages its
priority should be increased some amount based on its linked-to degree.
You should use this new criterion combined with word frequency count information
to order query results.
We will give you an HREFScanner class that will take care of all the ugly
parsing of url links for you. It works by first initializing it with a url,
then it will open the url's file, scan its webpage for href tags and
return the next valid url link from the scanned page when its getNextToken
method is called.
Create a web page describing your browser/search engine
Finally, you are required to create a webpage in
~your_username/public_html/index.html that contains:
- At least three links to other cs35 students homepages.
- A write-up summarizing the features of your web browser and search engine
that describes the following:
- how each feature works by providing an example
- how you used linkage information in prioritizing urls in your search engine
- some ways in which your search engine could be made more efficient
(in space and/or time)
- some ways in which you could make the search engine
smarter (choose "good matches" in a better way)
If you work with a partner, then both of you should create webpages with links
to three other students. The write-up should appear on only one page. The
other page should link to it.
GETTING STARTED
First, create a webpage in a file ~usrname/public_html/index.html that contains
links to at least 3 other cs35 students so that you can easily test your link
graph code.
A sample webpage that has both absolute and relative links is available here:
cp ~newhall/public/cs35/hw10/index.html .
cp ~newhall/public/cs35/hw10/temp.html .
A list of user names of all CS35 students:
chamberl herriot mates rivero turetsky yost
chavez kerensky messing sosin woodwort zhiyuan
feng klock metrick szafran yeelin
gibbard mason osheim yoo
In addition, here is a breakdown of the specific changes you will need to make
to your existing code (again, the exact changes you may need to make will
vary based on your specific implementation):
Change ProcessQueries
---------------------
(*) add a graph data member
(*) add a constructor that takes a start url and link_depth
limit (and optionally an html_ignore_list) as input, and
(1) creates a link graph (following links only link_depth deep from the
start url)
(2) creates URLContent list and cache as before
(3) incorporates linked-to information into determining a URL's priority
when ordering query results
Graph
-----
(*) implement the shortestPath method
you can test your shortestPath method before you have other parts of the
program working. Just create a weighted directed graph in the main
method of TryGraph.java and call your shortestPath method on different
start vertices. Even though all the edges in the link-graph will have a
weight value of one, your implemenation should be more general and should
work correctly on any weighted directed graph.
WebBrowser
---------
(*) add Graphics button that pops up a graphics window
(makes a GraphGui object visible)
WebPage
-------
(*) add links to three other class member's homepages
(*) add final write up information
Extra Credit
Do not start on extra credit parts until you have completed the regular
part of the assignment
You can receive up to 15 points worth of extra credit on this assignment by
implementing one or more of the following extensions:
- [5 pts]: Add a Back, Forward, and Home Button to your Webbrowser.
Add data structures to keep track of your browsing history, and "Back", "Forward"
and "Home" buttons that use this data structures to fetch and display the
correct webpage. You will likely want to put a limit on the history depth
(15 or 20 previous pages seems pretty good) so that this data structure doesn't
get too large.
- [5 pts]: Add support for enabling links on displayed webpages and
listing search results as links.
Clicking on a link on a displayed webpage should result in the link's
webpage being fetched and displayed by your web browser. In addition, your
list of matching urls from your search engine should be displayed as a list
of links to webpages (use href). Clicking on one of the links listed as
a search result will result in the webpage being displayed in your URL display
window.
- [10 pts]: Add support for building a link graph of any (not just local)
webpages. To do this you will need to change your link parsing code
so that you are parsing the contents of the URL display part of your web
browser rather than opening and parsing local files. Start by displaying
the default url given as a command line argument, get the contents of the
text display GUI and parse it for hrefs with an "http://" prefix. For each
such link that you find, try to fetch the webpage, then if it was
successfully fetched, and a node and an edge to your graph and parse its
contents for links. If you implement this, you will want to keep your depth
value pretty small.
Extra credit is graded on an all or nothing basis: either the feature completely
works or it doesn't, I will not give partial credit for partially implemented
extra credit features.
If you implement the 10 point Extra Credit part, then please submit your extra
credit part as a separate solution (i.e. submit one regular solution, and one
extra credit solution) so that we can test your link-graph on local webpages
only.
If you implement one or more extra credit features, be sure to include a
description of the feature and how to test it in your webpage write-up.
TURNING IT IN
For this assignment you will submit your web browser as if you are releasing
it as software for others to download and use; you are just releasing the .class
files, but keeping the source (.java files) private.
Your webpage will be similar to documentation that you would write to
tell users how to user your software (how to run it and how to use all its
features). Make sure that your webpage also includes the answers to the
specific questions I asked above.
For this assignment you do not need to use cs35handin. Instead I will look at
your webpage and you will put your solution .class files in a directory that I
can access and run.
First you should re-compile all your .class files without the -g option.
Next, create a directory called finalproject in your home directory and copy
all your .class files (NOT .java files) into it.
Include your html_ignore file and a README file telling me how to run your
program. Do not modify these .class files after the due date of the
assignment.
Set the permissions on this directory by doing:
chmod 755 finalproject
Set the permissions on all the files in your finalproject directory by doing
(from your finalproject directory):
chmod 644 *
Finally, make sure that your webpage is complete and is readable (try
loading it in netscape to make sure that the permissions are set correctly).