CS35: Homework #9
You may work with one partner on this assignment.
Part 1 of the assignment is building a GUI front-end to your search engine that also performs the fetching and displaying URLs job of a web browser. Part 2 is adding caching of query results to your search engine, and speeding up the execution of queries by using the cached results of previous queries.
The solution that you submit should link the two parts together. However, you can start working on each part independently. When you add the second part to the GUI Web browser, print the caching information to the search results window as well as to stdout.
Build a GUI front-end to your search engine. It should have the following 6 GUI components:
Your WebBrowser's main method will start up by taking a url_list and ignore_list command line options, create a ProcessQueries object (from hw#7), and then create the GUI front-end.
The WebBrowser GUI will work like the following:
Below is a demonstration of how your more efficient query processing program should behave. You will save all previous query results in a data structure called the cache. When the user types in a query, you will first check the cache to see if you have processed a similar request in the past.
##################################### Enter next Query String or -1 to quit ##################################### artificial Results for your Query "artificial": ----------------------------------------------- Query: artificial Query Result NOT Found in cache <-------- print out whether or not query is in cache 0 words from query found in cache <-------- if not, print out how many words of the query are in the cache Matching URLs: www.cs.swarthmore.edu/~meeden priority: -8 ##################################### Enter next Query String or -1 to quit ##################################### artificial intelligence Results for your Query "artificial intelligence": ----------------------------------------------- Query: artificial intelligence Query Result NOT Found in cache 1 words from query found in cache Matching URLs: www.cs.swarthmore.edu/~meeden priority: -17 ##################################### Enter next Query String or -1 to quit ##################################### intelligence Results for your Query "intelligence": ----------------------------------------------- Query: intelligence Query Result Found in cache Matching URLs: www.cs.swarthmore.edu/~meeden priority: -9
Notice that the first time the user asks about "artificial" we cannot find anything in the cache. But the next time the user asks about "artificial intelligence" we can grab the stored information about "artificial" and perform a search of WordFrequencyTrees for just the "intellignece" part. When the query "intelligence" is entered, its result simply can be obtained from the stored informatation in the cache. Also note that we want to be able to recognize that a query is the same regardless of the word order, so "artificial intelligence" is the same as "intelligence artificial".
http://www.javasoft.com/docs/books/tutorial/uiswing/components/simpletext.htmlGetting your GUI to display url files and to display search result strings is easy if you use the JEditorPane's setPage and setText methods. Remember to import javax.swing.text.* and java.net.URL along with the standard Swing imports.
You will probably want to use the GridBagLayout to layout the buttons and the text display in the same panel. This will likely give the best results when the Window is re-sized (you want the display parts of the window to re-size vertically, but probably not the buttons and the search text and url text boxes).
To get the query result to print out nicely in the display, you need to convert the query results to a string (either in html or ASCII). One way to do this is to add a queryToString method (or a queryToHTMLString method) of the ProcessQuery class that takes as input the result returned by the performQuery method and converts the results to an string generating "\n" for new lines or < br > for new lines. You can then call setText on the JEditorPane to display the string.
The cache is represented as a hashtable where the key is the query string. The element stored with the query should be an instance of a new class called SavedResult. A SavedResult contains another hash table storing URLs with their count and a String answer.
hashtable cache ------------------------------------------------------------------------- |key: | | | | | | | | | String | | | | | | | | |element: | | | | | | | | | SavedResult | | | | | | | | | | | | | | | | | | | | | | | | | | | -------------------|----------------------------------------------------- | \/ SavedResult --------- ---------------------------------------- | table |---->| URL, count | | | | | | --------- ---------------------------------------- | answer|----> String containing query result, or null if --------- this entry was created to satisfy a larger | best | query of which this is a part | match | | URL |----> The best matching URL as a string (this is for --------- automatically displaying the best matching URL in your Web browser gui) 1. For each word in query, create an alphabetically ordered query string that is all one case and the same case as you stored tokens into your WordFrequencyTrees 2. Search for the query in the cache (a) if a match is found check answer field of entry if null, then create answer based on table return result and add best matching URL string to this entry as well (b) if a match is not found (A) create a hashtable for the query string for each word in the query (1) if it is in the cache then get its hashtable and merge its contents with the hashtable for the query (2) otherwise (a) create an entry for the word by processing the WordFrequencyTrees for the word and creating a hashtable of (URL, count) entries hashed on URL (the result field is null) (b) merge this hashtable with the hashtable for the query (B) create a result PQ from the entries in the query's hashtable (priority is related to count field of URL), then use this priority queue to create a String representation of the query answer (C) add the (query, (hashtable, answer)) to the cache (D) return result to caller 3. After each query print out: (a) whether or not the query was found in the cache (b) if not, the number of query words that were found in the cacheThe best place to implement caching is in your ProcessQueries class. However, if you use my ProcessQueries class then you will need to implement another class to perform the caching and make calls to my ProcessQueries class methods.
Think carefully about which implementation of the HashTable you plan to use for your cache: does it make a difference?
Our solution .class files can be copied from:
~newhall/public/cs35/hw07/solution/
Documentation for all of these classes is in
~newhall/public/cs35/hwdoc/
.
~newhall/public/cs35/hw09/solution/To run:
java WebBrowser url_list
or: java WebBrowser url_list html_ignorelist
Try entering artificial intelligence
in the search box and
http://www.cs.swarthmore.edu/~cfk
in the Fetch URL box and
click on the corresponding buttons and see what happens.
Note: the links on the displayed webpage don't work. You do not have to add support to make links work. However, there is a way to add this functionality, and you can try it out if you'd like.