A CS2 1/2 Semester Course Project
Building a Web Browser and Search Engine in Java:
a 1/2 Semester Project for a CS2-type Course
INTRODUCTION
This page describes a project that we use in the
second half of our Data Structures course (CS35). The project is a web browser
with a search engine implemented in Java.
We had several pedagogical goals in designing this project
and we found that this project easily met these goals. These goals include:
- reinforcing the data structures and algorithms presented in
lecture.
- providing a large, real-world application that students found
challenging and fun to implement.
- demonstrating the importance of complexity analysis in
determining performance efficiency when considering design
alternatives.
- illustrating the benefits of software engineering techniques.
- providing an opportunity for students to work in teams.
We discuss our experiences using this project in the following paper
from the Proceedings of the 33rd ACM Technical Symposium on
Computer Science Education: Paper(PDF),
Talk Slides (gziped postscript)
PROJECT DESCRIPTION
The project consists of four separate assignments that incrementally build
a web browser with a search engine. Each assignment adds to
or improves on functionality implemented by the previous part by applying
new algorithms and using data structures that are presented in class.
The final part, which completes the implementation, uses all of the
advanced data structures presented during the semester (binary trees,
priority queues, hash tables, and graphs).
PROJECT PARTS
The individual parts include:
- Part 1: Parsing a webpage and building a
word frequency tree of its contents (uses binary search trees)
- Part 2: The first implementation of the
search engine: Processing User Queries to Find the Most Relevant web Pages
(uses priority queues and binary search trees)
- Part 3: Adding caching to the search engine and
implementing the GUI front-end to the web browser and search engine
(uses hash tables, priority queues, and binary search trees)
- Part 4: Adding a hyper-link graph to
the web browser that displays Shortest Path and reachability information
between webpage nodes. And incorporating "link-to" information into the
ordering of search results presented by the search engine.
(uses graphs, hash tables, priority queues, and binary search trees)
Extra Credit parts are included with this assignment:
- Adding Home, Back, and Forward buttons to the web browser (adds
stacks and/or queues)
- Enabling links in Displayed web pages and in the list of URL results
displayed by the search engine.
- Adding support for building the link-graph from any starting
webpage loaded from the Internet (vs. building just from local webpages).
The final version of the web browser has the following capabilities:
- It displays a web page given a URL.
- It displays connectivity information of local web pages.
- It answers questions about the connectivity of local web pages.
such as the shortest path distance (following links) between to pages,
and whether or not there is a link path from one page to another.
- Its search engine finds good matching web pages given a
query string and display the resulting URLs in order of best match to worst
(criteria used to determine a "good match" changes as more features are added
to the search engine).
- It automatically displays the webpage of the best matching
URL result of a search.
- Its search engine caches previous search results that are used
to improve the performance of subsequent searches.