Handy References:
Lab Audio, day 1
Lab Audio, day 2
Lab 2 Goals:
- Implement the server side of an HTTP/TCP connection.
- Use threading to serve multiple concurrent clients.
- More practice with sockets, send(), and recv().
Overview
Now that you've built a Web client, this lab looks at the other end of the
HTTP protocol -- the Web server. As real Web clients (e.g., browsers like
Firefox) send requests to your server, you'll find the requested files and
serve them back to the clients.
Your server program will receive two arguments: 1) the port number it
should listen on for incoming connections, and 2) the directory out of which it
will serve files (typically called the document root). For example:
./lab2 8080 test_documents
This command will tell your Web server to listen for connections on port
8080 and serve files out of the test_documents directory. That is, the test_documents directory is
considered '/' when responding to requests. If you're asked for /index.html,
you should respond with the file that resides in test_documents/index.html. If you're
asked for /dir1/dir2/file.ext, you should respond with the file
test_documents/dir1/dir2/file.ext.
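One way to express this mapping in C is a small helper that prefixes the document root onto the request path. The sketch below assumes the path has already been pulled out of the request line and begins with '/'; build_path is a hypothetical name, not something the lab provides, and it makes no attempt to reject '..' components.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: map a request path such as "/dir1/file.ext" onto
 * the document root. Assumes uri already starts with '/'.
 * Returns 0 on success, -1 if the result would not fit in out. */
int build_path(const char *docroot, const char *uri, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s", docroot, uri);
    /* snprintf reports the length it wanted; >= outlen means truncation */
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

If you chdir() into the document root at startup, as the checklist suggests, you could pass "." as docroot.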
High-level checklist
Roughly, your server should follow this sequence:
- Read the arguments, bind to the specified port, and find your document root
(you might find the chdir()
system call helpful).
- Accept a connection, and hand it off to a new thread for concurrent
processing. (Initially, you can just call a function rather than spawning a
thread.)
- Receive and parse a request from the client.
- Look for the path that was requested, starting from your document root (the second argument to your program). One of four things should happen:
  - If the path exists and it's a file, formulate a response (with the Content-Type header set) and send it back to the client.
  - If the path exists and it's a directory that contains an index.html file, respond with that file.
  - If the path exists and it's a directory that does NOT contain an index.html file, respond with a directory listing.
  - If the path does not exist, respond with a 404 status code and a basic HTML error page.
  You might want to make each of these cases a separate function!
- Close the connection, and continue serving other clients.
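The "call a function per connection, then thread it" structure from the checklist might look roughly like this. make_listener, handle_client, and serve_forever are hypothetical names; the sketch assumes IPv4, a listen backlog of 16, and detached threads, since the workers never need to be joined.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on all interfaces at the given port.
 * Returns the listening descriptor, or -1 on error. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow the port to be reused right after the server restarts. */
    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical per-connection worker: the real one would receive, parse,
 * and answer one request. This placeholder only closes the socket. */
static void *handle_client(void *arg)
{
    int client = (int)(intptr_t)arg;
    /* ... recv the request, build and send the response ... */
    close(client);
    return NULL;
}

/* Accept forever, handing each connection to its own detached thread. */
void serve_forever(int listener)
{
    for (;;) {
        int client = accept(listener, NULL, NULL);
        if (client < 0)
            continue;            /* e.g. interrupted; try again */

        pthread_t tid;
        if (pthread_create(&tid, NULL, handle_client,
                           (void *)(intptr_t)client) != 0) {
            close(client);       /* could not spawn a thread; drop it */
            continue;
        }
        pthread_detach(tid);     /* threads don't coordinate, so no join */
    }
}
```

In main you might call make_listener(atoi(argv[1])), then chdir(argv[2]), then serve_forever() on the listener. Before adding threads, you can replace the pthread_create call with a direct call to handle_client. Compile with -pthread.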
Requirements
In addition to serving requested files, your server should handle at least
the following cases:
- A variety of file formats should all display properly, including both text
and binary formats. You'll need to return the proper HTTP Content-Type header
in your response. You don't need to handle every MIME type, but you should at
least be able to handle files with html, txt, jpeg, gif, png, pdf, and ico
extensions. You may assume that the file extension is correct (e.g., I'm not
going to name a PDF file with a ".txt" suffix). For .ico, use image/x-icon.
- If asked for a file that does not exist, you should respond with a 404
error code with a readable error page, just like a Web server would. It
doesn't need to be fancy, but it should contain some basic HTML so that the
browser renders something and makes the error clear.
- Some clients may be slow to complete a connection or send a request. Your
server should be able to serve multiple clients concurrently, not just
back-to-back. For this lab, use multithreading with pthreads to handle
concurrent connections. (We'll try an alternative to threads, event-based
concurrency, in a future lab assignment.)
- If the path requested by the client is a directory, you should handle the
request as if it were for the file "index.html" inside that directory, if such a file exists.
Hint: use the stat()
system call to determine if a path is a directory or a file. The st_mode field
in the stat struct has what you need.
- The Web server should respond with a list of files when the user requests a
directory that does not contain an index.html file. You can read the contents
of a directory using the opendir() and readdir() calls. Together they
behave like an iterator. That is, you can open a (DIR *) with opendir and then
continue calling readdir(), which returns info for one file, on that (DIR *)
until it returns NULL. Note that your server should not create any additional
files on disk in order to respond to the request. The response should mimic
the result of running:
python -m SimpleHTTPServer
(That is the Python 2 command; with Python 3, the equivalent is python3 -m http.server.)
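For the Content-Type requirement, a lookup on the file's extension is enough, given the lab's assumption that suffixes are correct. The sketch below pairs that lookup with a helper that formats the response header. The HTTP/1.0 status line, the Content-Length header, and the application/octet-stream fallback are choices of this sketch rather than requirements stated above; content_type_for and format_header are hypothetical names.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Map a file extension to a MIME type. Covers the extensions the lab
 * requires; anything else falls back to a generic binary type. */
const char *content_type_for(const char *path)
{
    static const struct { const char *ext, *type; } table[] = {
        { ".html", "text/html" },
        { ".txt",  "text/plain" },
        { ".jpeg", "image/jpeg" },
        { ".jpg",  "image/jpeg" },
        { ".gif",  "image/gif" },
        { ".png",  "image/png" },
        { ".pdf",  "application/pdf" },
        { ".ico",  "image/x-icon" },
    };
    const char *dot = strrchr(path, '.');   /* last '.' in the path */
    if (dot != NULL) {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcasecmp(dot, table[i].ext) == 0)
                return table[i].type;
    }
    return "application/octet-stream";
}

/* Write a minimal HTTP/1.0 response header into buf. Returns its length,
 * or -1 if it did not fit. The empty line ends the header section. */
int format_header(char *buf, size_t len, int status, const char *reason,
                  const char *type, long body_len)
{
    int n = snprintf(buf, len,
                     "HTTP/1.0 %d %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     status, reason, type, body_len);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A 404 could then be sent as format_header(hdr, sizeof hdr, 404, "Not Found", "text/html", strlen(page)) followed by a short page such as "<html><body><h1>404 Not Found</h1></body></html>".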
When testing, you should be able to retrieve byte-for-byte copies of files
from your server. Use wget or curl to fetch files and
md5sum or diff to compare the fetched file with the original. I will do this when grading. For full credit, the files need to be exact replicas of the original.
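To keep the served bytes identical to the file on disk, read the file in binary mode and always pass explicit byte counts rather than relying on string functions. A sketch with hypothetical helpers send_all and send_file, assuming the response header has already been sent. One related pitfall: if a client disconnects early, send() on that socket can raise SIGPIPE, whose default action terminates the process; calling signal(SIGPIPE, SIG_IGN) at startup makes send() return -1 with errno set to EPIPE instead.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* send() may transmit fewer bytes than requested, so loop until the
 * whole buffer is out. Returns 0 on success, -1 on error. */
int send_all(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t sent = send(sock, buf, len, 0);
        if (sent < 0)
            return -1;
        buf += sent;
        len -= (size_t)sent;
    }
    return 0;
}

/* Copy the file at path to sock byte for byte. "rb" mode plus explicit
 * lengths means '\0' bytes in binary files are sent like any other byte. */
int send_file(int sock, const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char chunk[8192];
    size_t got;
    int rc = 0;
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (send_all(sock, chunk, got) < 0) {
            rc = -1;
            break;
        }
    }
    if (ferror(fp))
        rc = -1;
    fclose(fp);
    return rc;
}
```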
Grading Rubric
This assignment is worth five points, which I'm planning to assign as follows:
- 1 point for serving both text and binary files (that can be rendered
correctly -- set Content-Type) to a standard Web browser (e.g., Firefox).
- 1 point for serving exact copies of text and binary files to command line
clients like wget or curl. The MD5 sums should match!
- ½ point for correctly returning a 404 error code and HTML message when a
request asks for a file that does not exist.
- ½ point for serving index.html, if it exists, when asked for a file that is
a directory.
- 1 point for handling multiple clients concurrently. Your server should be
able to keep one or more idle telnet connections open while still handling
browser/wget/curl requests.
- 1 point for returning a file listing when asked for a directory that does
not contain an index.html file. The listing should use simple HTML to provide
clickable links.
Hints / Tips / Assumptions
- You may assume that file suffixes correctly correspond to their type (e.g.,
if a file ends in ".pdf" that it is a PDF file).
- You may assume that requests sent to your server are at most 4 KB in
length.
- You may assume that if the user requests a path that is a directory, the
path will end in a trailing '/'. When generating the list of files in a
directory, make sure your server also sends back URLs that end in '/' for
directories. This is for the benefit of your browser, which keeps track of its
current location based on the absence or presence of slashes.
- Unlike many of your prior experiences with threading (e.g., parallel GOL in
CS 31), the threads in this assignment don't need to coordinate their actions.
This makes the threading relatively easy, and it's something that can be added
on once the main serving functionality is implemented. When starting out,
organize your code such that it calls a function on any newly-accepted client
sockets, and let that function do all the work for that connection. This'll
make adding pthread support quite simple!
- Take compiler warnings seriously. Unless it's an unused variable, you
should address the warning as soon as you see it. Dealing with a pile of
warnings just makes things more difficult later.
- Test your code in small increments. It's much easier to localize a bug
when you've only changed a few lines.
- If you need to copy a specific number of bytes from one buffer to another,
and you're not 100% sure that the data will be entirely text, use
memcpy() rather than strncpy(). The latter stops copying at the first null
byte ('\0') and fills the rest of the destination with zeros, so any data
after that byte is lost.
- If you're trying to do some sort of specific string or memory manipulation,
feel free to ask if there's a better/recommended way to do it rather than brute
force. Often there may be a standard library function that will make things
easier.
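A short illustration of the memcpy()/strncpy() difference described in the hints, using a source buffer with an embedded '\0' as a binary file might contain. compare_copies is a hypothetical demo function.

```c
#include <string.h>

/* Copy the same 5 bytes with strncpy() and with memcpy(). The source
 * holds a '\0' in the middle, as binary data might. */
void compare_copies(char by_strncpy[5], char by_memcpy[5])
{
    const char src[5] = { 'a', 'b', '\0', 'c', 'd' };

    /* strncpy stops copying at the '\0' and pads the remaining bytes
     * with zeros, so 'c' and 'd' are lost. */
    strncpy(by_strncpy, src, 5);

    /* memcpy copies exactly 5 bytes, whatever their values. */
    memcpy(by_memcpy, src, 5);
}
```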
Reminder
Always, always, always check the return value of any system calls you
make! This is especially important for send, recv, read, and write calls
that tell you how many bytes were read or written.
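As an example of checking those return values, here is one way to receive a request of at most 4 KB: keep calling recv() until the blank line that ends the headers arrives, then pull out the method and path. read_request, parse_request_line, and MAX_REQUEST are hypothetical names, and the 1023-character path limit is an arbitrary assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_REQUEST 4096   /* the lab lets you assume requests are <= 4 KB */

/* Receive until the "\r\n\r\n" that ends the HTTP headers, the buffer
 * fills, or the client closes. recv() may deliver the request in pieces,
 * so keep calling it and check every return value.
 * Returns the number of bytes read (buf is NUL-terminated) or -1. */
ssize_t read_request(int sock, char buf[MAX_REQUEST + 1])
{
    size_t total = 0;
    while (total < MAX_REQUEST) {
        ssize_t got = recv(sock, buf + total, MAX_REQUEST - total, 0);
        if (got < 0)
            return -1;          /* error */
        if (got == 0)
            break;              /* client closed the connection */
        total += (size_t)got;
        buf[total] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL)
            break;              /* full header section received */
    }
    buf[total] = '\0';
    return (ssize_t)total;
}

/* Extract method and path from a request line such as
 * "GET /index.html HTTP/1.1". Returns 0 on success, -1 if malformed. */
int parse_request_line(const char *req, char method[16], char path[1024])
{
    if (sscanf(req, "%15s %1023s", method, path) != 2)
        return -1;
    return path[0] == '/' ? 0 : -1;
}
```

A return of 0 from read_request means the client closed the connection without sending anything.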
If you have any questions about the lab requirements or specification,
please post on Piazza.
Submitting
Please remove any debugging output prior to submitting.
To submit your code, simply commit your changes locally using git
add and git commit. Then run git push while in your lab
directory. Only one partner needs to run the final push, but make sure both
partners have pulled and merged each other's changes. See the section on Using
a shared repo on the git help page.