CS 43 — Lab 2: A Concurrent Web Server
Due: Thursday, February 24 @ 11:59 PM
1. Overview
Having built a web client, for this lab we’ll look at the other end of the HTTP protocol — the web server. As real web clients (e.g., browsers like Firefox) send requests to your server, you’ll be finding the requested files and serving them back to the clients.
1.1. Goals
-
Implement the server side of (non-persistent) HTTP over a TCP connection.
-
Apply socket system calls (bind, listen, and accept) on a server process to interact with clients.
-
Use threading to serve multiple concurrent clients.
-
More practice with sockets, send, and recv.
1.2. Handy References:
-
RFC 1945: HTTP 1.0 Specification. Sections 4, 5, and 6 are probably the most helpful.
-
Manual pages for pthread_create and pthread_detach.
2. Requirements
Your server program, lab2, will receive two arguments:
-
the port number it should listen on for incoming connections, and
-
the directory out of which it will serve files (typically called the document root).
For example:
./lab2 8080 test_documents
This command will tell your web server to listen for connections on port 8080 and serve files out of the test_documents directory. That is, the test_documents directory is considered / when responding to requests. If you’re asked for /index.html, you should respond with the file that resides in test_documents/index.html. If you’re asked for /dir1/dir2/file.ext, you should respond with the file test_documents/dir1/dir2/file.ext.
On most UNIX systems, only users with administrative (root) privileges are allowed to bind to ports below 1024. Users without such privileges often test web services on ports 8080 or 8000 because they sound "close" to port 80. When connecting your web browser to your lab2 server, you’ll need to explicitly specify the port number in the URL with a colon (e.g., http://localhost:8080/index.html or, equivalently, http://127.0.0.1:8080/index.html).
You may find the chdir system call helpful when dealing with file paths. It will change your process’s "working directory", and making your working directory the document root will help in locating files within it.
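The startup steps described above (reading the two arguments and moving into the document root) could be sketched roughly as follows. The helper names parse_port, enter_document_root, and to_local_path are hypothetical and are not part of any starter code; treat this as one possible arrangement rather than the required one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: parse a port argument; returns -1 if it is not a
 * number in the range 1..65535. */
int parse_port(const char *arg) {
    char *end = NULL;
    long port = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || port < 1 || port > 65535) {
        return -1;
    }
    return (int) port;
}

/* Hypothetical helper: make the document root the working directory so
 * later paths can be relative to it. Returns 0 on success, -1 on failure. */
int enter_document_root(const char *root) {
    if (chdir(root) != 0) {
        perror("chdir");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: turn a request path such as "/index.html" into a
 * path relative to the working directory ("./index.html"). */
int to_local_path(const char *url_path, char *out, size_t out_size) {
    if (url_path[0] != '/' || strlen(url_path) + 2 > out_size) {
        return -1;
    }
    snprintf(out, out_size, ".%s", url_path);
    return 0;
}
```

With these, main could call parse_port(argv[1]) and enter_document_root(argv[2]) once at startup, and every later file lookup can use the relative path produced by to_local_path. Note that this sketch does nothing special about ".." components in the request path.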
2.1. Workflow of Your Program
Roughly, your server should follow this sequence:
-
Read the arguments, find your document root, bind to the specified port, and begin listening for incoming connections.
-
Accept a connection, and:
-
week 1: hand the socket off to a function that handles the remaining steps.
-
week 2: pass the socket to a new thread for concurrent processing.
-
Receive and parse a request from the client.
-
Look for the path that was requested, starting from your document root (the second argument to your program). One of four things should happen (you might want to make each of these cases a separate function!):
-
If the path exists and it’s a regular file, formulate a response (with the Content-Type header set) and send it back to the client.
-
If the path exists and it’s a directory that contains an index.html file, respond with that file.
-
week 2: If the path exists and it’s a directory that does NOT contain an index.html file, respond with a directory listing.
-
If the path does not exist, respond with a 404 code with a basic HTML error page. The 404 HTML page can be static and very simple: it just needs to be enough for a user to see a 404 message in a real browser.
-
Close the connection and continue serving other clients.
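As an illustration of the "path does not exist" case in the workflow above, here is a minimal sketch of sending a static 404 page. The names send_all and send_404 are hypothetical, the wording of the page is arbitrary, and the Content-Length header is a choice made for this sketch (an HTTP/1.0 client can also detect the end of the body when the server closes the connection).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling send until all len bytes are sent. */
int send_all(int sock, const char *buf, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n < 0) {
            perror("send");
            return -1;
        }
        sent += (size_t) n;
    }
    return 0;
}

/* Hypothetical helper: send a complete 404 response (header, then body). */
int send_404(int sock) {
    const char *body =
        "<html><head><title>404 Not Found</title></head>"
        "<body><h1>404 Not Found</h1></body></html>\n";
    char header[256];
    int n = snprintf(header, sizeof(header),
                     "HTTP/1.0 404 Not Found\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     strlen(body));
    if (n < 0 || (size_t) n >= sizeof(header)) {
        return -1;
    }
    if (send_all(sock, header, (size_t) n) != 0) {
        return -1;
    }
    return send_all(sock, body, strlen(body));
}
```

The blank line (a bare CRLF) after the last header is what separates the header from the HTML body.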
2.2. Server Behavior Expectations
For full credit:
-
Your server should send byte-for-byte identical copies of files to clients. Use wget or curl to fetch files and md5sum or diff to compare the fetched file with the original. I will do this when grading!
-
A variety of file formats should display properly in a real web browser (e.g., firefox), including both text and binary formats. You’ll need to return the proper HTTP Content-Type header in your response. You don’t need to handle every possible content type, but you should at least be able to handle files with .html, .txt, .jpeg, .jpg, .gif, .png, .pdf, and .ico extensions. You may assume that the file extension is correct (e.g., I’m not going to name a PDF file with a .txt suffix).
-
If asked for a file that does not exist, you should respond with a 404 error code with a readable error page, just like a web server would. It doesn’t need to be fancy, but it should contain some basic HTML so that the browser renders something and makes the error clear.
-
Some clients may be slow to complete a connection or send a request. Your server should be able to serve multiple clients concurrently, not just back-to-back. For this lab, use multithreading with pthreads to handle concurrent connections. (We’ll try an alternative to threads, event-based concurrency, in a future lab assignment.)
-
If the path requested by the client is a directory, you should handle the request as if it were for the file index.html inside that directory, if such a file exists. Hint: use the stat system call to determine if a path is a directory or a file. Using the S_ISDIR macro on the st_mode field of the stat struct will help you to identify directories.
-
The web server should respond with a list of files when the user requests a directory that does not contain an index.html file. You can read the contents of a directory using the opendir and readdir calls. Together they behave like an iterator. That is, you can open a DIR * with opendir and then continue calling readdir, which returns info for one file, on that DIR * until it returns NULL. Note that there should be no additional files created on the server’s disk to respond to the request. The response should mimic the result of running: python -m SimpleHTTPServer
-
Your program should generate no warnings from valgrind. If valgrind ever tells you something is wrong, DON’T IGNORE IT! Fix it before moving on.
2.3. Assumptions
-
You may assume that file suffixes correctly correspond to their type (e.g., if a file ends in ".pdf" that it really is a PDF file).
-
You may assume that requests sent to your server are at most 4 KB in length.
-
You may assume that if the user requests a path that is a directory, the path will end in a trailing /. When generating the list of files in a directory, make sure your server also sends back URLs that end in / for directories. This is for the benefit of your browser, which keeps track of its current location based on the absence or presence of slashes.
-
You may assume that you will only receive GET requests from clients.
-
If you receive an HTTP/1.1 request, you should respond with an HTTP/1.0 response.
You should NOT assume anything about the size of the file that a client requests. Rather than trying to read the entire file into memory at once, you can read a chunk of the file (e.g., 4096 bytes) and then send just that chunk (in a loop!) before reading the next chunk.
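The chunked read-and-send loop described above might look something like this sketch. The function name send_file_body and the CHUNK_SIZE constant are hypothetical; the inner loop is there because send may accept fewer bytes than requested.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHUNK_SIZE 4096  /* assumed chunk size, matching the example above */

/* Hypothetical helper: stream an already-open file to a socket one chunk
 * at a time. Returns 0 on success, -1 on a read or send error. */
int send_file_body(int sock, FILE *fp) {
    char chunk[CHUNK_SIZE];
    size_t nread;
    while ((nread = fread(chunk, 1, sizeof(chunk), fp)) > 0) {
        size_t sent = 0;
        /* send may send only part of the chunk, so loop until it is all out */
        while (sent < nread) {
            ssize_t n = send(sock, chunk + sent, nread - sent, 0);
            if (n < 0) {
                perror("send");
                return -1;
            }
            sent += (size_t) n;
        }
    }
    /* fread returns 0 at end of file and on error; tell the two apart */
    return ferror(fp) ? -1 : 0;
}
```

Because the loop only ever holds one chunk in memory and tracks lengths explicitly, it works the same way for text and binary files of any size.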
2.4. Checkpoint
To be on track, by the start of the next lab session, you should have finished:
-
Your server can accept client connections and hand them to a function for further processing.
-
Your processing function can:
-
receive a full request from the client, using the presence of a double CRLF to determine that it has received the full request.
-
parse the request and extract the requested path.
-
generate a response (both header and body) for requested regular files and directories that contain an index.html file.
-
A good stretch goal is:
-
Sending back a simple, static HTML document for 404 errors (requested file not found).
It’s fine to defer until next week:
-
Handling multiple clients concurrently, with threading.
-
Producing directory listings for directories that do not contain an index.html file.
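For the checkpoint step of receiving a full request, one possible approach is to keep calling recv until the double CRLF shows up in the buffer. This is a sketch under the stated assumption that requests are at most 4 KB and contain only ASCII text; the name recv_request and the MAX_REQUEST constant are hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define MAX_REQUEST 4096  /* assumed limit, from the 4 KB assumption above */

/* Hypothetical helper: read from sock until "\r\n\r\n" has arrived.
 * buf must have room for MAX_REQUEST + 1 bytes. Returns the number of
 * bytes received (buf is NUL-terminated), or -1 on error, on a closed
 * connection, or if the request exceeds MAX_REQUEST bytes. */
ssize_t recv_request(int sock, char *buf) {
    size_t total = 0;
    while (total < MAX_REQUEST) {
        ssize_t n = recv(sock, buf + total, MAX_REQUEST - total, 0);
        if (n <= 0) {
            return -1;  /* error, or the client closed the connection early */
        }
        total += (size_t) n;
        buf[total] = '\0';  /* terminate so strstr can search the buffer */
        if (strstr(buf, "\r\n\r\n") != NULL) {
            return (ssize_t) total;
        }
    }
    return -1;
}
```

A single recv call is not guaranteed to return the whole request, which is why the loop re-checks for the terminator after every call.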
3. Examples and Testing
You should test your server in two ways:
-
Using a real web browser like firefox, request files and ensure that they render properly. Note: browsers are very forgiving in what they receive and will do their best to render properly, even when they aren’t given correct data.
-
To verify correctness, you should use a tool like wget to request and save copies of files from your server. You can then use tools like diff and md5sum that we used to verify correctness in lab 1.
4. Tips & FAQ
-
Use HTTP version 1.0 — version 1.1 can get a lot more complicated. The subset of the HTTP 1.0 protocol you’ll need to implement for this assignment is quite small, but you may find the full protocol specification to be helpful.
-
All HTTP headers are ASCII strings, so you can use the str family of functions to manipulate them safely.
-
Always, always, always check the return value of any system calls you make!
4.1. File types
When setting the Content-Type header, use the following file suffix to content type mappings:
-
html: text/html
-
txt: text/plain
-
jpeg: image/jpeg
-
jpg: image/jpg
-
gif: image/gif
-
png: image/png
-
pdf: application/pdf
-
ico: image/x-icon
It’s fine to hard-code knowledge of these specific types into your server.
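One way to hard-code the mappings above is a small lookup table keyed on the file suffix. The function name content_type_for is hypothetical, and the application/octet-stream fallback for unknown suffixes is an assumption of this sketch, not something the assignment specifies.

```c
#include <string.h>

/* Hypothetical helper: return the Content-Type value for a path based on
 * the text after its last '.'. */
const char *content_type_for(const char *path) {
    static const struct {
        const char *ext;
        const char *type;
    } types[] = {
        {"html", "text/html"},
        {"txt", "text/plain"},
        {"jpeg", "image/jpeg"},
        {"jpg", "image/jpg"},
        {"gif", "image/gif"},
        {"png", "image/png"},
        {"pdf", "application/pdf"},
        {"ico", "image/x-icon"},
    };
    const char *dot = strrchr(path, '.');
    if (dot != NULL) {
        for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
            if (strcmp(dot + 1, types[i].ext) == 0) {
                return types[i].type;
            }
        }
    }
    /* Assumed fallback for suffixes outside the table. */
    return "application/octet-stream";
}
```

For example, content_type_for("/dir1/photo.png") returns "image/png", which can be dropped straight into the Content-Type header line.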
4.2. File paths
-
chdir: use this function to change your server process’s "current working directory" to test_documents. You probably want to do this at the very beginning of your program so that all paths can be relative to the document root.
-
stat: use this system call to determine if a path is a directory or a file. Allocate a variable of type struct stat and pass the address of the struct to stat (along with the path string). On success, the stat call will fill in the struct, and you can access the fields:
-
Use the macro S_ISDIR() and pass in the st_mode field of your struct stat variable. S_ISDIR() will return true (non-zero) if the path is a directory or false (zero) otherwise.
-
You don’t need to worry about the other fields of struct stat or the other file-type macros (S_ISCHR, S_ISBLK, etc.).
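The stat and S_ISDIR() steps above could be wrapped in a small helper like the following sketch. The enum and the name classify_path are hypothetical. S_ISREG, a standard companion macro to S_ISDIR, is used here to recognize regular files; treating every other file type as "missing" is an assumption made for simplicity.

```c
#include <sys/stat.h>

/* Hypothetical result type for the cases the server cares about. */
enum path_kind { PATH_MISSING, PATH_FILE, PATH_DIR };

/* Hypothetical helper: classify a path using stat. */
enum path_kind classify_path(const char *path) {
    struct stat info;
    if (stat(path, &info) != 0) {
        return PATH_MISSING;  /* stat failed, e.g. the path does not exist */
    }
    if (S_ISDIR(info.st_mode)) {
        return PATH_DIR;
    }
    if (S_ISREG(info.st_mode)) {
        return PATH_FILE;
    }
    return PATH_MISSING;  /* assumption: anything else is treated as absent */
}
```

The caller can then switch on the result to choose between sending a file, looking for index.html, or responding with a 404.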
4.3. String Parsing and File I/O
-
Many of the tools you used in lab 1 for manipulating strings will also be helpful in lab 2.
-
If you need to copy a specific number of bytes from one buffer to another, and you’re not 100% sure that the data will be entirely text, use memcpy rather than strncpy. The latter stops copying early if it finds a null terminator (\0), whereas memcpy will always copy the requested number of bytes.
-
Similar to lab 1, you will likely find fopen to be helpful for opening files. This time, use a mode of "r", since you’ll only be reading files. Afterward, you can read the contents with fread. Don’t forget to fclose when done.
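As an illustration of the string handling involved, here is a minimal sketch of extracting the requested path from the first line of a request. It assumes the request starts with "GET " (per the assumptions in section 2.3) and that the path is followed by a space; the name parse_request_path is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copy the path from "GET <path> HTTP/1.x" into path.
 * Returns 0 on success, -1 if the request line is not in that form or the
 * path does not fit in path_size bytes. */
int parse_request_path(const char *request, char *path, size_t path_size) {
    if (strncmp(request, "GET ", 4) != 0) {
        return -1;
    }
    const char *start = request + 4;
    const char *end = strchr(start, ' ');  /* space before the HTTP version */
    if (end == NULL) {
        return -1;
    }
    size_t len = (size_t) (end - start);
    if (len == 0 || len >= path_size) {
        return -1;
    }
    memcpy(path, start, len);  /* copy exactly len bytes, then terminate */
    path[len] = '\0';
    return 0;
}
```

The length is computed from the two pointers rather than by searching for a terminator, so the copy never runs past the path itself.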
5. Other Reference Material
5.1. Socket Programming
The server side of socket programming has a few more system calls than a client. Use man bind, man listen, and man accept to read through each of these functions. Look through your starter code on GitHub, and follow along with the description of each of the system calls.
-
socket(): Like the client side, first create a socket. This time, we name it server_sock since it’s going to serve a special purpose. Use server_sock only to accept new connections. Never use server_sock with calls to send or recv.
-
setsockopt: The default behavior of TCP (implemented by the OS) is that if you bind to a port and terminate your program, the OS makes you wait for a minute before anyone else can bind to that port again. Setting the SO_REUSEADDR socket option disables the waiting, which makes rapid debugging easier.
-
bind(): Associate a socket with the IP address and port on which it should listen for incoming connections. A machine can have more than one network interface or IP address, usually if it connects to two different networks. Assign the INADDR_ANY macro to the sockaddr's address to serve content on all the server’s IP interfaces.
-
listen(): After binding to an address and port, use listen to begin allowing client connections. This function essentially opens the socket for business. The backlog parameter defines how many clients are allowed to wait in a queue for your server to accept them.
-
while(1): A server is always on: enter an infinite loop, where the main body of the work is going to happen. We declare a second sock integer that will eventually represent a new client connection.
-
accept(): Finally, call accept to connect to a new client. You pass the server socket as a parameter to accept. On success, it returns a new socket that represents your connection to the new client. Use that newly returned socket to communicate with the client via send and recv.
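The setup calls described above (socket, setsockopt, bind, listen) can be sketched together as follows. This is not the starter code, which may be organized differently; the function name make_server_socket and the BACKLOG value are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define BACKLOG 10  /* assumed queue length for pending connections */

/* Hypothetical helper: create a TCP socket that listens on the given port
 * on all interfaces. Returns the listening socket, or -1 on failure. */
int make_server_socket(unsigned short port) {
    int server_sock = socket(AF_INET, SOCK_STREAM, 0);
    if (server_sock < 0) {
        perror("socket");
        return -1;
    }

    /* Allow rebinding to the port right after the program exits. */
    int yes = 1;
    if (setsockopt(server_sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
        perror("setsockopt");
        close(server_sock);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);               /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* all local interfaces */

    if (bind(server_sock, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        perror("bind");
        close(server_sock);
        return -1;
    }
    if (listen(server_sock, BACKLOG) < 0) {
        perror("listen");
        close(server_sock);
        return -1;
    }
    return server_sock;
}
```

The returned descriptor is the one to pass to accept inside the while(1) loop; each accept call then yields a separate socket for talking to that particular client.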
5.2. Threading
Some clients may be slow to complete a connection or send a request. To prevent all other clients from waiting on one slow client, your server should be able to serve multiple clients concurrently, not just back-to-back. For this part of the lab, we’ll use multithreading with pthreads to handle concurrent connections.
-
Use pthread_create and pthread_detach after calling accept for each new client.
-
Unlike many of your prior experiences with threading (e.g., parallel GOL in CS 31), the threads in this assignment don’t need to coordinate their actions. This makes the threading relatively easy, and it’s something that can be added on after the main serving functionality is implemented. When starting out, organize your code such that it calls a function on any newly-accepted client sockets, and let that function do all the work for that connection. This will make adding pthread support quite simple!
-
In your starter code you should see a thread_detach_example.c. This is very similar to what you will be implementing. This example takes the number of threads as an input argument, and then it creates and detaches each thread. Each thread independently runs thread_function. The example passes one argument to each thread, an integer pointer. In your server, this will be the socket descriptor (integer) for a newly-accepted client.
-
Inside of the thread_function, you just have to cast the input back from a generic void * pointer to an integer pointer. Then, you can dereference that pointer to get the value, after which you can free it. This is the main complexity in this part of the lab: wrangling pointers!
-
Finally, we have a call to the pthread_detach function. This basically says: I am creating a thread, it is going to go do something in the background, and I don’t need the thread to return a result, so it can just exit once it’s done executing. Therefore the return value of our thread_function is NULL to satisfy a void * return value. By detaching a thread, we are telling the OS to just clean it up once it’s done executing our thread_function, without the need for calling pthread_join.
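The create-and-detach pattern described above might look roughly like this sketch. It is not a copy of thread_detach_example.c; handle_client is a hypothetical stand-in for your per-connection function (here it only writes a short message), and spawn_client_thread is a hypothetical name for the code that would run after each accept.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the function that serves one connection. */
void handle_client(int sock) {
    const char msg[] = "hello\n";
    if (write(sock, msg, sizeof(msg) - 1) < 0) {
        perror("write");
    }
}

/* Thread entry point: recover the socket from the void * argument. */
void *thread_function(void *arg) {
    int *sock_ptr = (int *) arg;  /* cast back to the real pointer type */
    int sock = *sock_ptr;         /* copy the value out... */
    free(sock_ptr);               /* ...then release the heap allocation */
    handle_client(sock);
    close(sock);
    return NULL;                  /* detached threads return no result */
}

/* Hypothetical helper: start a detached thread for one accepted client.
 * Returns 0 on success, -1 on failure. */
int spawn_client_thread(int client_sock) {
    int *sock_ptr = malloc(sizeof(int));
    if (sock_ptr == NULL) {
        return -1;
    }
    *sock_ptr = client_sock;

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, thread_function, sock_ptr);
    if (rc != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        free(sock_ptr);
        return -1;
    }
    rc = pthread_detach(tid);
    if (rc != 0) {
        fprintf(stderr, "pthread_detach: %s\n", strerror(rc));
        return -1;
    }
    return 0;
}
```

The socket is copied into its own heap allocation because passing the address of a variable in the accept loop would let the next accept overwrite it before the new thread reads it. Note that the pthread functions return an error number directly rather than setting errno.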
5.3. Providing a directory listing
-
Your web server should respond with a list of files when the user requests a directory that does not contain an index.html file.
-
Similar to opening a file with fopen and reading from a file with fread, you can read the contents of a directory using the opendir, readdir, and closedir calls.
-
That is, if you have a valid directory path, you can pass it to opendir and store the result in a (DIR *) pointer. Just like a file pointer, every time you open a directory, you should close the directory with closedir.
-
Next, you can keep calling readdir, which returns info for one file, on that (DIR *) pointer until it returns NULL. See man readdir for details. DO NOT attempt to free the struct dirent pointer that readdir returns: the man page makes it very clear that you should not attempt to free that pointer!
-
You can use the following HTML format to create your directory listing (substitute /path with the actual path):
<html>
Directory listing for: /path/ <br/>
<ul>
<li><a href="your_dir_listing_with_slash/">"dir_name"</a></li>
....
</ul>
</html>
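Putting opendir, readdir, and closedir together, a directory listing could be built in memory along these lines. The function name build_listing and its fixed-size output buffer are assumptions of this sketch, as is the choice to skip the "." entry; entry names are written as-is, with no HTML escaping.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: write an HTML listing of dir_path into out.
 * url_path is the request path shown in the heading (ending in '/').
 * Returns the number of bytes written, or -1 on error or if out is
 * too small. */
int build_listing(const char *dir_path, const char *url_path,
                  char *out, size_t out_size) {
    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }
    int n = snprintf(out, out_size,
                     "<html>\nDirectory listing for: %s <br/>\n<ul>\n", url_path);
    if (n < 0 || (size_t) n >= out_size) {
        closedir(dir);
        return -1;
    }
    size_t used = (size_t) n;

    struct dirent *entry;  /* owned by readdir: never free it */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0) {
            continue;  /* skipping "." is a choice made for this sketch */
        }
        /* stat the entry so that directories get a trailing slash */
        char full[4096];
        struct stat info;
        int is_dir = 0;
        n = snprintf(full, sizeof(full), "%s/%s", dir_path, entry->d_name);
        if (n > 0 && (size_t) n < sizeof(full) && stat(full, &info) == 0) {
            is_dir = S_ISDIR(info.st_mode);
        }
        const char *slash = is_dir ? "/" : "";
        n = snprintf(out + used, out_size - used,
                     "<li><a href=\"%s%s\">%s%s</a></li>\n",
                     entry->d_name, slash, entry->d_name, slash);
        if (n < 0 || (size_t) n >= out_size - used) {
            closedir(dir);
            return -1;
        }
        used += (size_t) n;
    }
    closedir(dir);

    n = snprintf(out + used, out_size - used, "</ul>\n</html>\n");
    if (n < 0 || (size_t) n >= out_size - used) {
        return -1;
    }
    return (int) (used + (size_t) n);
}
```

Since the listing is assembled in a buffer and sent from there, nothing is written to the server's disk. The links are relative, which works because directory URLs are assumed to end in a trailing slash.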
6. Submitting
Please remove any excessive debugging output prior to submitting.
To submit your code, commit your changes locally using git add and git commit. Then run git push while in your lab directory.