Lab 3 Partners
Luis Ramirez and Nick Felt
Elliot Weiser and Steven Hwang
Jordan Singleton and Phil Koonce
Ames Bielenberg and Niels Verosky
Kyle Erf and Sam White
Choloe Stevens and Katherine Bertaut
See the git howto for information about how you can set up a git
repository for your lab 3 project.
Project Introduction
For this assignment you and your partner will implement a web server.
This lab is designed to give you some practice writing client-server
socket programs, writing a multi-threaded server, using signals,
and learning about the HTTP protocol.
This is a larger and more involved programming assignment than the first two
labs. I strongly encourage you to get started on it right away.
There is a lot of information about getting started and about helpful
resources on this page (including information about where to get starting
point code and sample code). Read through this entire page before you
get started, and refer back to it as you go...if you have a question about
how to do something, there may be an answer or hint here.
Contents:
Project Requirements
Project Details
Getting Started
Useful Functions and Links to more Resources
Submission and Demo
Project Requirements
- Your web server should be written in C or C++.
- Use port 8888 instead of port 80.
- You will implement a multi-threaded web server, one thread
per client connection. This will allow your web server to
simultaneously handle requests from multiple clients.
- Your server should implement parts of the HTTP 1.1 protocol,
which maintains an open socket connection to the client
after the response is sent (HTTP 1.0 closes the connection as
soon as the response is sent to the client). HTTP 1.1
makes subsequent communication with the client faster by not having
to repeat the TCP connection protocol. However, your server must
prevent too many simultaneous connections: if the total number of
simultaneous open connections
gets above some max_connections threshold value (pick something
small to test like 5), your server will start closing
the oldest open connections and their associated server threads will die.
The killed server thread should clean up any shared state and close its
end of the socket before dying.
Remember that connections can be closed other ways too. For example, the
client-side can close the connection. In this case the associated
server thread should detect that the socket was closed, clean up
any shared state, and exit.
- Your server must handle GET and HEAD client requests.
It does not need to handle POST nor any other requests.
- It should return appropriate status codes, including
200, 400, 403, and 404. If the server returns an error code to
a client, it should also return headers and a message body with
a simple error page. For example:
"<html><body>Not Found</body></html>"
If you'd like, you can include a link to the
HTTP Status Cat jpg corresponding to the status code in your
response. Add in something like this to the body of the response
above:
<img src="http://httpcats.herokuapp.com/400">
You can view all the status cat images
here
- It should support the headers Content-Length, Content-Type, and Date.
- It does not need to handle any php or javascript parsing. If
the client requests a .php file, just send the file contents back
just like an .html or .jpg file.
- It should handle urls that start with / and that start with /~username.
The web server's starting pages are in /scratch/cs87/cs/. Urls that
start with /, such as /people, come from files in /scratch/cs87/cs/people/;
url requests for /~username/ come from files in /home/username/public_html/.
- Your web server should be free of memory access errors (i.e., have no
valgrind errors), and it should be well designed and well commented.
Note if you start your web server on one of
our lab machines, you can only connect to it with clients that are also
running on our lab machines.
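As a rough sketch of the status code requirement, the helper below uses stat and access to choose between 200, 403, and 404 for a file path. The name status_for_path is hypothetical (it is not in the starting point code), and mapping a missing file to 404 and an unreadable one to 403 is an assumption about what "appropriate" means here. A 400 would be chosen earlier, while parsing the request.

```c
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: pick a status code for an already-mapped file path.
 * Returns 200 for a readable regular file, 403 when permissions get in
 * the way, and 404 for anything else. */
int status_for_path(const char *path) {
    struct stat info;

    if (stat(path, &info) != 0) {
        /* EACCES: a directory on the path is not searchable */
        return (errno == EACCES) ? 403 : 404;
    }
    if (!S_ISREG(info.st_mode)) {
        return 404;   /* e.g. a directory: the caller should try its index file */
    }
    if (access(path, R_OK) != 0) {
        return 403;   /* the file exists but cannot be read */
    }
    return 200;
}
```

A worker thread could call this after mapping the url to a file name and before opening the file.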
Project Details
Web server
The basic design of your web server is the following:
- create a listen socket on port 8888
- enter an infinite loop:
- accept the next connection
- if there are already max connections, kill the oldest
thread by sending it a SIGALRM signal.
- create a new thread to handle the new client's connection,
passing it the socket returned by accept.
- the worker thread main function should be an infinite loop that
only exits if there is an error condition returned by a system call,
or if the thread receives a SIGALRM from the main thread and
kills itself. Otherwise, the worker threads continue to handle HTTP
requests from the client.
Before a thread dies, it should close its end of the socket and clean
up any other global state necessary for correct functioning of your
web server.
The main server thread should be in an infinite loop, waiting to
accept the next client connection. It should exit only when it gets appropriate
error return values from accept, send, recv, read, write, ...
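The first step of the design above (creating the listen socket) might look roughly like the sketch below. The function name make_listen_socket, the backlog of 16, and the use of SO_REUSEADDR are assumptions made for illustration; none of them come from the starting point code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to the given port on all
 * local addresses and put it in the listening state.
 * Returns the socket descriptor, or -1 on any error. */
int make_listen_socket(unsigned short port) {
    int fd, on = 1;
    struct sockaddr_in addr;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    /* Assumption: allow the port to be rebound right after a restart. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);      /* network byte order */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The main loop would then call make_listen_socket(8888) once and call accept on the returned descriptor each time around the loop, handing every accepted socket to a new worker thread.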
Signals and Sockets and Threads
Threads share the same address space so they can coordinate using
shared memory, and synchronize using locks, barriers, or semaphores.
Threads also share the same copy of open files and the signal table
associated with the process in which they are contained. This means that
if one thread opens a file, all threads can read or write to it using
the file descriptor returned by open. Similarly, if one thread closes a file, it
is closed for all threads in the process.
In Unix, sockets have a file interface and threads can close sockets just
like they would close a file by calling close:
int fd = socket(AF_INET, SOCK_STREAM, 0);
...
close(fd);
Your web server will use signals as a way to notify a worker thread that it
should die when there are too many open connections. A signal is a
software interrupt that can be synchronous or asynchronous. One process or
thread can send (or post) a signal to another one, and when the other one
receives the signal it stops doing what it is currently doing and runs
a special signal handler function. Processes (and threads) can block
some signals, register their own handler functions on some signals, or
just use the operating system's signal handler functions
(this is the default). For example, when you type CNTL-C in the terminal
that is running a program, the running process is sent a SIGINT signal,
and the default handler for SIGINT terminates the process. A process can
catch or ignore SIGINT, however. SIGKILL is an example of a non-blockable
signal, meaning that a process cannot choose to catch, block, or ignore
a SIGKILL...it must die.
Your web server's main thread will send a worker thread a SIGALRM signal
when it wants the worker thread to exit (and close its connection to
the client). To do this, do the following:
- The main thread will register a signal handler function on
the SIGALRM signal before entering its main loop (this sets up the
signal handler on SIGALRM for all threads):
struct sigaction sa;
// set all field values in sa to zero using memset:
memset((void *)(&sa), 0, sizeof(sa));
sigemptyset(&sa.sa_mask);
// name of my signal handler function:
sa.sa_handler = my_sigalrm_handler;
sa.sa_flags = 0;
// register my signal handler with the SIGALRM signal:
val = sigaction(SIGALRM, &sa, NULL);
- When the main listener thread receives a new connection, it
will check to see if there are already a maximum number of connections,
and if so, it will send the oldest thread a SIGALRM signal by calling:
pthread_kill(workers_pthread_tid, SIGALRM);
- The signaled worker thread will call the handler function registered
on SIGALRM:
void my_sigalrm_handler(int s) {
// clean up any shared state associated with me
// close my socket
// and call pthread_exit to die
}
You will have to determine how a signaled thread knows which socket is its
own to close.
Threads will also need to detect and handle other cases when they should
exit, and clean up any global state associated with them, including
closing their socket before exiting. One place where this may occur
is if the client side disconnects and closes its end of the socket.
HTTP 1.1 and multiple simultaneous connections
You should use the pthread library to spawn a new server thread each time
a client connects to your server. The server thread has a dedicated
connection to this client and will keep this connection open and continue
to handle GET and HEAD requests from the client. Your main server thread
should return back to its accept loop after spawning the server thread
so that it can handle a connection from another client.
This way your server can simultaneously handle requests from different
clients. Test that this works by connecting to your server from
different clients simultaneously and sending multiple requests from these
clients.
Remember to link in the pthreads library to compile a pthreads program.
If you are using the Makefile from my client server example code, it
is already included here:
LIBS = $(LIBDIRS) -pthread
If you aren't using my Makefile, include -pthread at the end of the
gcc or g++ command line in your makefile.
The main listener thread should repeat its main loop after spawning a new
worker thread (and perhaps killing an old one), and call accept on the
listener socket to wait for another client connection.
If your solution requires any use of shared state among threads, make
sure to use a pthread synchronization primitive (likely a pthread_mutex_t)
to synchronize the accesses to this shared state. Also, think about
scope very carefully: threads can only share memory associated with global
variables or that is on the heap. Technically, a thread can share state
on another thread's stack too (if they have a pointer to it) but I
strongly suggest not doing this because the state can be overwritten
and modified by the other thread's execution.
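As an illustration of mutex-protected shared state, here is a minimal sketch of a global table of open connections kept in arrival order, so that the oldest one is always in slot 0. The struct and function names (conn, conn_add, conn_oldest, conn_remove) and the limit of 5 are assumptions chosen for this example.

```c
#include <pthread.h>

#define MAX_CONNECTIONS 5        /* assumed small threshold for testing */

struct conn {
    pthread_t tid;               /* worker thread handling the client */
    int sock;                    /* socket returned by accept */
};

static struct conn conns[MAX_CONNECTIONS];   /* conns[0] is the oldest */
static int num_conns = 0;
static pthread_mutex_t conns_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a new connection. Returns 0, or -1 if the table is full. */
int conn_add(pthread_t tid, int sock) {
    int ret = -1;
    pthread_mutex_lock(&conns_lock);
    if (num_conns < MAX_CONNECTIONS) {
        conns[num_conns].tid = tid;
        conns[num_conns].sock = sock;
        num_conns++;
        ret = 0;
    }
    pthread_mutex_unlock(&conns_lock);
    return ret;
}

/* Copy the oldest entry into *out. Returns 0, or -1 if the table is empty. */
int conn_oldest(struct conn *out) {
    int ret = -1;
    pthread_mutex_lock(&conns_lock);
    if (num_conns > 0) {
        *out = conns[0];
        ret = 0;
    }
    pthread_mutex_unlock(&conns_lock);
    return ret;
}

/* Remove the entry for sock, keeping the rest in arrival order.
 * Returns 0, or -1 if sock is not in the table. */
int conn_remove(int sock) {
    int i, ret = -1;
    pthread_mutex_lock(&conns_lock);
    for (i = 0; i < num_conns; i++) {
        if (conns[i].sock == sock) {
            for (; i + 1 < num_conns; i++) {
                conns[i] = conns[i + 1];     /* shift later entries down */
            }
            num_conns--;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&conns_lock);
    return ret;
}
```

One thing to watch for with a design like this: if a signal handler takes conns_lock and the signal interrupts a thread that already holds it, that thread will deadlock, so think about where the table is updated.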
Web clients
You can use multiple programs to connect to your web server and send it
HTTP commands:
- telnet server_IP port_num, then type in a GET command
(make sure to enter a blank line after the GET command). For example:
$ telnet 130.58.68.62 8888
GET /index.html HTTP/1.0
telnet will exit when it detects that your web server has closed its end
of the socket (or you can kill it with CNTL-C, or if that doesn't work
use kill or pkill: pkill telnet). Use ifconfig to get a machine's
IP address (described in the Useful Functions and Resources section).
- firefox: Enter the url of the desired page specifying your web server
using its IP:port_num (e.g. http://130.58.68.62:8888/index.php)
You can also just use localhost or the host name on our system:
localhost:8888/index.php
tomato:8888/~cfk/
- wget: wget -v 130.58.68.62:8888/index.html
wget copies the html file returned by your web server into a file with
a matching name (index.html) in the directory from which you call wget.
- modify the example client program to send http requests to your server.
I don't think this is necessary (since the other three clients are
already written for you), but you could modify the web_client program given
with the starting point code to send GET requests to your
web server and receive the responses.
HTTP
Start by reading
HTTP Made Really Easy by
Jim Marshall.
It is very important that you can interpret the format of a client request
correctly, and that you send correctly formatted responses to clients. Many
parts of a correctly formatted message involve sequences of carriage return
and newline characters ("\r\n"). These are used to signify the end of
all or part of a "message". Here is the general format of
an HTTP message (a client request or a server response):
initial line
Header1: value1
Header2: value2
Header3: value3

(optional message body goes here)
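For a request, the initial line has three space-separated parts: the method, the url, and the HTTP version. A minimal sketch of pulling these apart with sscanf follows. The struct request type, its field sizes, and the function name parse_request_line are assumptions for illustration; sscanf is used here because it keeps no hidden state between calls, which matters in a multi-threaded server.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three parts of the initial line. */
struct request {
    char method[8];
    char url[1024];
    char version[16];
};

/* Parse the initial line of the request text in req.
 * Returns 0 on success, or -1 if the line is malformed or the method is
 * not GET or HEAD (a candidate for a 400 response). */
int parse_request_line(const char *req, struct request *out) {
    /* Field widths are one less than the array sizes to leave room for '\0'. */
    if (sscanf(req, "%7s %1023s %15s",
               out->method, out->url, out->version) != 3) {
        return -1;
    }
    if (strcmp(out->method, "GET") != 0 && strcmp(out->method, "HEAD") != 0) {
        return -1;
    }
    if (strncmp(out->version, "HTTP/1.", 7) != 0) {
        return -1;
    }
    return 0;
}
```

Header lines, if you need them, follow the initial line and each end with "\r\n".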
For example, a GET response for a very simple page may look like:
HTTP/1.1 200 OK
Date: Sun, 10 Jan 2010 18:17:43 GMT
Content-Type: text/html
Content-Length: 53

<html>
<body>
<h1>CS 87 Test Page</h1>
</body></html>
It is very important that each header line ends with a "\r\n"
and that there is a blank line (another "\r\n") between the headers
and the message body. The message body, however is sent without
a trailing "\r\n". Instead the header Content-Length is used to tell
the client the size of the message body.
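The formatting rules above can be sketched with snprintf. The helper name build_response and its parameter list are assumptions for illustration. It only works for text bodies that are C strings; a binary file such as a .jpg would need its bytes sent separately, with Content-Length taken from the file size.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the status line, three headers, the blank
 * line, and a text body into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
int build_response(char *buf, size_t cap, int status, const char *reason,
                   const char *date, const char *type, const char *body) {
    size_t body_len = strlen(body);
    int n = snprintf(buf, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Date: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"              /* blank line ends the headers */
                     "%s",               /* body: no trailing "\r\n" */
                     status, reason, date, type, body_len, body);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    return n;
}
```

For a HEAD request the same headers would be sent without the body.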
GET requests and mapping urls to files
There is one format of url that you do not need to handle for this
assignment. These are ones where the server would respond with a
"301 Moved Permanently" response vs. responding with OK and the file
contents. This case is described below in more detail.
Directory names in urls correspond to files named either index.html or
index.php in the named directory. Your web server should first look for
a file named index.html and if that doesn't exist look for index.php
when handling these requests.
Here are some example GET requests that you need to handle,
and their corresponding file name(s):
GET / HTTP/1.1 /scratch/cs87/cs/index.html
or /scratch/cs87/cs/index.php
GET /index.html HTTP/1.1 /scratch/cs87/cs/index.html
GET /index.php HTTP/1.1 /scratch/cs87/cs/index.php
GET /search.html HTTP/1.1 /scratch/cs87/cs/search.html
GET /courses/ HTTP/1.1 /scratch/cs87/cs/courses/index.html
/scratch/cs87/cs/courses/index.php
GET /~newhall/ HTTP/1.1 /home/newhall/public_html/index.html
/home/newhall/public_html/index.php
GET /~newhall/newcluster.jpg HTTP/1.1 /home/newhall/public_html/newcluster.jpg
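The prefix mapping in these examples could be sketched as follows. The function name url_to_path and the WEB_ROOT macro are assumptions for illustration. The sketch does only the prefix substitution; choosing between index.html and index.php for a directory is left to the caller.

```c
#include <stdio.h>
#include <string.h>

#define WEB_ROOT "/scratch/cs87/cs"

/* Hypothetical helper: map a url that starts with / or /~username/ to a
 * file system path in out.
 * Returns 0 on success, or -1 if the url is malformed or out is too small. */
int url_to_path(const char *url, char *out, size_t cap) {
    int n;

    if (url == NULL || url[0] != '/') {
        return -1;
    }
    if (url[1] == '~') {
        const char *user = url + 2;            /* just past "/~" */
        const char *rest = strchr(user, '/');  /* path after the username */
        if (rest == NULL || rest == user) {
            return -1;   /* "/~user" with no trailing '/', or an empty name */
        }
        n = snprintf(out, cap, "/home/%.*s/public_html%s",
                     (int)(rest - user), user, rest);
    } else {
        n = snprintf(out, cap, "%s%s", WEB_ROOT, url);
    }
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

A resulting path that ends in '/' names a directory, so the caller would try index.html and then index.php inside it.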
You do not need to correctly handle GET requests of the following format
(i.e. GET requests with no trailing '/' when the last name corresponds
to a directory):
GET /~newhall HTTP/1.1
GET /courses HTTP/1.1
The way a web server would handle requests like this
is to send a "301 Moved Permanently" response to the client
with the real url of the page ("Location: http://IP:portnum/~newhall/").
The client would resend the GET request using the url returned by the server:
GET /~newhall/ HTTP/1.1
When your web server receives a request of this form, you can choose
to either have it respond with an error response or with OK. If your
web server sends an OK response, then the client may make subsequent
GET requests for any files included in the page, and these GET requests
will not have the correct url (the client doesn't know that newhall
is a directory and instead of requesting /~newhall/foo.jpg will request
/foo.jpg, if my homepage includes the foo.jpg file). Just handle these as
you would any bad url (there is no file associated with /foo.jpg).
You do, however, need to correctly handle GET requests with the trailing
'/' (e.g. /~newhall/).
You are welcome to add support for 301 responses if you'd like, but
you are not required to do so for this assignment, so I'd suggest
only adding this after the rest of your web server works.
Getting Started
You can grab a copy of my starting point files for client
and server TCP/IP socket programs in C.
They are in ~newhall/public/cs87/socket_startingpt/. The starting point contains
a sample Makefile for building web_client and web_server executables,
and the very beginnings of both implementations (mostly just #includes
for the server).
In addition, I have an example program for sending and handling signals in pthread
programs. It is available here: ~newhall/public/cs87/pthreads_signals_example/
I strongly encourage you to implement and test incrementally.
Also, it is very important to check return values from all functions
and to handle error return values correctly. For example, if
a call to read on a socket returns 0, the other end of the socket was
closed (a positive return value that is smaller than the requested number
of bytes just means that only part of the data has arrived so far).
When read returns 0, stop trying to read from this socket; otherwise
your thread will spin in an infinite loop.
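As a sketch of this kind of checking, the helper below reads from a socket until it has seen the blank line that ends a request's headers, and it treats a return value of 0 from recv as "the client closed the connection". The name recv_request and its return convention are assumptions for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read one request (through the "\r\n\r\n" that ends
 * the headers) into buf and NUL-terminate it.
 * Returns the number of bytes read, 0 if the other end closed the socket
 * before a full request arrived, or -1 on error or if buf fills up. */
ssize_t recv_request(int fd, char *buf, size_t cap) {
    size_t used = 0;

    while (used + 1 < cap) {
        ssize_t n = recv(fd, buf + used, cap - 1 - used, 0);
        if (n == 0) {
            return 0;                  /* other end closed: stop reading */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;              /* interrupted by a signal: retry */
            }
            return -1;                 /* real error */
        }
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) {
            return (ssize_t)used;      /* headers are complete */
        }
    }
    return -1;                         /* request too large for buf */
}
```

A worker thread's loop could call this once per request and exit (after cleaning up) whenever it returns 0 or -1.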
Here is one suggestion for proceeding:
- Starting with the starting point code, finish a simple client and server
program where the client connects to the server and sends it a simple message
and waits for a response. The server should receive the message, print
it out, and close the socket. The client should exit when it detects the
server has closed its end of the socket.
- See if you can connect to your web server from wget, firefox and telnet
and send it an http request (in the correct format). Your server could
just spawn a worker thread whose main function just prints out
the message, closes the socket and calls pthread_exit (no infinite worker
thread loop, and no response sent to the client).
- Next, modify your server to send a fake response to a client GET request
(don't really parse the requested page and fetch the corresponding file, but
send a 200 response with a very short web page message body). If all goes well,
firefox should display your bogus web page after receiving your response.
If things don't go well, connect to your server using telnet as you
can more easily see what the client is receiving from your server.
- Next, add support for finding the correct web page to return for a
GET response. Add support for handling different errors (file not found,
etc.).
- Next, add in full support for multiple pthread worker threads that
keep the connection open until they are killed or detect an error and
kill themselves. Add support for the main thread
killing the oldest connections when max connections are reached and
a new connection comes in.
- Make sure your program is free of valgrind errors (it would not
hurt to run it under valgrind as you develop different parts too).
- Remove (or comment out) any debug output before submitting
your solution.
Your program should use good modular design, be well-commented, robust,
and correct. See my C Style Guide off my C resource page.
Useful Functions and Resources
- HTTP Made Really Easy, by Jim Marshall
- HTTP 1.0 Specification
- HTTP 1.1 Specification
- Socket Programming Links.
Beej's Guide is a good starting point and has code examples (sections 5 and 6 are particularly useful).
You can use either read and write or send and recv to send and receive messages on sockets.
We also have a copy of Stevens' "Unix Network Programming" in the main lab. Chapter 5 is likely
the most useful.
- C and C++ programming and debugging. Includes some documentation on C string
library functions and file I/O.
- A few string functions that may be particularly useful for
this assignment:
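strtok: string tokenizer, multiple calls to it on same string return a pointer
to the next token in the string. strtok overwrites the delimiters in the string
it tokenizes, so pass it a writable array (not a string literal). It also keeps
hidden static state between calls, so in threaded code use strtok_r instead:
#include <stdio.h>
#include <string.h>
char *next = 0;
char s[] = "hello there how are you?";  // writable array, not a literal
char *delim = " \t\r\n"; // delimiters are space, tab, cr, eoln
next = strtok(s, delim); // first call pass string to tokenize
while (next != 0) {
printf("%s\n", next);
next = strtok(0, delim); // subsequent calls pass 0 to get the
// next token in the string s
}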
sprintf: like printf, but instead of writing out the resulting string, it
is copied to the dest string: sprintf(dest, format_string, args ...)
ex:
#include <stdio.h>
char result[1024];
sprintf(result, "%s%d: %4.2f\n", "hello there", 34, 6.55);
// result string will have value: "hello there34: 6.55\n"
printf("%s", result);
- ifconfig, dig, nslookup: to get a machine's IP address:
$ ifconfig # (/sbin/ifconfig) on machine on which you want the IP
eth0 Link encap:Ethernet HWaddr
inet addr:130.58.68.62 Bcast:130.58.68.255 Mask:255.255.255.0
... ^^^^^^^^^^^^
IP address
$ dig tomato.cs.swarthmore.edu
$ nslookup tomato.cs.swarthmore.edu
- setsockopt: I recommend setting the SO_REUSEADDR option on the main
server's listener socket before calling bind. Without it, if you kill your
server program, you often cannot restart it for about 1 minute because the TCP
socket bound to port 8888 from the previous run of your server is still
lingering around (typically in the TIME_WAIT state), and bind will fail.
(SO_LINGER is a different option: it controls whether close waits for unsent
data in the socket's buffer, and it is off by default on Linux.) There is an
example call to setsockopt in the web_client.c starting point code.
- access: check to see if a file is accessible in some way:
access(path_name_of_file, X_OK | F_OK);
access(path_name_of_file, R_OK | F_OK);
- stat: get statistics about a file, including its size in bytes,
modification time, ...
struct stat stat_info;
ret = stat("/home/newhall/foo.txt", &stat_info);
- time, gmtime_r, ctime_r functions to get current time (in GMT
time zone) and convert it to a string representation. time returns the Unix
time, which is the number of seconds since Jan 1, 1970. It takes a pointer to a
time_t variable that it sets to this value. gmtime_r converts it to GMT.
mktime converts the time rep returned by gmtime_r to a time_t value.
ctime_r takes a pointer to a time_t value and creates a string representation
of the time. You should use the reentrant versions of functions.
Here is an example (note: there is missing error detection
and handling in this example):
char buff[64];
time_t mytime, mytime2;
struct tm my_time_struct;
time(&mytime);
gmtime_r(&mytime, &my_time_struct);
mytime2 = mktime(&my_time_struct);
ctime_r(&mytime2, buff);
printf("Date: %s", buff);  // the string from ctime_r already ends in '\n'
In the initial stages of implementing your solution, you can just return
a bogus time string for the Date header.
- pthread mutex variables:
// declare and initialize:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
// use:
pthread_mutex_lock(&mutex);
// critical section code
pthread_mutex_unlock(&mutex);
- htons, htonl, ntohl, ...: functions for converting between host and network byte order.
- man and apropos:
documentation for system calls (in section 2) and C and pthread library calls (in section 3).
$ man 2 read
$ man 2 send
$ man 3 strcpy
$ man 2 stat
$ man pthread_create
- /proc file system: access to all kinds of system information.
To watch TCP sockets being created, run (this can help verify that
you are supporting multiple client connections):
watch -n 1 cat /proc/net/tcp
- netstat -ant lists all information about all in-use sockets on a machine.
If you want to continually watch this, run: watch -n 2 netstat -ant.
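For the Date header, the example response earlier on this page uses the form "Sun, 10 Jan 2010 18:17:43 GMT", which is not the layout that ctime_r produces. One way to get that layout is gmtime_r followed by strftime, sketched below. The function name format_http_date is an assumption for illustration, and the sketch relies on the program running in the default "C" locale so that day and month names come out in English.

```c
#include <time.h>

/* Hypothetical helper: format the time t as an HTTP-style GMT date string,
 * e.g. "Sun, 10 Jan 2010 18:17:43 GMT".
 * Returns the number of characters written (not counting the '\0'),
 * or 0 if the conversion fails or buf is too small. */
size_t format_http_date(time_t t, char *buf, size_t cap) {
    struct tm tm_gmt;

    if (gmtime_r(&t, &tm_gmt) == NULL) {   /* reentrant conversion to GMT */
        return 0;
    }
    return strftime(buf, cap, "%a, %d %b %Y %H:%M:%S GMT", &tm_gmt);
}
```

A worker thread would typically pass time(NULL) as the first argument and copy the result into the Date header of its response.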
Submission and Demo
Create a
tar file containing:
- All your web server source files, and makefile to build server (and client
if applicable).
- A README file with: (1) your and your partner's names; (2) an example
of how to run your web server (a command line); and (3) a description
of any features you have not fully supported and/or any errors you
were unable to fix.
I'd suggest creating a handin directory and copying all these things into
it. It is good to check that you have your full solution in the
handin directory (type 'make' to check that everything builds, try running
it, then type 'make clean' to remove executables and .o's from what you
submit). Then tar up your handin directory.
Either you or your partner should submit your tar file by running cs87handin.
Demo
You and your partner will sign up for a 15 minute demo slot to demo your
web server. Think about, and practice, different scenarios to demonstrate
both correctness and good error handling.
You will want to demonstrate
concurrent client connections, persistent connections, what happens when
the client side closes its end of the socket (maybe via killing the client),
and show that older server connections are closed when the max number
of connections has been reached.