For this assignment you and your partner will
implement a webserver. This lab is designed to give
you some practice writing client-server socket programs.
For part 1 you will write a single-threaded webserver that
will handle one client request at a time. In the next part
you will add features so that your webserver can simultaneously
handle multiple client requests.
Contents:
Project Requirements
Project Details
Getting Started
Useful Functions and Resources
Submission and Demo
Project Requirements
- Your webserver should be written in C or C++.
- Use port 8888 instead of port 80.
- Your webserver should implement the HTTP/1.0 protocol which
closes connections to clients after the response has been
sent. You do not need to handle HTTP/1.1, which keeps connections
open for some time. If you receive an HTTP/1.1 request, just
respond with an HTTP/1.0 response (1.1 clients can handle 1.0
responses.)
- It should handle GET and HEAD client requests.
It does not need to handle POST or any other request type.
- It should return appropriate status codes. You should
support 200, 400, 403, and 404. If you return an error code you
should also return headers and a message body with a simple error
page. For example:
"<html><body>Not Found</body></html>"
- It should support the headers Content-Length, Content-Type, and Date.
- It does not need to handle any php or javascript parsing. If
the client requests a .php file, send the file contents back
unchanged, just as you would for an .html or .jpg file.
- It should handle urls that start with / and urls that start with /~username.
Urls that start with / map to files under /scratch/cs87webpages/, the web
server's starting pages. Urls that start with /~username/ map to files under
/home/username/public_html/.
- Your webserver should be free of memory access errors (i.e., have no
valgrind errors).
Note that if you start your webserver on one of
our lab machines, you can only
connect to it with clients that are also running on our lab machines.
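As a rough sketch of the Content-Type requirement above, the function below guesses a content type from a file name's extension. The name guess_content_type and the table of extensions are illustrative assumptions, not part of the starting point code. Reporting .php files as text/html is also an assumption, based on the requirement that .php files are sent back just like .html files.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table mapping file extensions to Content-Type values. */
static const struct {
    const char *ext;
    const char *type;
} content_types[] = {
    { ".html", "text/html" },
    { ".htm",  "text/html" },
    { ".php",  "text/html" },   /* assumption: sent back unparsed, like .html */
    { ".txt",  "text/plain" },
    { ".jpg",  "image/jpeg" },
    { ".jpeg", "image/jpeg" },
    { ".png",  "image/png" },
    { ".gif",  "image/gif" },
};

/* Return a Content-Type string for the file named by path. */
const char *guess_content_type(const char *path) {
    assert(path != NULL);

    /* Only look for a '.' in the last path component. */
    const char *slash = strrchr(path, '/');
    const char *name = (slash != NULL) ? slash + 1 : path;
    const char *dot = strrchr(name, '.');

    if (dot != NULL) {
        size_t count = sizeof(content_types) / sizeof(content_types[0]);
        for (size_t i = 0; i < count; i++) {
            if (strcmp(dot, content_types[i].ext) == 0) {
                return content_types[i].type;
            }
        }
    }
    return "application/octet-stream";  /* generic fallback for unknown types */
}
```

A table keeps the mapping in one place, so supporting another file type is a one-line change.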
Project Details
Web server
The basic design of your webserver is the following:
- create a listen socket on port 8888
- enter an infinite loop:
- accept the next connection
- call a handler function, passing it the socket returned by accept,
that will handle the client's request.
- after the client's request is handled, close the socket and
return to the main loop.
- it should exit its infinite loop and quit when accept, send, recv, read,
write, ... return an error value that it cannot recover from
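The design above could be sketched roughly as follows. This is only one possible layout, not the starting point code: the names make_listen_socket and serve_forever, the backlog of 16, and the handler function pointer are all assumptions made for illustration. It also sets SO_REUSEADDR and turns SO_LINGER off, as recommended in the Useful Functions section.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on `port`.
 * Returns the listening descriptor, or -1 on error. */
int make_listen_socket(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    int on = 1;                 /* SO_REUSEADDR: allow quick restarts */
    struct linger lg;           /* SO_LINGER off */
    lg.l_onoff = 0;
    lg.l_linger = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0) {
        perror("setsockopt");
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        perror("bind/listen");
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical main loop: handle one client at a time.
 * `handler` is the function you write to process a single request.
 * Returns -1 when accept fails; otherwise it loops forever. */
int serve_forever(int listen_fd, void (*handler)(int client_fd)) {
    assert(handler != NULL);
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) { perror("accept"); return -1; }
        handler(client_fd);     /* read the request, send the response */
        close(client_fd);       /* HTTP/1.0: close after each response */
    }
}
```

A main function would call make_listen_socket(8888), pass the result to serve_forever along with your handler, and exit when serve_forever returns.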
Web clients
You can use multiple programs to connect to your webserver and send it
commands:
- telnet server_IP port_num, then type in a GET command
(make sure to enter a blank line after the GET command). For example:
$ telnet 130.58.68.62 8888
GET /index.html HTTP/1.0
telnet will exit when it detects that your webserver has closed its end
of the socket.
- firefox: Enter the url of the desired page specifying your web server
using its IP:port_num (e.g. http://130.58.68.62:8888/index.html)
- wget: wget -v 130.58.68.62:8888/index.html
- modify the example client program to send http requests to your server.
I don't think this is necessary (since the other three clients are
already written for you), but you could modify the web_client program given
with the starting point code, to send GET requests to your
webserver and receive the responses.
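Whichever client you use, the first line your server receives looks like the GET line in the telnet example above. Here is a minimal sketch of pulling that line apart with sscanf. The struct request_line type, the buffer sizes, and the function name parse_request_line are assumptions for illustration. The sketch answers unsupported methods with 400 only because 400 is among the status codes this assignment requires; you may prefer a different policy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define METHOD_MAX 8
#define URL_MAX 1024
#define VERSION_MAX 16

/* Hypothetical container for the three fields of a request line. */
struct request_line {
    char method[METHOD_MAX];
    char url[URL_MAX];
    char version[VERSION_MAX];
};

/* Parse "METHOD URL VERSION" from the start of `request`.
 * Returns 200 if it looks like a GET or HEAD request this server can
 * serve, and 400 otherwise. */
int parse_request_line(const char *request, struct request_line *out) {
    assert(request != NULL && out != NULL);

    /* Each field width is one less than its buffer size, leaving room
     * for the terminating '\0'. %s stops at whitespace, including \r. */
    if (sscanf(request, "%7s %1023s %15s",
               out->method, out->url, out->version) != 3) {
        return 400;
    }
    if (strcmp(out->method, "GET") != 0 &&
        strcmp(out->method, "HEAD") != 0) {
        return 400;
    }
    if (out->url[0] != '/' || strncmp(out->version, "HTTP/", 5) != 0) {
        return 400;
    }
    return 200;
}
```

Because the version check only looks for the "HTTP/" prefix, both HTTP/1.0 and HTTP/1.1 requests are accepted, which matches the requirement to answer 1.1 requests with a 1.0 response.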
HTTP
Start by reading
HTTP Made Really Easy by
Jim Marshall.
It is very important that you can interpret the format of a client request
correctly, and that you send correctly formatted responses to clients. Many
parts of a correctly formatted message involve sequences of carriage return
and newline characters ("\r\n"). These are used to signify the end of
all or part of a "message". Here is the general format of
an HTTP message (client requests and server responses share it):
initial line
Header1: value1
Header2: value2
Header3: value3

(optional message body goes here)
For example, a GET response for a very simple page may look like:
HTTP/1.1 200 OK
Date: Sun, 10 Jan 2010 18:17:43 GMT
Content-Type: text/html
Content-Length: 53

<html>
<body>
<h1>CS 87 Test Page</h1>
</body></html>
It is very important that each header line ends with a "\r\n"
and that there is a blank line (another "\r\n") between the headers
and the message body. The message body, however, is sent without
a trailing "\r\n". Instead, the header Content-Length is used to tell
the client the size of the message body.
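To make the "\r\n" rules concrete, here is a minimal sketch that formats the status line and the three required headers into a buffer with snprintf. The function names reason_phrase and format_response_header are assumptions for illustration, and the caller is assumed to supply an already formatted date string. The message body is sent separately, right after these bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reason phrase for each status code this assignment requires. */
const char *reason_phrase(int status) {
    switch (status) {
    case 200: return "OK";
    case 400: return "Bad Request";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    default:  return "Unknown";
    }
}

/* Hypothetical helper: write the initial line, the headers, and the
 * blank line that ends the headers into buf.
 * Returns the number of bytes written (not counting the '\0'),
 * or -1 if buf is too small. */
int format_response_header(char *buf, size_t size, int status,
                           const char *date, const char *content_type,
                           size_t content_length) {
    assert(buf != NULL && date != NULL && content_type != NULL);

    int n = snprintf(buf, size,
                     "HTTP/1.0 %d %s\r\n"
                     "Date: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",            /* blank line: end of headers */
                     status, reason_phrase(status),
                     date, content_type, content_length);
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return n;
}
```

For a HEAD request the same header bytes can be sent with no body following them.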
GET requests and mapping urls to files
There is one format of url that you do not need to handle for this
assignment: urls for which a server would respond with a
"301 Moved Permanently" response instead of responding with OK and the file
contents. This case is described below in more detail.
Here are some example GET requests that you need to handle,
and their corresponding file name(s) ("directory names" correspond to files
named either index.html or index.php in the named directory):
GET / HTTP/1.0                           /scratch/cs87webpages/index.html
                                         /scratch/cs87webpages/index.php
GET /index.html HTTP/1.0                 /scratch/cs87webpages/index.html
GET /index.php HTTP/1.0                  /scratch/cs87webpages/index.php
GET /search.html HTTP/1.0                /scratch/cs87webpages/search.html
GET /courses/ HTTP/1.0                   /scratch/cs87webpages/courses/index.html
                                         /scratch/cs87webpages/courses/index.php
GET /~newhall/ HTTP/1.0                  /home/newhall/public_html/index.html
                                         /home/newhall/public_html/index.php
GET /~newhall/newcluster.jpg HTTP/1.0    /home/newhall/public_html/newcluster.jpg
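The mapping in the table above can be sketched as one function. The name map_url_to_path, its parameters, and the DOC_ROOT macro are assumptions for illustration. Rejecting any url that contains ".." is also an assumption (the assignment does not ask for it); it is included so that a request cannot climb out of the web page directories. For a url ending in '/', the caller would try index_name "index.html" first and then "index.php".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DOC_ROOT "/scratch/cs87webpages"

/* Hypothetical helper: translate a request url into a file path.
 * If url ends in '/', index_name (e.g. "index.html") is appended.
 * Returns 0 on success, -1 if the url is malformed or path is too small. */
int map_url_to_path(const char *url, const char *index_name,
                    char *path, size_t size) {
    assert(url != NULL && index_name != NULL && path != NULL);

    /* Assumption: refuse urls that do not start with '/' or contain "..". */
    if (url[0] != '/' || strstr(url, "..") != NULL) {
        return -1;
    }

    size_t len = strlen(url);
    const char *suffix = (url[len - 1] == '/') ? index_name : "";
    int n;

    if (url[1] == '~') {
        /* /~username/rest  ->  /home/username/public_html/rest */
        const char *user = url + 2;
        const char *rest = strchr(user, '/');
        size_t user_len = (rest != NULL) ? (size_t)(rest - user)
                                         : strlen(user);
        if (user_len == 0) {
            return -1;
        }
        n = snprintf(path, size, "/home/%.*s/public_html%s%s",
                     (int)user_len, user,
                     (rest != NULL) ? rest : "", suffix);
    } else {
        /* /rest  ->  /scratch/cs87webpages/rest */
        n = snprintf(path, size, "%s%s%s", DOC_ROOT, url, suffix);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```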
You do not need to correctly handle GET requests of the following format
(i.e. GET requests with no trailing '/' when the last name corresponds
to a directory):
GET /~newhall HTTP/1.1
GET /courses HTTP/1.1
The way a web server would handle requests like this
is to send a "301 Moved Permanently" response to the client
with the real url of the page ("Location: http://IP:portnum/~newhall/").
The client would resend the GET request using the url returned by the server:
GET /~newhall/ HTTP/1.1
When your webserver receives a request of this form, you can choose
to either have it respond with an error response or with OK. If your
webserver sends an OK response, then the client may make subsequent
GET requests for any files included in the page, and these GET requests
will not have the correct url (the client doesn't know that newhall
is a directory and instead of requesting /~newhall/foo.jpg will request
/foo.jpg, if my homepage includes the foo.jpg file). Just handle these as
you would any bad url (there is no file associated with /foo.jpg).
You do, however, need to correctly handle GET requests with the trailing
'/' (e.g. /~newhall/).
You are welcome to add support for 301 responses if you'd like, but
you are not required to do so for this assignment, so I'd suggest
only adding this after the rest of your webserver works.
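If you do decide to try the optional 301 support, the response could be built along these lines. This is only a sketch: the name format_redirect and its parameters are assumptions, and sending an empty body with Content-Length: 0 is one simple choice rather than a requirement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a 301 response that redirects `url`
 * (a directory requested without its trailing '/') to the same url
 * with '/' appended. `host` and `port` are the address clients used
 * to reach the server.
 * Returns the response length, or -1 if it does not fit in buf. */
int format_redirect(char *buf, size_t size,
                    const char *host, int port, const char *url) {
    assert(buf != NULL && host != NULL && url != NULL);

    int n = snprintf(buf, size,
                     "HTTP/1.0 301 Moved Permanently\r\n"
                     "Location: http://%s:%d%s/\r\n"
                     "Content-Length: 0\r\n"
                     "\r\n",
                     host, port, url);
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return n;
}
```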
Getting Started
You can grab a copy of my starting point files for this lab.
They are in ~newhall/public/cs87/lab2. The starting point contains
a sample makefile for building the web_client and web_server executables,
and the very beginnings of both (mostly just #includes).
I strongly encourage you to implement and test incrementally.
Also, it is very important to check return values from all functions
and to handle error return values correctly. For example, if
a call to read on a socket returns fewer bytes than requested, the other
end of the socket may have been closed (read returns 0 once that happens).
When this is the case, stop trying to read from the socket; otherwise
your server can get stuck in an infinite loop.
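Here is one way the return-value checks described above might look. The helper names read_request and write_all are assumptions, not functions from the starting point code. read_request stops when the blank line that ends the headers has arrived, when read returns 0 (the other end closed), or when the buffer is full. write_all keeps calling write until every byte has been sent.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read a request's initial line and headers.
 * Returns the number of bytes stored in buf (which is '\0' terminated),
 * or -1 on a read error. */
ssize_t read_request(int fd, char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    size_t used = 0;
    buf[0] = '\0';

    while (used + 1 < size) {
        ssize_t n = read(fd, buf + used, size - 1 - used);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: try again */
            return -1;                      /* real error */
        }
        if (n == 0) break;                  /* other end closed */
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) break;  /* end of headers */
    }
    return (ssize_t)used;
}

/* Hypothetical helper: write all len bytes, handling short writes.
 * Returns len on success, or -1 on a write error. */
ssize_t write_all(int fd, const void *data, size_t len) {
    assert(data != NULL || len == 0);
    const char *p = data;
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = write(fd, p + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```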
Here is one suggestion for proceeding:
- Starting with the starting point code, finish a simple client and server
program where the client connects to the server and sends it a simple message
and waits for a response. The server should receive the message, print
it out, and close the socket. The client should exit when it detects the
server has closed its end of the socket.
- See if you can connect to your webserver from wget, firefox and telnet
and send it an http request (in the correct format). Your server could
just print out the message and close the socket.
- Next, modify your server to send a fake response to a client GET request
(don't really parse the request and fetch the corresponding file, but
send a 200 response with a very short web page message body). If all goes well,
firefox should display your bogus web page after receiving your response.
If things don't go well, connect to your server using telnet, since with
telnet you can more easily see what the client is receiving from your server.
- Next, add support for finding the correct web page to return for a
GET request. Add support for handling different errors (file not found,
etc.).
- Make sure your program is free of valgrind errors (it would not
hurt to run it under valgrind as you develop different parts too).
- Remove (or comment out) any debug output before submitting
your solution.
Your program should use good modular design, be well-commented, robust,
and correct. See my C Style Guide off my C resource page.
Useful Functions and Resources
- HTTP Made Really Easy, by Jim Marshall
- HTTP 1.0 Specification
- HTTP 1.1 Specification
- Socket Programming Links.
- C and C++ programming and debugging. Includes some documentation on C string
library functions and file I/O.
- A few string functions that may be particularly useful for
this assignment:
strtok: string tokenizer; repeated calls on the same string return a pointer
to the next token in the string. strtok modifies the string it tokenizes,
so the string must be a writable array, not a string literal:
#include <stdio.h>
#include <string.h>
char *next = 0;
char s[] = "hello there how are you?"; // writable array; strtok modifies it
char *delim = " \t\r\n"; // delimiters are space, tab, cr, eoln
next = strtok(s, delim); // first call: pass the string to tokenize
while (next != 0) {
  printf("%s\n", next);
  next = strtok(0, delim); // subsequent calls pass 0 to get the
                           // next token in the string s
}
sprintf: like printf, but instead of writing out the resulting string, it
is copied to the dest string: sprintf(dest, format_string, args ...)
ex:
#include <stdio.h>
char result[1024];
sprintf(result, "%s%d: %4.2f\n", "hello there", 34, 6.55);
// result string will have value: "hello there34: 6.55\n\0"
printf("%s", result);
- ifconfig, dig, nslookup: to get a machine's IP address:
$ ifconfig # (/sbin/ifconfig) on machine on which you want the IP
eth0 Link encap:Ethernet HWaddr
inet addr:130.58.68.62 Bcast:130.58.68.255 Mask:255.255.255.0
... ^^^^^^^^^^^^
IP address
$ dig tomato.cs.swarthmore.edu
$ nslookup tomato.cs.swarthmore.edu
- setsockopt: I'd recommend setting the server's socket options
SO_LINGER to off (0) and SO_REUSEADDR to true (1). There is an example
call to setsockopt in the web_client.c starting point code.
- access: check to see if a file is accessible in some way:
access(path_name_of_file, X_OK | F_OK);
access(path_name_of_file, R_OK | F_OK);
- stat: get statistics about a file, including its size in bytes,
modification time, ...
struct stat stat_info;
int ret = stat("/home/newhall/foo.txt", &stat_info); // 0 on success, -1 on error
// on success, stat_info.st_size is the file's size in bytes
- time, gmtime_r, ctime_r functions to get current time (in GMT
time zone) and convert it to a string representation. time returns the Unix
time, which is the number of seconds since Jan 1, 1970. It takes a pointer
to a time_t that it sets to this value. gmtime_r converts it to a struct tm
in GMT. mktime converts the struct tm filled in by gmtime_r back to a time_t
value (mktime interprets the struct tm as local time). ctime_r takes a pointer
to a time_t value and creates a string representation of the time. You should
use the reentrant (_r) versions of functions.
Here is an example (note: there is missing error detection
and handling in this example):
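#include <stdio.h>
#include <time.h>
char buff[64];
time_t mytime, mytime2;
struct tm my_time_struct;
time(&mytime);                       // seconds since Jan 1, 1970
gmtime_r(&mytime, &my_time_struct);  // broken-down time in GMT
mytime2 = mktime(&my_time_struct);   // back to a time_t value
ctime_r(&mytime2, buff);             // string form; it already ends in '\n'
printf("Date: %s", buff);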
In the initial stages of implementing your solution, you can just return
a bogus time string for the Date header.
- man and apropos: documentation for system calls (in section 2) and C library calls
(in section 3).
$ man 2 read
$ man 2 send
$ man 3 strcpy
$ man 2 stat
You can use either read and write or send and recv to send and receive
messages on sockets.
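Returning to the Date header: the example response earlier on this page uses a date of the form "Sun, 10 Jan 2010 18:17:43 GMT", which is not the layout that ctime_r produces. One alternative, sketched here as an assumption rather than a required approach, is to pass the struct tm from gmtime_r to strftime. The name format_http_date is hypothetical. The %a and %b conversions give English names only in the default "C" locale, which is what a program gets unless it calls setlocale.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format `when` as an HTTP date string such as
 * "Sun, 10 Jan 2010 18:17:43 GMT".
 * Returns the number of characters written (not counting the '\0'),
 * or 0 if the conversion fails or buf is too small. */
size_t format_http_date(time_t when, char *buf, size_t size) {
    assert(buf != NULL);

    struct tm tm_gmt;
    if (gmtime_r(&when, &tm_gmt) == NULL) {
        return 0;
    }
    /* weekday, day month year hour:minute:second, always labeled GMT */
    return strftime(buf, size, "%a, %d %b %Y %H:%M:%S GMT", &tm_gmt);
}
```

To stamp a response with the current time, call format_http_date(time(NULL), buf, sizeof(buf)) and put buf after "Date: " in the header.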
Submission and Demo
Create a tar file containing:
- All your webserver source files, and a makefile to build the server
(and client if applicable).
- A README file with: (1) your and your partner's names; (2) an example
of how to run your webserver (a command line); and (3) a description
of any features you have not fully supported and/or any errors you
were unable to fix.
I'd suggest creating a handin directory and copying all these things into
it. It is good to check that you have your full solution in the
handin directory (type 'make' to check that everything builds, try running
it, then type 'make clean' to remove executables and .o's from what you
submit). Then tar up your handin directory.
Either you or your partner should submit your tar file by running cs87handin.
Demo
You and your partner will sign up for a 15 minute demo slot to demo your
web server. Think about, and practice, different scenarios to demonstrate
both correctness and good error handling.