CS87: Lab 2

For this assignment you and your partner will implement a webserver. This lab is designed to give you some practice writing client-server socket programs. For part 1 you will write a single-threaded webserver that will handle one client request at a time. In the next part you will add features so that your webserver can simultaneously handle multiple client requests.

Contents:
Project Requirements
Project Details
Getting Started
Useful Functions and Resources
Submission and Demo

Project Requirements

Your webserver should be written in C or C++.
Use port 8888 instead of port 80.
Your webserver should implement the HTTP/1.0 protocol which closes connections to clients after the response has been sent. You do not need to handle HTTP/1.1, which keeps connections open for some time. If you receive an HTTP/1.1 request, just respond with an HTTP/1.0 response (1.1 clients can handle 1.0 responses.)
It should handle GET and HEAD client requests. It does not need to handle POST nor any other requests.
It should return appropriate status codes. You should support 200, 400, 403, and 404. If you return an error code you should also return headers and a message body with a simple error page. For example:
"<html><body>Not Found</body></html>"
It should support the headers Content-Length, Content-Type, and Date.
It does not need to handle any php or javascript parsing. If the client requests a .php file, just send the file contents back just like an .html or .jpg file.
It should handle urls that start with / and urls that start with /~username. The web server's starting pages are in /scratch/cs87webpages/. Requests for urls that start with /~username/ are served from /home/username/public_html/.
Your webserver should be free of memory access errors (i.e., have no valgrind errors).

Note that if you start your webserver on one of our lab machines, you can only connect to it with clients that are also running on our lab machines.

Project Details

Web server

The basic design of your webserver is the following:

create a listen socket on port 8888
enter an infinite loop:
1. accept the next connection
2. call a handler function, passing it the socket returned by accept, that will handle the client's request.
3. after the client's request is handled, close the socket and return to the main loop.
it should exit its infinite loop and quit when it gets appropriate error return values from accept, send, recv, read, write, ...
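
The basic design above can be sketched in C roughly as follows. This is a minimal sketch, not a required structure: the names make_listen_socket, serve_forever, and BACKLOG are made up for illustration, and the handler argument stands in for whatever request-handling function you write.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define BACKLOG 16   /* assumed length of the pending-connection queue */

/* Create a TCP socket bound to the given port on all interfaces and
 * start listening on it.  Returns the socket descriptor, or -1 on error. */
int make_listen_socket(unsigned short port) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); close(fd); return -1;
    }
    if (listen(fd, BACKLOG) < 0) {
        perror("listen"); close(fd); return -1;
    }
    return fd;
}

/* Accept clients one at a time and pass each connected socket to handler
 * (a hypothetical per-request function).  Returns when accept fails. */
void serve_forever(int listen_fd, void (*handler)(int client_fd)) {
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) { perror("accept"); break; }
        handler(client_fd);   /* handle this client's request */
        close(client_fd);     /* HTTP/1.0: close after the response */
    }
}
```

A main function might call make_listen_socket(8888), check the result, and pass it to serve_forever along with your handler.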

Web clients

You can use multiple programs to connect to your webserver and send it commands:

telnet server_IP port_num, then type in a GET command (make sure to enter a blank line after the GET command). For example:
```
$ telnet 130.58.68.62 8888

  GET /index.html HTTP/1.0
```
telnet will exit when it detects that your webserver has closed its end of the socket
firefox: Enter the url of the desired page specifying your web server using its IP:port_num (e.g. http://130.58.68.62:8888/index.html)
wget: wget -v 130.58.68.62:8888/index.html
modify the example client program to send http requests to your server. I don't think this is necessary (since the other three clients are already written for you), but you could modify the web_client program given with the starting point code, to send GET requests to your webserver and receive the responses.

HTTP

Start by reading HTTP Made Really Easy by Jim Marshall.

It is very important that you interpret the format of a client request correctly, and that you send correctly formatted responses to clients. Many parts of a correctly formatted message involve sequences of carriage return and newline characters ("\r\n"). These are used to signify the end of all or part of a "message". Here is the general format of an HTTP message (client requests and server responses share it):

   initial line
   Header1: value1
   Header2: value2
   Header3: value3

   (optional message body goes here)
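
For a client request, the initial line holds the method, the url, and the HTTP version. One possible way to pull these apart is sketched below. The function name parse_request_line and the buffer sizes are assumptions for illustration, and rejecting every method other than GET and HEAD (so the caller can send a 400) is just one choice among several.

```c
#include <stdio.h>
#include <string.h>

/* Split the initial request line into its method and url.
 * method must have room for 8 bytes and url for 1024 bytes (assumed sizes).
 * Returns 0 on success, or -1 if the line is malformed or the method is
 * not GET or HEAD. */
int parse_request_line(const char *req, char *method, char *url) {
    char version[16];

    /* the field widths keep sscanf from overflowing the buffers */
    if (sscanf(req, "%7s %1023s %15s", method, url, version) != 3) {
        return -1;
    }
    if (strncmp(version, "HTTP/", 5) != 0) {
        return -1;
    }
    if (strcmp(method, "GET") != 0 && strcmp(method, "HEAD") != 0) {
        return -1;
    }
    return 0;
}
```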

For example, a GET response for a very simple page may look like:

   HTTP/1.0 200 OK
   Date: Sun, 10 Jan 2010 18:17:43 GMT
   Content-Type: text/html
   Content-Length: 53

   <html>
   <body>
   <h1>CS 87 Test Page</h1>
   </body></html>

It is very important that each header line ends with a "\r\n" and that there is a blank line (another "\r\n") between the headers and the message body. The message body, however, is sent without a trailing "\r\n". Instead, the Content-Length header is used to tell the client the size of the message body.
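
One way to get the "\r\n" placement right is to format the status line and headers in a single snprintf call. The sketch below assumes a hypothetical helper named build_header and that the caller has already computed the date string and body length. The message body would be sent separately, right after these bytes.

```c
#include <stdio.h>
#include <string.h>

/* Format a status line plus the Date, Content-Type, and Content-Length
 * headers, followed by the blank line, into buf.
 * Returns the header length in bytes, or -1 if buf is too small. */
int build_header(char *buf, size_t buflen, int code, const char *reason,
                 const char *date, const char *type, long body_len) {
    int n = snprintf(buf, buflen,
                     "HTTP/1.0 %d %s\r\n"
                     "Date: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",               /* blank line ends the headers */
                     code, reason, date, type, body_len);
    if (n < 0 || (size_t)n >= buflen) {
        return -1;                         /* output was truncated */
    }
    return n;
}
```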

GET requests and mapping urls to files

There is one format of url that you do not need to handle for this assignment. These are ones where the server would respond with a "301 Moved Permanently" response vs. responding with OK and the file contents. This case is described below in more detail.

Here are some example GET requests that you need to handle, and their corresponding file name(s) ("directory names" correspond to files named either index.html or index.php in the named directory):

GET  /   HTTP/1.0                           /scratch/cs87webpages/index.html 
                                            /scratch/cs87webpages/index.php 

GET /index.html  HTTP/1.0                   /scratch/cs87webpages/index.html 

GET /index.php   HTTP/1.0                   /scratch/cs87webpages/index.php 

GET /search.html HTTP/1.0                   /scratch/cs87webpages/search.html

GET /courses/ HTTP/1.0                      /scratch/cs87webpages/courses/index.html 
                                            /scratch/cs87webpages/courses/index.php 

GET /~newhall/  HTTP/1.0                    /home/newhall/public_html/index.html
                                            /home/newhall/public_html/index.php

GET /~newhall/newcluster.jpg  HTTP/1.0      /home/newhall/public_html/newcluster.jpg
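
The mapping in the examples above could be sketched as follows. The function name url_to_path and the DOC_ROOT macro are illustrative assumptions. This sketch only appends index.html for directory urls (trying index.php as a fallback is left to the caller), it returns an error for /~username with no trailing '/', and it rejects urls containing "..", which is an extra precaution that the assignment does not ask for.

```c
#include <stdio.h>
#include <string.h>

#define DOC_ROOT "/scratch/cs87webpages"

/* Map a request url to a file path stored in path.
 * Returns 0 on success, or -1 if the url is malformed or does not fit. */
int url_to_path(const char *url, char *path, size_t pathlen) {
    int n;

    if (url[0] != '/' || strstr(url, "..") != NULL) {
        return -1;
    }
    if (url[1] == '~') {
        const char *user = url + 2;            /* start of the username */
        const char *rest = strchr(user, '/');  /* "/..." after the name */
        if (rest == NULL || rest == user) {
            return -1;      /* no '/' after the name, or an empty name */
        }
        n = snprintf(path, pathlen, "/home/%.*s/public_html%s",
                     (int)(rest - user), user, rest);
    } else {
        n = snprintf(path, pathlen, "%s%s", DOC_ROOT, url);
    }
    if (n < 0 || (size_t)n >= pathlen) {
        return -1;
    }
    if (path[n - 1] == '/') {                  /* directory request */
        if ((size_t)n + strlen("index.html") >= pathlen) {
            return -1;
        }
        strcat(path, "index.html");
    }
    return 0;
}
```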

You do not need to correctly handle GET requests of the following format (i.e. GET requests with no trailing '/' when the last name corresponds to a directory):

GET /~newhall  HTTP/1.1
GET /courses  HTTP/1.1

The way a web server would handle requests like this is to send a "301 Moved Permanently" response to the client with the real url of the page ("Location: http://IP:portnum/~newhall/"). The client would resend the GET request using the url returned by the server:

GET /~newhall/  HTTP/1.1

When your webserver receives a request of this form, you can choose to have it respond with either an error response or with OK. If your webserver sends an OK response, then the client may make subsequent GET requests for any files included in the page, and these GET requests will not have the correct url: the client doesn't know that newhall is a directory, so if my homepage includes the file foo.jpg, the client will request /foo.jpg instead of /~newhall/foo.jpg. Just handle these as you would any bad url (there is no file associated with /foo.jpg).

You do, however, need to correctly handle GET requests with the trailing '/' (e.g. /~newhall/).

You are welcome to add support for 301 responses if you'd like, but you are not required to do so for this assignment, so I'd suggest only adding this after the rest of your webserver works.

Getting Started

You can grab a copy of my starting point files for this lab. They are in ~newhall/public/cs87/lab2. The starting point contains a sample makefile for building the web_client and web_server executables, and the very beginnings of both (mostly just #includes).

I strongly encourage you to implement and test incrementally. Also, it is very important to check return values from all functions and to handle error return values correctly. For example, if a call to read on a socket returns before the requested number of bytes have been read, this could mean that the other end of the socket was closed. When this is the case, you should stop trying to read from this socket; otherwise your server can end up in an infinite loop.
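
As a rough illustration of checking read's return value, the sketch below keeps reading until it sees the blank line that ends a request, and stops as soon as read reports an error or a closed connection. The name read_request and its return conventions are assumptions, not part of the starting point code.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read from fd until the blank line that ends the request headers
 * ("\r\n\r\n") has arrived.  buf is always NUL-terminated after a
 * successful read.
 * Returns the number of bytes stored, 0 if the other end closed the
 * socket before a full request arrived, or -1 on a read error or if
 * the request does not fit in buf. */
ssize_t read_request(int fd, char *buf, size_t buflen) {
    size_t used = 0;

    while (used + 1 < buflen) {
        ssize_t n = read(fd, buf + used, buflen - 1 - used);
        if (n < 0) {
            return -1;              /* read failed */
        }
        if (n == 0) {
            return 0;               /* other end closed: stop reading */
        }
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) {
            return (ssize_t)used;   /* complete request received */
        }
    }
    return -1;                      /* request too large for buf */
}
```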

Here is one suggestion for proceeding:

Starting with the starting point code, finish a simple client and server program where the client connects to the server and sends it a simple message and waits for a response. The server should receive the message, print it out, and close the socket. The client should exit when it detects the server has closed its end of the socket.
See if you can connect to your webserver from wget, firefox and telnet and send it an http request (in the correct format). Your server could just print out the message and close the socket.
Next, modify your server to send a fake response to a client GET request (don't really parse the requested page and fetch the corresponding file, but send a 200 response with a very short web page message body). If all goes well, firefox should display your bogus web page after receiving your response. If things don't go well, connect to your server using telnet, as you can more easily see what the client is receiving from your server.
Next, add support for finding the correct web page to return for a GET response. Add support for handling different errors (file not found, etc.).
Make sure your program is free of valgrind errors (it would not hurt to run it under valgrind as you develop different parts too).
Remove (or comment out) any debug output before submitting your solution.

Your program should use good modular design, be well-commented, robust, and correct. See my C Style Guide off my C resource page.

Useful Functions and Resources

HTTP Made Really Easy, by Jim Marshall
HTTP 1.0 Specification
HTTP 1.1 Specification
Socket Programming Links.
C and C++ programming and debugging. Includes some documentation on C string library functions and file I/O.

A few string functions that may be particularly useful for this assignment:

strtok: string tokenizer, multiple calls to it on the same string return a pointer
        to the next token in the string. strtok modifies the string it tokenizes,
        so pass it a writable array rather than a string literal. It is also not
        reentrant; strtok_r is the reentrant version.

        #include <stdio.h>
        #include <string.h>
        
        char *next = 0;
        char s[] = "hello   there    how    are   you?";  // writable array
        char *delim = " \t\r\n";   // delimiters are space, tab, cr, eoln

        next = strtok(s, delim);     // first call pass string to tokenize
        while (next != 0) {
            printf("%s\n", next);
            next = strtok(0, delim);   // subsequent calls pass 0 to get the
                                       // next token in the string s
        }

sprintf:  like printf, but instead of writing out the resulting string, it
          is copied to the dest string: sprintf(dest, format_string, args ...)
          snprintf does the same but also takes the size of dest, so it
          cannot write past the end of the buffer.

     ex:
           #include <stdio.h>
           
           char result[1024]; 
           sprintf(result, "%s%d: %4.2f\n", "hello there", 34, 6.55);
           // result string will have value:  "hello there34: 6.55\n"
           printf("%s", result);

ifconfig, dig, nslookup: to get a machine's IP address:

$ ifconfig        #   (/sbin/ifconfig) on machine on which you want the IP

eth0      Link encap:Ethernet  HWaddr 
          inet addr:130.58.68.62  Bcast:130.58.68.255  Mask:255.255.255.0
...                 ^^^^^^^^^^^^
                    IP address

$ dig tomato.cs.swarthmore.edu

$ nslookup tomato.cs.swarthmore.edu

setsockopt: I'd recommend setting the server's socket options SO_LINGER to off (0) and SO_REUSEADDR to true (1). There is an example call to setsockopt in the web_client.c starting point code.
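
A minimal sketch of setting those two options on the listen socket is shown below. The wrapper name set_server_sockopts is made up for illustration, and it assumes fd is a socket that has been created but not yet bound.

```c
#include <sys/socket.h>

/* Turn SO_REUSEADDR on and SO_LINGER off for socket fd.
 * Returns 0 on success, or -1 if either setsockopt call fails. */
int set_server_sockopts(int fd) {
    int on = 1;
    struct linger lin;

    lin.l_onoff = 0;     /* 0 turns lingering off */
    lin.l_linger = 0;    /* linger time in seconds (unused when off) */

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin)) < 0) {
        return -1;
    }
    return 0;
}
```

With SO_REUSEADDR set before bind, restarting the server shortly after it exits should not fail with an "Address already in use" error.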

access: check to see if a file is accessible in some way:

access(path_name_of_file, X_OK | F_OK);
access(path_name_of_file, R_OK | F_OK);

stat: get statistics about a file, including its size in bytes, modification time, ...
```
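struct stat stat_info;   // needs #include <sys/stat.h>
int ret;

ret = stat("/home/newhall/foo.txt", &stat_info);
// on success ret is 0 and stat_info.st_size is the file size in bytes;
// on error ret is -1 and errno is set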
```
time, gmtime_r, mktime, ctime_r: functions to get the current time (in the GMT time zone) and convert it to a string representation. time returns the Unix time, which is the number of seconds since Jan 1, 1970. It takes a pointer to a time_t that it also sets to this value. gmtime_r converts the time_t to a struct tm in GMT. mktime converts the struct tm filled in by gmtime_r back to a time_t value. ctime_r takes a pointer to a time_t value and creates a string representation of the time, which ends with a newline. You should use the reentrant (_r) versions of these functions.
Here is an example (note: there is missing error detection and handling in this example):
```
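  // needs #include <stdio.h> and #include <time.h>
  char buff[64];               // ctime_r needs at least 26 bytes
  time_t mytime, mytime2;
  struct tm my_time_struct;

  time(&mytime);                         // current Unix time
  gmtime_r(&mytime, &my_time_struct);    // broken-down time in GMT
  mytime2 = mktime(&my_time_struct);     // back to a time_t value
  ctime_r(&mytime2, buff);               // string form, ends with '\n'
  printf("Date: %s", buff);              // buff already ends the line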
```
In the initial stages of implementing your solution, you can just return a bogus time string for the Date header.
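
The string that ctime_r produces has the form "Sun Jan 10 18:17:43 2010", while the sample response earlier on this page shows the date as "Sun, 10 Jan 2010 18:17:43 GMT". If you want the second form, one alternative (a sketch, not the required approach) is to format the struct tm with strftime. The name http_date is made up for illustration, and the day and month names come out in English only while the program stays in the default C locale.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format time t as a date string such as "Sun, 10 Jan 2010 18:17:43 GMT".
 * Returns the number of characters written (not counting the '\0'),
 * or 0 on failure or if buf is too small. */
size_t http_date(time_t t, char *buf, size_t buflen) {
    struct tm tm_gmt;

    if (gmtime_r(&t, &tm_gmt) == NULL) {
        return 0;
    }
    return strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S GMT", &tm_gmt);
}
```

Calling http_date(time(NULL), buf, sizeof(buf)) would fill buf with the current time for the Date header.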
man and apropos: documentation for system calls (in section 2) and C library calls (in section 3).
```
$ man 2 read
$ man 2 send
$ man 3 strcpy
$ man 2 stat
```
You can use either read and write or send and recv to send and receive messages on sockets.

Submission and Demo

Create a tar file containing:

All your webserver source files, and makefile to build server (and client if applicable).
A README file with: (1) your and your partner's names; (2) an example of how to run your webserver (a command line); and (3) a description of any features you have not fully supported and/or any errors you were unable to fix.

I'd suggest creating a handin directory and copying all these things into it. It is good to check that you have your full solution in the handin directory (type 'make' to check that everything builds, try running it, then type 'make clean' to remove executables and .o's from what you submit). Then tar up your handin directory.

Either you or your partner should submit your tar file by running cs87handin.

Demo

You and your partner will sign up for a 15 minute demo slot to demo your web server. Think about, and practice, different scenarios to demonstrate both correctness and good error handling.
