CS43 Lab 4: Reliable Transport

Due: Thursday, November 5, 11:59 PM ET

Handy references:

mininet documentation
select() manual. (You will need to set timers for your timeout; select makes this easy.)
TCP Connection Teardown in class slides.

Lab 4 Goals

Understand principles of reliability at the transport layer
Design and implement a stop-and-wait (SW) transport layer protocol.
Develop C library code for your SW protocol over a UDP socket.
Learn to use mininet, a network emulator, to design and test your protocol.

Overview

For this lab, you will be designing and implementing reliable data transfer over an unreliable (simulated) link. Your submission will be in the form of a library that mimics the type of functionality that you would expect to get from an OS’s transport-layer implementation.

Your transport protocol should be a stop-and-wait protocol for now. We will introduce high-performance pipelining features in lab 5. To implement stop-and-wait, you should use ACKs and timeouts to reliably transfer all the packets (without packet loss) from source to destination over a simple dumbbell network.

Figure 1. The figure shows a dumbbell network with two end-hosts, Host 1 and Host 2 connected by a switch. The switch has a buffer with buffer size 2, to hold packets in transit. Host 1 and Host 2 each have two links that transfer data from and to the end hosts respectively. We start the mininet simulation with a delay of 10ms and 5% loss on each link.

Lab Requirements

Stop-and-wait protocol specifications:

Your protocol should reliably transfer data in the presence of lossy links.
Link packet buffers will be very small (e.g., size = 2 packets)
Perform RTT estimation to determine setting timeout values.
Cleanly shut down connections on both ends, even if packets get lost.
You do not have to deal with checksums or error detection and corruption.
Your library (lab4.h and lab4.c) should allow any application that’s built on top of it to achieve reliable communication. When transferring files with your library, you should get byte-for-byte identical copies, which you can verify with diff or md5sum.

Working example

We will run this lab inside a Virtual Machine (VM) that runs mininet - a virtual network emulator.
The following commands show how you can test your application-layer code with a completely implemented version of your lab.

# SSH into the VM and cd into your lab4 directory
$ ssh localhost -p 22222 -l mininet
$ cd lab4-username/

# Setup mininet with appropriate delay and loss parameters
$ sudo mn --link tc,delay='10ms',loss=5,max_queue_size=2

# run lab4 receiver first in the mininet terminal
$ mininet> h2 ./lab4_receiver 9000 > output.txt 2> output.err & echo $!

# SSH into the VM in a different terminal
$ tail -f output.txt output.err

# run lab4 sender in mininet (mininet terminal)
$ mininet> h1 ./lab4_sender 10.0.0.2 9000 < 1000lines

# view differences in output and input (second terminal)
$ md5sum 1000lines output.txt
187323fe69aa075411d75dd0849f8263  1000lines
187323fe69aa075411d75dd0849f8263  output.txt

The example code includes test application layer code that sits above your transport layer library and uses it to transfer a file. You may edit the test application for your own testing, but when grading, I will use the default version, so you should not rely on changes to these files for correct operation.

Getting your Lab 4 Starting Point Code

Log into CS43 Github for our class and get the ssh-URL to your lab git repository. Follow along with the prompts below to SSH, create a lab directory and clone your lab repos.

# ssh into our lab machines
ssh yourusername@lab.cs.swarthmore.edu

# cd into your cs43/labs sub-directory and clone your lab4 repo
cd ~/cs43/labs
git clone [your-ssh-URL]

# change directory to list its contents
cd lab4-user1-user2

# ls should list the following contents
ls
 Makefile README.md week1-worksheet.md week2-worksheet.md 1000lines generate_test_files.py lab4.h lab4.c lab4_sender.c lab4_receiver.c CS_machines

High-level overview of your program

Figure 2. The figure shows a stop-and-wait protocol with two end-hosts: one functioning as the sender and the other as the receiver. The sender sends a data packet and waits until it gets an ACK from the receiver. The sender only sends the next data packet after the ACK is successfully received. The stop-and-wait protocol uses timeouts to handle message losses and corruption. At any point in time, there is only one data packet in flight.

We will be building a stop-and-wait protocol with ACKs, timeouts, retransmissions and RTT estimation on top of an unreliable channel. We will use UDP socket calls to get an unreliable channel, and use mininet, our network emulator, to force losses to occur in a controlled manner.
Protocol API: You are responsible for writing/editing only the following functions:
- in lab4.c: my_socket, my_send, my_rtt, my_recv and my_close.
- in lab4.h: add state to struct lab4_hdr
- to send and receive data across the wire, we will be using UDP’s unreliable send and recvfrom in your implementation of a Stop-and-Wait version of my_send and my_recv. Your code in these functions is responsible for implementing reliability on top of UDP.
Application layer: lab4_sender.c and lab4_receiver.c implement the application layer that calls your library functions. lab4_sender.c and lab4_receiver.c each get their own copy of your library functions.
- You may only edit lab4_sender.c and lab4_receiver.c for your own testing purposes. When grading, I will use the default version, so do not rely on changes to the application layer for correct operation.

Workflow

Your code submission should only modify lab4.c and lab4.h. These two files are transport layer functions and you can think of them as a reliable transport library that is independently being imported by lab4_sender.c and lab4_receiver.c. This means that lab4_sender.c will get its own instance (independent state) of lab4.c, lab4.h and so will lab4_receiver.c.

Set up the initial state in the my_socket call that any reliable transport protocol would need to get started.
Set up the metadata that the transport layer packet header will need to carry in lab4.h. You can keep this header minimal, limited to the state you need for proper functioning.
Write a helper function that updates your RTT estimate from each new sample RTT measurement.
Implement reliable sending in my_send. To reliably send data, you will need to implement the following:
1. Copy the transport header and the application layer payload into your send buffer, and send it.
2. Maintain data structures to estimate RTT.
3. Keep track of packets sent and acknowledgements received.
4. Resend data packets if you timeout on a packet.
Implement reliable receive in my_recv. To reliably receive data, you will need to implement the following:
1. Receive a packet over the network. You do not need to check for errors in the checksum.
2. Make sure packets are received in order, and send appropriate acknowledgements to the sender.
3. If packets are received in order, remove the transport layer header and copy the application layer payload to the application buffer
4. Return the number of bytes received to the application.
Implement reliable closing in my_close. Your implementation should cleanly shut down connections on both ends, even if packets get lost.
1. You do NOT need to implement the TCP behavior that allows each side to shutdown independently. Like TCP though, in my_close(), you may want to wait for some time, to make sure the last ACK didn’t get lost (leaving one end hanging).
2. TIP: since you are closing reliably, you can call my_send in my_close.

Weekly Deliverables

Week 1

Set up a virtual machine according to the instructions below.
Set up and get familiar with mininet.
Complete week1-worksheet.md
Implement 1-4 of the workflow.

Week 2

Implement 5 and 6. Reliably closing takes some time to think about - don’t leave this as an afterthought!
Complete week2-worksheet.md
Test your code to make sure you can reliably send and receive.

Virtual Machine (VM) Setup

For the next two labs, we will be using a network emulation package called mininet to create virtual links that allow us to vary their performance characteristics (latency, loss rate, etc.). Because it requires special permissions to execute, we’ll be running it inside of a virtual machine. Both lab partners should follow along with either local or remote setup.

There are two paths that you can use to get setup with your VM.

Download and setup locally on your machine
Run it on the CS machines.

Whichever path you choose, your primary development environment will continue to be your chosen editor locally and not in the VM. We will set up GitHub on your VM, where you can SSH, pull your code changes, and test your code.

VM Setup on your local machine

A virtual machine allows you to emulate a specific Operating System with simulated hardware characteristics.

To setup a virtual machine you will have to first download and install VirtualBox. VirtualBox is handy for setting up VMs for numerous instances, outside of this course and lab.

Next, open up your terminal and cd into the folder where you would like to keep your VM. Then use scp to copy the following image file:

$ cd <your_chosen_directory>
# don't forget the trailing period in the line below - we want to copy the file to the current directory.
$ scp username@lab.cs.swarthmore.edu:/home/chaganti/public/cs43/f20/CS43-Mininet-VM.ova .

The file is about 1.2G in size, and will take a couple of minutes to download. Once you download the file, you can follow these steps to setup your VM.
1. Open virtualbox from your applications.
2. Go to File→Preferences, set your Default Machine Folder to where you want to store your VM locally, and then close the preferences window.
3. Go to File→Import Appliance. Go to the folder where you downloaded the .ova file, select CS43-Mininet-VM.ova, and push next once.
4. You should now be seeing Appliance settings. Edit the name to include your username. For example, CS43-Mininet-VM-<your_username>.
5. Click import and wait a minute for it to complete.
After you have completed these steps, you should see your VM in the list of VMs available to start. Go ahead and turn it on.

VM Setup on the CS machines

Your lab folder lists two specific CS machines that are dedicated for your team’s use with this lab. If you would like to use a VM via SSH, you should use these machines and not others. You and your partner should decide together which of the two machines you would like to use.

To get started, first ssh into your machine and cd into /local. You should see a file in /local called CS43-Mininet-VM.ova

# ssh into your chosen CS_machine
$ ssh username@<CS_machine>.cs.swarthmore.edu
$ cd /local
$ ls
 CS43-Mininet-VM.ova

We will now import a copy of the starter VM image. Run the following command with the --dry-run option. This will allow you to simulate an actual run and make sure you have the right settings before actually running the command.
- Make sure to set the name of your VM instance to CS43-Mininet-VM-your_username, as shown with the -vmname option.
  $ VBoxManage import CS43-Mininet-VM.ova --vsys 0 -vmname CS43-Mininet-VM-your_username --vsys 0 --basefolder /local/ --dry-run
You should see output listing the settings that the import would use. Verify that the file location and name settings have the path pointing to /local and have your username in vmname.

If everything looks good, we are ready to create our vm. Run the following command

$ VBoxManage import CS43-Mininet-VM.ova --vsys 0 -vmname CS43-Mininet-VM-your_username --vsys 0 --basefolder /local/

Now run vboxmanage list vms at the terminal, to see the new vm that we created. You should see something like the following:
```
$ vboxmanage list vms
"CS43-Mininet-VM-vchagan1" {c0546631-0846-4747-864c-df06851056c5}
```

If at this point you don’t see your VM, stop here and ask for help. If you have succeeded, congratulations! You can proceed to the next step. If you have created a VM that is not to the above specifications, that’s okay: we can delete it and try the steps again. If you see "inaccessible" VMs or VMs that you did not plan to create, you can use the following commands:

$ vboxmanage list vms
"<inaccessible>" {c9d3f17a-8f97-11e6-ae22-56b6b6499611}
"CS43-Mininet-VM-username" {c9dda107a-8d97-12f6-a212-56d5hc9613}

# run the unregistervm command:
$ vboxmanage unregistervm c9d3f17a-8f97-11e6-ae22-56b6b6499611

Storing things on /local has two major implications that you need to account for:
1. The data on /local is stored on a single disk (unlike your home directory, which is split across multiple disks for redundancy). When you’re done working, you should save your important lab files elsewhere (e.g., push them to github) to avoid data loss in the event of a disk failure.
2. The /local partition on each machine is only available on that machine. This means that if you SSH into a different machine, you will not have access to your VM.
After you have completed these steps, you should see your VM in the list of VMs available to start. Let’s go ahead and turn it on.

Turning on your VM from the terminal

To turn on your VM from terminal, first ssh into your machine, and cd into /local and list your vm
```
$ ssh username@<CS_machine>.cs.swarthmore.edu
$ cd /local
$ vboxmanage list vms
```
Locate your VM name in the vm-list and run the following command to start your VM in headless mode.
```
$ vboxheadless -startvm CS43-Mininet-VM-your_username
```
Once you run this command, the terminal will no longer be available to type in. You should open a new terminal and ssh into your CS_machine to interact with the VM.

Starting your new Virtual Machine via SSH.

We will primarily use SSH to work with our new VM. If you are using a GUI you can just minimize the window that pops up when you first start up the VM. The VM is already configured such that you can connect to it by SSHing to your local machine on port 22222. (Port 22222 on your machine gets forwarded to port 22, the SSH port, on the VM.)

Running your VM:
- Local Install: From the VirtualBox GUI, just click on your VM name and hit Start. Once your VM has started, pull up a terminal and SSH into the VM. We can SSH into the VM with port forwarding turned on.
  ssh -Y localhost -p 22222 -l mininet (The password is: mininet)
- CS Machines: Once you are SSH-ed into your CS machine at the terminal, execute the vboxheadless -startvm command as shown above. Once you have executed this command, the terminal will no longer be available to work on. You should open another terminal ssh-ed to your CS machine to execute the following commands. Once your VM has started, we can SSH into the VM.
  ssh localhost -p 22222 -l mininet (The password is: mininet)
Before going any further, you should use the passwd command to set a new password to protect your VM. Keep this simple so you remember it.
After that, you’ll need to configure an SSH key so that you can access GitHub. Create a new key for the VM and add it to your GitHub account. (Instructions)
You can now git clone your lab repo into your VM.
When you’re done working on your VM, you should shut it down nicely: sudo shutdown -h now. We still have work to do in our VM so if you shut it down at this stage, you’ll have to restart it with either the vboxheadless -startvm command or from the GUI.

Mininet Environment

Once you’ve connected to the VM, you can run mininet.

Mininet Topology

We will run a minimal network topology with two hosts connected to a switch as shown in Figure 1. We will use the following sudo mn command, to set a latency of 10ms, and 5% packet loss on each link. Since this is a Stop-and-wait protocol we will set a buffer size of 2 at the switch. For this lab, each time you want to run your lab4 code you need to execute the following commands:

From your terminal that’s SSHed to the VM, run mininet.

Run the following command. Note that there are no spaces between the comma-separated link parameters:

sudo mn --link tc,delay='10ms',loss=5,max_queue_size=2

command options explanation:
--link tc: traffic control for the links

If you are running VirtualBox locally, you can run your VM with port forwarding turned on:
sudo mn --link tc,delay='10ms',loss=5,max_queue_size=2 -x
-x: set up X11 forwarding for the four terminal windows to pop up. Feel free to disable this if you like.
This will pop up four terminal windows: one for each of the two virtual hosts (h1 and h2), one for the switch (s1), and one for a controller (c0). Close the controller and switch terminals; you won’t be using those for this lab. All the operations you perform will be in the gray boxes for the two virtual hosts, h1 and h2, in our network. These hosts function as though you were SSHed into two lab machines! They work independently like different machines but share the same files.

You can choose to either run commands in the little gray boxes (virtual hosts), or run commands on your mininet terminal (much easier to view generally!).

All the commands listed below are as though you are running on the mininet terminal

Running commands on mininet hosts

Run ifconfig on each of the two end hosts. You will see that h1 has an IP address of 10.0.0.1 and h2 has 10.0.0.2.
```
mininet> h1 ifconfig
mininet> h2 ifconfig
```
You can also transmit data between the two end hosts. Test this by running the ping command at h1:
```
mininet> h1 ping h2
```
You should see a (round trip time) delay of ~ 40 ms with approximately a 19% loss rate (each link has an independent 5% chance of dropping a packet).
You will find the example code in the lab4 directory, which is shared between the VM and each of the virtual hosts. (Try ls on the two hosts to see the shared folder).
```
mininet> h1 ls
mininet> h2 ls
```
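The loss rate reported by ping above follows from each ping having to cross a lossy link four times (h1 to s1, s1 to h2, and back). As a rough sketch, assuming independent drops on every crossing, the effective rate can be computed as below. The function names are made up for this illustration and are not part of the starter code.

```c
#include <stdio.h>

/* Probability that a packet is dropped somewhere along a path made of
 * `traversals` independent link crossings, each of which drops a packet
 * with probability `per_link_loss` (0.05 means 5%). */
double path_loss(double per_link_loss, int traversals)
{
    double survive = 1.0;
    for (int i = 0; i < traversals; i++) {
        survive *= 1.0 - per_link_loss;
    }
    return 1.0 - survive;
}

/* Print the round-trip loss for a ping across the two-link topology. */
void print_ping_loss(double per_link_loss)
{
    /* h1 -> s1 -> h2 and back again: four link crossings per ping. */
    printf("per-link %.0f%% -> round trip %.1f%%\n",
           per_link_loss * 100.0, path_loss(per_link_loss, 4) * 100.0);
}
```

Calling print_ping_loss(0.05) reproduces the roughly 19% figure for the default mininet settings.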

Saving and Shutting down the VM

Edit/execute code and work on the lab. You’ll want to periodically make a backup of your changes (push to GitHub), since the VM image is stored on the machine’s local disk. If the disk fails, you don’t want to lose your changes…
When you’re done, you can tell the VM to shutdown by executing:
```
mininet> exit
sudo shutdown -h now
```

Stop-And-Wait Protocol

This section adds more detail on setting up your Stop-And-Wait Protocol.

The `select()` system call

The select system call allows us to execute event-driven concurrency. Recall when we discussed thread vs. event-driven concurrency we said that we can use select to monitor a bunch of sockets and let the OS tell us when a socket is available to send or receive data and not blocked.

In this lab, we are going to use the timing functionality of select, to keep track of RTT estimates. Type in man select at the terminal to see how select works.

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

nfds: The first argument to the select function is (the largest numerical socket descriptor value in any of the subsequent FD_SET parameters) plus one. We only have one socket in use in this lab, so if you’re populating your FD_SET with socket descriptor int sock = 4, your first argument to select should be 5 (4 + 1).
fd_set: select takes as input variables of type fd_set that, by convention, are named rfds and wfds. These are sets of read/write file or socket descriptors (fd stands for file descriptor). We will ignore the exceptfds. As far as the OS is concerned, file and socket descriptors are equivalent.
struct timeval * timeout: Scroll down in the man page to see the fields of struct timeval. When we call select on a socket, it will modify struct timeval and decrement it by how much time has passed.
```
struct timeval{
  long tv_sec;  /* seconds */
  long tv_usec; /* microseconds */
};
```
There is starter code given to you to help convert seconds to the datatype that struct timeval expects.
1. If you set your struct timeval to 100ms and call select, and no ACK arrives within those 100ms, then select will return 0, indicating that your socket timed out waiting for a response.
2. At this point you need to reset your timer (struct timeval) before sending your data again.
3. If the return value of select is positive, you know that the wait did not time out and you can call recvfrom safely (without blocking) once, but not more than once.
4. You can also extract the amount of time the ACK took to get back from your struct timeval.
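The conversions involved in the points above can be sketched as follows. This is only an illustration: the demo_ functions are hypothetical stand-ins (the starter code already provides its own msec_to_timeval), and the elapsed-time calculation relies on the Linux behavior of select updating the timeout to the time remaining, which other systems do not guarantee.

```c
#include <sys/time.h>

/* Hypothetical stand-in for the starter code's msec_to_timeval():
 * split a millisecond count into whole seconds and microseconds. */
void demo_msec_to_timeval(long msec, struct timeval *tv)
{
    tv->tv_sec = msec / 1000;
    tv->tv_usec = (msec % 1000) * 1000;
}

/* Convert a struct timeval back into whole milliseconds. */
long demo_timeval_to_msec(const struct timeval *tv)
{
    return (long)tv->tv_sec * 1000 + (long)tv->tv_usec / 1000;
}

/* Time spent waiting inside select(): the timeout we started with minus
 * whatever select() left in the struct (Linux-specific behavior). */
long demo_elapsed_msec(long initial_msec, const struct timeval *remaining)
{
    return initial_msec - demo_timeval_to_msec(remaining);
}
```

For example, if the timeout was set to 100ms and select leaves 60ms in the struct, the ACK took about 40ms to come back.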

General workflow with `select`

In this lab, we only estimate RTT in my_send. Since we are only waiting to receive an ACK there, we only need to pass &rfds to select.

Declare fd_set rfds;
FD_ZERO(&rfds): clears out the set.
FD_SET(sock, &rfds): allows you to put a socket in the read file descriptor set. We will have to redo this and the following steps every time we call select.
Send data, set the struct timeval variable.
Once we are ready to receive, call select with inputs: select(sock + 1, &rfds, NULL, NULL, &timeout_struct)
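The steps above can be sketched as a small helper. This is a minimal illustration rather than the required structure for your library: wait_for_readable and demo_wait are names invented here, and the demonstration uses a pipe instead of a UDP socket only because select treats all descriptors alike.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until `fd` has data to read or `timeout_msec` passes.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_for_readable(int fd, long timeout_msec)
{
    fd_set rfds;
    struct timeval tv;

    /* The set and the timeout must be rebuilt before every select() call. */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_msec / 1000;
    tv.tv_usec = (timeout_msec % 1000) * 1000;

    int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0) {
        return -1;
    }
    if (ready == 0) {
        return 0; /* timed out */
    }
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Demonstration on a pipe: records the result before and after one byte
 * is written. Returns 0 on success, -1 if the pipe could not be used. */
int demo_wait(int *before, int *after)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    *before = wait_for_readable(fds[0], 50); /* nothing to read yet */
    int wrote = (write(fds[1], "x", 1) == 1);
    *after = wait_for_readable(fds[0], 50);  /* one byte is waiting */
    close(fds[0]);
    close(fds[1]);
    return wrote ? 0 : -1;
}
```

In demo_wait, the first call times out after 50ms and the second returns immediately because data is already queued.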

Setting up the packet headers:

Your packet header in lab4.h should look like the following:

struct lab4_hdr {
    uint32_t sequence_number;
    uint32_t ack_number;
};

You can add maybe one more variable to keep track of a close packet.
Your code does not need to implement the TCP header! The state we have (seq #, ack #, <blank>) should be all we need.

Setting up state in `lab4.c`:

Currently, the only state in lab4.c is int sequence_number.
Your code should probably have a few more global state variables to maintain RTT values, and a state variable for closing.

`my_send` function:

This function is responsible for reliable delivery of data packets from the sender to the receiver. Only lab4_sender.c will call my_send.
To implement reliable delivery, you want to keep sending your data in a loop until you have received an ACK from the receiver.
To figure out if a packet has been successfully sent we will use select to implement our timeout functionality. Type man select at the terminal and take a look at the last parameter to select.

Using select to implement sending: To implement sending a packet and setting a timeout value with select, your code should implement the following functionality:

1: set up your packet and initial state, timeout, etc.
2: while an ACK has not been received:
3:     FD_ZERO and FD_SET your socket for reading
4:     send packet
5:     set value of timeout struct using: msec_to_timeval(timeout_val, &timeout_struct)
6:     call select(sock + 1, &rfds, NULL, NULL, &timeout_struct)
7:     if select has timed out, double the timeout value and loop again
8:     if not, receive an ACK using recvfrom()
9: only return from the function *after* receiving ACK from receiver.

Note that the timeout_struct has to be reinitialized every time you call select! select internally decrements the value of the timeout_struct until you receive an ACK.
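As a rough sketch of how the loop above might look in C, consider the following. Everything named demo_ is hypothetical: the header is a stand-in for struct lab4_hdr, the max_tries limit exists only so the sketch terminates (my_send itself keeps going until the ACK arrives), RTT sampling is left out, and the demonstration runs over a local socket pair rather than a UDP socket in mininet.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical header for this sketch; not the lab's struct lab4_hdr. */
struct demo_hdr {
    uint32_t sequence_number;
    uint32_t ack_number;
};

/* Send `pkt` until an ACK carrying `seq` arrives or `max_tries`
 * transmissions have been made. Returns the number of transmissions used,
 * or -1 on failure. `peer` may be NULL (with peer_len 0) on a connected
 * socket. The timeout doubles after every timeout. */
int demo_send_until_acked(int sock, const void *pkt, size_t len,
                          const struct sockaddr *peer, socklen_t peer_len,
                          uint32_t seq, long timeout_msec, int max_tries)
{
    for (int tries = 1; tries <= max_tries; tries++) {
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        if (sendto(sock, pkt, len, 0, peer, peer_len) < 0) {
            return -1;
        }
        /* Reinitialize the timeout before every select() call. */
        tv.tv_sec = timeout_msec / 1000;
        tv.tv_usec = (timeout_msec % 1000) * 1000;

        int ready = select(sock + 1, &rfds, NULL, NULL, &tv);
        if (ready < 0) {
            return -1;
        }
        if (ready == 0) {
            timeout_msec *= 2; /* timed out: back off and retransmit */
            continue;
        }
        struct demo_hdr ack;
        ssize_t n = recvfrom(sock, &ack, sizeof(ack), 0, NULL, NULL);
        if (n == (ssize_t)sizeof(ack) && ntohl(ack.ack_number) == seq) {
            return tries;
        }
        /* Simplification: an unexpected packet just triggers another try. */
    }
    return -1;
}

/* Demonstration over a local datagram socket pair. With pre_ack set, the
 * "receiver" end queues an ACK for sequence 7 before the sender transmits. */
int demo_pair(int pre_ack)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) {
        return -2;
    }
    struct demo_hdr data = { htonl(7), 0 };
    struct demo_hdr ack = { 0, htonl(7) };
    int ok = 1;
    if (pre_ack) {
        ok = (send(sv[1], &ack, sizeof(ack), 0) == (ssize_t)sizeof(ack));
    }
    int tries = ok ? demo_send_until_acked(sv[0], &data, sizeof(data),
                                           NULL, 0, 7, 20, 2)
                   : -2;
    close(sv[0]);
    close(sv[1]);
    return tries;
}
```

With an ACK already waiting, demo_pair(1) finishes on the first transmission; demo_pair(0) times out twice (20ms, then 40ms) and gives up.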

RTT estimation in `my_rtt`

Your implementation will need to perform RTT estimation to determine how you should set timeout values.
Use current_msec() to get an estimate of the current time stamp and use msec_to_timeval() to fill out struct timeval.
To estimate RTT, we will use an exponentially weighted moving average (EWMA) estimate that we discussed in class.
```
EstimatedRTT = (1 - a) * EstimatedRTT + a * SampleRTT, with a = 1/8
DevRTT = (1 - b) * DevRTT + b * |SampleRTT - EstimatedRTT|, with b = 1/4
TimeoutInterval = EstimatedRTT + 4 * DevRTT + 15
```
Normally you wouldn’t export such information up to the application/user, but I’ll use this to check your RTT calculation.
Your starter code has an initial RTT of 1000. You can change that to around 100 and let your code converge to the network RTT.
Like TCP, you should compute your timeout as a function of the current RTT estimate, sampling only those segments for which no retransmission is necessary. If a timeout occurs, you should double the timeout for each subsequent retransmission of the same segment.
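One way the estimator described above could be written is sketched below. The struct and function names are invented for this illustration and are not the required my_rtt interface. The sketch also assumes that EstimatedRTT is updated first and that DevRTT then uses the updated value, following the order in which the formulas are listed; check this against what we covered in class.

```c
/* Hypothetical RTT state for this sketch; all values in milliseconds. */
struct demo_rtt_state {
    double estimated_rtt;
    double dev_rtt;
};

/* Fold one RTT sample into the estimate and return the new timeout.
 * Samples from retransmitted packets are ignored, because we cannot tell
 * which transmission the ACK is answering. */
double demo_rtt_update(struct demo_rtt_state *st, double sample_rtt,
                       int was_retransmitted)
{
    const double a = 1.0 / 8.0;
    const double b = 1.0 / 4.0;

    if (!was_retransmitted) {
        st->estimated_rtt = (1.0 - a) * st->estimated_rtt + a * sample_rtt;

        double diff = sample_rtt - st->estimated_rtt;
        if (diff < 0.0) {
            diff = -diff; /* absolute value without needing math.h */
        }
        st->dev_rtt = (1.0 - b) * st->dev_rtt + b * diff;
    }
    /* TimeoutInterval = EstimatedRTT + 4 * DevRTT + 15 */
    return st->estimated_rtt + 4.0 * st->dev_rtt + 15.0;
}
```

Starting from an estimate of 100ms with zero deviation, a single 60ms sample moves EstimatedRTT to 95ms and DevRTT to 8.75ms, for a timeout of 145ms.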

Adding 15 to the TimeoutInterval

The EstimatedRTT and DevRTT estimation assumes that there is some noise in your estimation (DevRTT > 0), as it tries to converge to the network RTT. In practice, network RTT also has some variance, but since this is mininet, we have artificially set a fixed value for the network RTT for every packet.

This means, as we increase the file sizes that we send, the estimate of DevRTT will start to converge to zero! With DevRTT zero, the TimeoutInterval will be exactly equal to the RTT and most of your calls to send will timeout! If we add some constant parameter to your timeout formula we can ensure that you are not going to timeout exactly at EstimatedRTT.
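To see this convergence concretely, the sketch below feeds the same RTT sample into the estimator repeatedly and reports the resulting timeout. It is a self-contained illustration with made-up names and parameters, assuming every packet sees exactly the same RTT, as in our mininet setup.

```c
/* Timeout after `n` identical samples of `fixed_rtt` ms, starting from an
 * estimate of `initial_rtt` ms and zero deviation. `margin` is the constant
 * added to the timeout formula (15 in this lab, 0 to see the problem). */
double demo_timeout_after(int n, double fixed_rtt, double initial_rtt,
                          double margin)
{
    double est = initial_rtt;
    double dev = 0.0;

    for (int i = 0; i < n; i++) {
        est = 0.875 * est + 0.125 * fixed_rtt;

        double diff = fixed_rtt - est;
        if (diff < 0.0) {
            diff = -diff;
        }
        dev = 0.75 * dev + 0.25 * diff;
    }
    return est + 4.0 * dev + margin;
}
```

With a fixed 40ms RTT, a few hundred samples leave the timeout at essentially 40ms when margin is 0, which is right at the edge of timing out on every packet; with margin 15 it settles near 55ms instead.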

`my_recv` function:

The receiver is responsible for sending ACKs to the sender. Since we are only sending one packet at a time, it’s worth thinking about what you want to set as your ACK value.
Only lab4_receiver.c will call my_recv.
The receiver should use UDP recvfrom to receive data, and send an ACK for every packet received.
my_recv should only return to the application if a new packet has been received. You might want some sort of a loop to implement this functionality.
Since the receiver is only responsible for sending ACKs, we do not need to keep timeouts, or use select on the receiver side.

Unlike with TCP, we do not need a loop around each individual send or receive call to handle partial transfers. This time, the underlying socket is using UDP rather than TCP, so there is no byte stream abstraction. When you call receive, you will get one complete UDP datagram of up to MAX_PACKET bytes. It is our responsibility to build reliability over the underlying UDP send and receive.
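The receiver's decision for each arriving data packet can be sketched as a small function. This shows only one possible convention, in which the ACK simply echoes the sequence number of the packet just received; you are free to choose a different ACK value. The names are invented for this sketch.

```c
#include <stdint.h>

/* What the receiver should do with one arriving data packet. */
struct demo_recv_result {
    int deliver;         /* 1 if the payload goes up to the application */
    uint32_t ack_number; /* value to place in the ACK we send back */
};

/* `expected_seq` is the next sequence number the receiver is waiting for;
 * it advances only when the in-order packet arrives. */
struct demo_recv_result demo_on_data(uint32_t *expected_seq, uint32_t pkt_seq)
{
    struct demo_recv_result r;

    r.ack_number = pkt_seq; /* always ACK what we just saw */
    if (pkt_seq == *expected_seq) {
        r.deliver = 1; /* new data: hand it to the application */
        (*expected_seq)++;
    } else {
        /* Duplicate: our earlier ACK was probably lost. Re-ACK it so the
         * sender can move on, but do not deliver the payload again. */
        r.deliver = 0;
    }
    return r;
}
```

A loop in the receive path would keep calling recvfrom and sending ACKs until a call like this reports deliver == 1, and only then return to the application.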

Implementing `my_close`

Your implementation should cleanly shut down connections on both ends, even if packets get lost. You do NOT need to implement the TCP behavior that allows each side to shutdown independently. TCP is substantially more complex than the protocol you’re building because your protocol is unidirectional.

Like TCP though, in my_close(), you may want to wait for some time, to make sure the last ACK didn’t get lost (leaving one end hanging).
HINT: since you are closing reliably, you can call my_send in my_close.
You should NOT rely on ICMP error messages to help your closing procedure. To be safe, it’s best to disable ICMP from both hosts before starting the sender and receiver with:
```
mininet> h1 iptables -I OUTPUT -p icmp -j DROP
mininet> h2 iptables -I OUTPUT -p icmp -j DROP
```

`struct` casting tricks

Network folks often write code with the following weird syntax:

char packet[MAX_PACKET];
memset(packet, 0, sizeof(packet));
struct lab4_hdr *hdr = (struct lab4_hdr *) packet;

The syntax above lets us create a pointer of type struct lab4_hdr * that points at the start of the packet. This means that any time we want to pack values into the header, we can use a more intuitive representation of the data we are packing.

uint32_t ack_number = 1;

/* without struct casting */
uint32_t net_ack = htonl(ack_number);
memcpy(packet + 4, &net_ack, sizeof(net_ack));  //packing
memcpy(&net_ack, packet + 4, sizeof(net_ack));
ack_number = ntohl(net_ack);                    //unpacking

/* with struct casting */
hdr->ack_number = htonl(ack_number);          //packing
ack_number = ntohl(hdr->ack_number);          //unpacking
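Putting the casting trick together with a payload, a packet could be assembled roughly as follows. This is a sketch under assumptions: demo_hdr mirrors the two header fields shown earlier, DEMO_MAX_PACKET is a made-up size (use the lab's MAX_PACKET in your code), and the buffer passed in is assumed to be at least that large and suitably aligned for a uint32_t.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PACKET 1400 /* hypothetical size for this sketch */

/* Hypothetical header mirroring the two fields of struct lab4_hdr. */
struct demo_hdr {
    uint32_t sequence_number;
    uint32_t ack_number;
};

/* Write header plus payload into `packet` (DEMO_MAX_PACKET bytes long).
 * Returns the total number of bytes to send, or 0 if the payload does
 * not fit behind the header. */
size_t demo_build_packet(char *packet, uint32_t seq, uint32_t ack,
                         const void *payload, size_t payload_len)
{
    if (payload_len > DEMO_MAX_PACKET - sizeof(struct demo_hdr)) {
        return 0;
    }
    memset(packet, 0, DEMO_MAX_PACKET);

    /* View the first bytes of the buffer as a header. */
    struct demo_hdr *hdr = (struct demo_hdr *)packet;
    hdr->sequence_number = htonl(seq); /* store in network byte order */
    hdr->ack_number = htonl(ack);

    /* The payload starts right after the header. */
    memcpy(packet + sizeof(*hdr), payload, payload_len);
    return sizeof(*hdr) + payload_len;
}

/* Read the sequence number back out of a received packet. */
uint32_t demo_packet_seq(const char *packet)
{
    const struct demo_hdr *hdr = (const struct demo_hdr *)packet;
    return ntohl(hdr->sequence_number);
}
```

The return value of demo_build_packet is the length you would pass to the UDP send call, so that only the header and the actual payload go on the wire.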

Grading Rubric

This assignment is worth 8 points.

2 - You fill in the weekly-lab sheet and provide a description of how you tested and debugged your protocol.
2 - Your protocol is a stop-and-wait protocol that delivers data reliably when no packets are lost, even when link buffers are small (i.e., two packets).
2 - Your protocol reliably delivers data despite packet losses.
1 - Your protocol correctly estimates the round trip time of the path.
1 - Your protocol cleanly terminates connections such that both ends agree that the connection is closed.

Testing

If you want to add a print statement to your library, use fprintf to print to the stderr stream rather than the usual stdout. There are examples of this in the library already. The benefit of using this method is that it won’t interfere with the results that are outputted by the receiver, since that captures only stdout.
You should experiment with a range of loss rates in mininet. You can go from 1% up to at most 10% per link (an effective round-trip loss rate of about 34%). If you set higher loss rates, you will need larger files to allow your RTT estimate to converge to the network RTT.
Autogenerating Test Files: You can test for longer files using the generate_test_files.py script.

Tips

START EARLY! The earlier you start, the sooner you can ask questions if you get stuck. Test your code in small increments. It’s much easier to localize a bug when you’ve only changed a few lines.
You may find the textbook to be more useful for this lab than it has been previously. It has good descriptions of the various reliability mechanisms that you might want to adopt.
Because you have root account access on your VM, you will have the necessary permissions to run Wireshark. You may find that to be useful while debugging. It won’t be quite as nice as when we used it to look at DNS, since it knows how to decode DNS and it knows nothing about your protocol, but you can still use it to look at the raw values, if necessary.

Submitting

Please remove any debugging output prior to submitting.

Please do not submit output file(s) that you used in testing.

To submit your code, simply commit your changes locally using git add and git commit. Then run git push while in your lab directory.