Lab 3: Adding a New Synchronization Primitive to Linux
Due: Friday Nov. 11 before 1am (late Thursday night)

Problem Introduction
Implementation Details
Getting Started
Testing your Code
What to Hand in and Demo

Introduction

For this project you will implement and test a new synchronization primitive called an Event. User-level processes will use Events in the following way:
  1. open an Event
  2. wait and signal on Events they have opened
  3. close an Event
Multiple processes can use Events to synchronize their actions by each opening the same Event and then calling event_wait and event_signal on it. Your solution must ensure that only processes that open an Event can call event_wait, event_signal or event_close the Event. The lifetime of an Event is the period of time from its first open (when it is created) to the time of its last close (when it is destroyed). Only the last close of an Event destroys it.

The semantics of event_wait are that is always blocks the calling process. The semantics of event_signal are that it unblocks all processes waiting on the Event.

You will implement the data structure(s) for representing Events at the kernel-level, and you will add system calls for user-level processes to event_open, event_close, event_wait, and event_signal on Events. In addition, you will need to write user-level programs to test your Event primitives.

An individual process can have multiple Events open at a time (and can wait and signal on any of its open Events). Your Event implementation, however, must make sure that only processes that open an Event can wait, signal, or close the Event.

I strongly encourage you to get an early start on this project; your solution will require a fair amount of thinking time, linux code reading time, and code design and debugging time.


Implementation Details:

Grab a copy of the starting point code from ~newhall/public/cs45/lab3/. The starting point code contains files with most of the #includes that you will need, and includes them in the correct order (there are some dependencies on include order of some some kernel header files, so be careful if you add more #includes ).

To implement support for Event primitives, you need to create kernel data structure(s) to keep track of Events that have been created (i.e. an Event table), to keep track of which processes currently have the Event open, to keep track of which processes are currently blocked on an Event, etc. Typically, the kernel uses fixed-sized data structures to keep track of instances of abstractions that it implements. For example, there is an open file table to keep track of all currently open files in the system. Because these data structures are fixed-sized, there is an upper bound on the number of instances that can currently exist in the system (e.g. there is a maximum number of open files). Implement your Event Table data structure so that it is fixed-sized, and make its size small (~5) so that you can test that you are handling the case when a process requests to open a new Event, but there are already 5 active Events (each has at least one open). When an Event is closed for the last time, it is no longer an active Event, and space for it in the Event Table can be used by a subsequent newly created Event.

Creating an Event Table

When the kernel boots it initialize all of its data structures. If you add a data structure to the kernel to keep track of Events, you need to have the kernel initialize it on startup. To do this add an event_init routine to the kernel, and then add a call to this routine in the kernel's initialization code that is in start_kernel() in init/main.c. Your initialization function should have the following prototype:
#include <linux/init.h>
void __init event_init();   	// this is NOT a system call 

Dynamic Memory Allocation in the Kernel

kmalloc and kfree are kernel-level routines for dynamic allocation of kernel memory. Use GFP_KERNEL as the priority value passed to kmalloc.

Memory leaks in the kernel are very, very bad; make sure that if you add a call kmalloc you add a corresponding call to kfree somewhere. Also, it is good programming practice to set a variable's value to NULL after freeing the space to which it points: free(ptr); ptr = 0; .

Event System call

You need to implement the following system calls for performing operations on Events (use the SYSCALL_DEVINEX macros to generate the function definitions though):
  1. long event_open(int *id): if *id==0, then create a new Event, and set the id to the new Event's ID. If *id is positive, open an existing Event with a matching id value. The function returns 0 on success, or an error value on failure. Return different error values to indicate different types of failure (you may use error values already defined in error.h, for example: return -ESRCH;). Remember, that you need to verify that values passed by reference to your system calls refer to readable/writable memory before you read/write to it (see project 2 if you forgot how to do this). A process can only open the same event 1 time.

  2. long event_close(int id): closes the corresponding Event. The last close to an Event, destroys the Event (you could re-use its id for a subsequent event_open call that is passed a value of 0). Only processes that have the Event open can close the Event. Also, the last close should fail if there are waiters on the Event. The function returns 0 on success, or an error value on error.

  3. long event_wait(int id): blocks the calling process on the Event with a matching id (again the caller must have opened the Event before it can wait or signal or close it). The function returns 0 on success, or an error value on error.

  4. long event_signal(int id): wakes up all processes blocked on the event with the matching id. The function returns 0 on success, or an error value on error. Again, only processes that have opened the Event can signal on it.

  5. long print_event_table(): a debugging routine to print out the contents of the event table (printk). For open events, include the process ids of every process that has the event open, and print out the pid of all processes waiting on each event. returns 0 on success.

In addition, you may want to implement a print_event_table function that can be called from within your system calls to help with debugging and testing.

Blocking and Unblocking a Process (task)

I want you to do the blocking and wake-up of processes waiting on Events in the following way (don't use other higher-level kernel functions that you may find):

You will use wait_queue_head_t data structure(s) to block processes that have done a wait on an Event.

To block a process on an Event, you need to:

  1. set the process' state (in its task_struct) to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE (you should use TASK_INTERRUPTIBLE for this assignment, which means that if the process receives a signal (like SIGKILL), it will wake up before event_signal is called on the Event on which it is blocked.
  2. add the process to a wait queue associated with the Event
  3. call the kernel routine schedule() to de-schedule the process and schedule another process from the READY Q to run.

To wake-up a process on an Event, you need to:

  1. remove the process from its waitQ
  2. set its state to TASK_RUNNING
  3. call schedule() to add it to the readyQ
OR you can use the kernel function wake_up (in wait.h that calls function in sched.c) to do much of this for you, but make sure you understand what this function is doing.

Wait Queues

You should use the linux kernel wait queue data structures for keeping track of all processes waiting on an Event. Your code will need to include linux/wait.h and linux/sched.h. The wait queue data structure uses Linux's doubly linked list implementation (in linux/list.h), and consists of a wait_queue_head struct that contains a list of wait_queue_t structs (one for each task waiting on the wait queue).

Your solution should NOT make calls to the higher level functions interruptible_sleep_on, or sleep_on. Instead, you will implement a solution that explicitly adds a process to a waitqueue and blocks it using the 3 steps listed above. You may, however, use wake_up to wake up all waiters on a wait queue.

The wait queue data structure is a bit tricky to figure out. I suggest looking at existing code the initializes, adds and removes items from a wait queue, and draw some pictures and step through the code for init_waitqueue_head, init_waitqueue_item, add_wait_queue, and remove_wait_queue on your picture.

To iterate through the elements on a wait queue, you can use the list_for_each_safe macros, and the list_entry macro to get a pointer to the next wait_queue_t element in the list. Here are some example code fragments using wait_queues:

  // declare and init a wait Q
  static DECLARE_WAIT_QUEUE_HEAD(my_wait_Q);  

  // declare and initialize a wait_queue_t element named "wait_entry"
  struct task_struct *curtask = get_current();
  DECLARE_WAITQUEUE(wait_entry, curtask);   

  // there are functions to add and remove elements from a wait Q:
  add_wait_queue(&my_wait_Q, &wait_entry);
  remove_wait_queue(&my_wait_Q, &wait_entry);

  // you can traverse the elements of a wait queue using the underlying
  // list implementation used by the wait queue (see wait.h and list.h )
  struct list_head *tmp, *next;	
  list_for_each_safe(tmp, next, &(my_wait_Q.task_list)) {
      wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
      ...
  }

Synchronization Primitives in the kernel

You will likely need to add synchronization code to some or all of your system call implementations. Think about the fact that multiple processes can be simultaneously running in kernel mode in any of your system call functions. Are there data structures that you need to ensure atomic access to? If so, then around those parts that should be in a critical section. Look in linux/semaphore.h for the semaphore implementation. Here is some example code using semaphores:
 
    static DECLARE_MUTEX(event_mutex); // declare a MUTEX semaphore initialized to 1
    static DECLARE_MUTEX_LOCKED(event_mutex); // declare a MUTEX semaphore initialized to 0 

    if(down_interruptible(&event_mutex)) {
          // process woke up due to signal...semaphore was NOT acquired	
    }  
    // Critical Section Code (i.e. access shared data structures)
	  // make sure this process doesn't block while it holds the semaphore
    up(&event_mutex)) {

Other requirements

Your Event implementation should be in a new source file that you add to the kernel. You may need to add a new header file(s) as well.

Your system calls should return error values as they did in the previous assignment (for example, return -ESRCH). It is fine to use existing error values defined in error.h that are close to but perhaps not exactly what you want for your errors. You may add new error codes if you'd like (using existing ones is fine though). User level programs should call perror to print an error message when an error value that is returned by your system calls.


Getting Started

This project will require a lot of reading of Linux source code to get an idea of what you need to do and how to do it. Your final solution will likely be about 300-500 lines of code (including your well written comments). Start by examining Linux source code that does something similar to what you need to do (block a process on an Event and wake up all processes waiting on the Event when an signal on an Event occurs) I'd suggest looking at some of the source files code in ipc/ and in sched.c. In addition, you will want to look at implementations of some of the functions you will use in your solution such as the functions that add to, remove from, and init a wait_queue.

As you design your solution, think about what a process is (or can be) doing while another process is modifying shared state associated with an Event. For example: is it possible for process i to do a wait on Event j, process k to do a signal on Event j, and process i to do another wait on Event j before it has been removed from Event j's wait queue (add itself to the wait Q more than one time)? can a process exit while it is on an Event wait queue? when does a process' call to schedule() return (what is true about the process at this point)?

I suggest incrementally implementing and testing functionality. Start with event_init, then event_open and test that a process can successfully open several Events, and that you are correctly detecting error conditions. You will likely want to add a fair amount of debugging code as you implement and test. The output from printk statements are also written to files: /var/log/messages and /var/log/kern.log.

Use good modular design...remember you can have regular kernel-level functions that your system calls call.

You should build a new kernel package for your lab3 kernel image (use a new --append-to-version flag for this project). You could start with your lab 2 kernel and add in lab 3 functionality or start with a new copy of the kernel source and add lab 3 functionality to that.


Testing:

Implement a test application(s) that you can use to test all your system calls. Make sure to test all the different conditions of multiple and single processes synchronizing on Events (including handling of error conditions). Here are some cases to consider (this is not an exhaustive list): Keep in mind that your implementation of Event does not have to prevent user-level programs from using Events in a bad way. For example, if two processes open the same Event to synchronize their actions, and they both do an event_wait on the Event, they will both block indefinitely (deadlock). This is not a problem with your implementation, but a problem with the user-level program's use of Events. There may, of course, be hanging behavior that is caused by bugs in your implementation, and these you must fix. As you debug, it is important to be careful about what you are testing, so that if your program hangs you know whether it is due to a bug in your Event implementation or a bug in your test program.

When you call system calls from your test program, get and check the return value. If the return value is -1, you should call perror to print out an error specific message.

Again, use information in /proc to help you debug your program's correctness.


Hand in

Create a single tar file with the following and submit it using cs45handin:
  1. README file containing the following information:
    1. Your name and your partner's name
    2. The number of late days you used on this assignment
    3. The number of late days you have used so far
    4. A brief (1 paragraph) description of how you implemented the Event synchronization primitive. Include information about any design choices you made and why you made them.
    5. A list of the source file that you modified or added (and are submitting) please list these as full path names. For example:
       
             Files I added
             -------------
             /usr/src/linux/kernel/mynewfile.c  # implementation of my new sys calls
             ...	
      
             Files I modified
             ----------------
             /usr/src/linux/kernel/existingkernelfile.c # added a call to mynewfunc in function blah
             ...
      
    6. Describe how I can find your changes to code in kernel source files you submitted (i.e. what do I search for in these files to find your changes?). Put some comments before and after your changes:
             /*************** START my changes for project 3 ***************/
                your code
             /*************** END my changes for project 3 ***************/
        

  2. Copies of the test program(s) you wrote including Makefile(s). These should be well commented including descriptions the functionality that you are testing at different points in your program, and including a description of how to run your test program. You may additionally submit an output script of your test program annotated with comments describing what you are testing at each part (it is not necessary that you submit this, and only do so if you think that it helps to explain how your test program works).

  3. Copies of all source and header files you added to the kernel and copies of all kernel source and header files you modified (like unistd.h). Please do not submit the entire kernel directory containing your solution, and please do not submit .o files or other binaries.

Demo

After submitting your project, you and your partner will sign-up for a 30 minute demo slot. You should demonstrate all completely working functionality of your system calls as well as your handling of error conditions. I will definitely want to see cases where both single and multiple processes are waiting on an Event.

You should write an interactive test program that provides a menu of options, the user selects the next option, and then you can show the results of that action. This as opposed to writing a test program that contains all the calls you want to demo and just streams through them. The idea is to be able to stop after each system call and show me what happened as a result of that call and demonstrate that the call did what it is supposed to do.

Additionally, you may want to run a version of your kernel with debugging output that prints out the state of your Event data structures and prints out process state as wait, signal, open and close operations are executed on Events. This may help you to demonstrate that your solution is really blocking processes when it should be and really waking them up when it should be.