CS45 Lab 3: Implementing a Synchronization Primative

Due Thursday March 6 before 11:59pm
Checkpoint due Monday Feb. 24 at the begining of Lab

For the Checkpoint Implement:

  1. Two of the required system calls, event_open and event_print. For the checkpoint version of event_print, print out Event table information about opens (you can add printing out information about waiters later, after the checkpoint).
  2. and the event_init function.
You will demo your checkpoint to me during your Monday lab session. You cannot use late days on the checkpoint.

Content:

This lab will be done with your forever CS45 lab partner: Lab partners and machine assignments

Introduction
For this project you will implement and test a new synchronization primitive called an Event. User-level processes will use Events in the following way:
  1. open an Event
  2. wait and signal on Events they have opened
  3. close an Event
Multiple processes can use Events to synchronize their actions by each opening the same Event and then calling event_wait and event_signal on it. Your solution must ensure that only processes that open an Event can call event_wait, event_signal or event_close on the Event. The lifetime of an Event is the period of time from its first open (when it is created) to the time of its last close (when it is destroyed). Only the last close of an Event destroys it.

The semantics of event_wait are that is always blocks the calling process. The semantics of event_signal are that it unblocks all processes waiting on the Event.

You will implement the data structure(s) for representing Events at the kernel-level, and you will add system calls for user-level processes to event_open, event_close, event_wait, and event_signal on Events. In addition, you will need to write user-level programs to test your Event primitives.

An individual process can have multiple Events open at a time (and can wait and signal on any of its open Events). Your Event implementation, however, must make sure that only processes that open an Event can wait, signal, or close the Event.

I strongly encourage you to get an early start on this project; your solution will require a fair amount of thinking time, linux code reading time, and code design and debugging time.

Starting point code and Virtualbox image
In this week's lab you will grab a new virtualbox image to use for lab3 and start with a new copy of the Linux source with a new config file to build the kernel in a different way.

Start by setting up a git repo for your lab3 work. Make sure to add three users to your git repo: you, your partner, and your shared CS45 account.

Both you and your partner should then clone a copy into your private cs45/labs subdirectory. here are the instructions for setting up a git repo from lab1.

Then one of you or your partner can copy over some starting point code into your repo, add and push it. I included an example gitignore file with the starting point code. First move it to .gitignore before adding and pushing it (and add in any names of executable files or other files you do not want git to add to your repo).

$ cp ~newhall/public/cs45/lab03/* .
$ ls
  Makefile README event.c gitignore tester.c
$ mv gitignore .gitignore
$ git add .gitignore
$ git add *
$ git commit
$ git push origin master
The starting point code contains files with most of the #includes that you will need, and includes them in the correct order (there are some dependencies on include order of some some kernel header files, so be careful if you add more #includes ). It also includes a Makefile and starting point for a user-level test program (mostly some guesses at #includes are given to you).

Implementation Details

Creating an Event Table

To implement support for Event primitives, you need to create kernel data structure(s) to keep track of Events that have been created (i.e. an Event table), to keep track of which processes currently have the Event open, to keep track of which processes are currently blocked on an Event, etc. Typically, the kernel uses fixed-sized data structures to keep track of instances of abstractions that it implements. For example, there is an open file table to keep track of all currently open files in the system. Because these data structures are fixed-sized, there is an upper bound on the number of instances that can currently exist in the system (e.g. there is a maximum number of open files, number of processes and threads, ...).

Implement your Event Table data structure as a static global variable of a fixed-sized, and make its size small (~5) so that you can test that you are handling the case when a process requests to open a new Event, but there are already 5 active Events (each has at least one open). Calls to event_open in this case should return an error indicating that no more Events can be created. When an Event is closed for the last time, it is no longer an active Event, and space for it in the Event Table can be used by a subsequent newly created Event.

Initializing the Event Table

When the kernel boots it initialize all of its data structures. If you add a data structure to the kernel to keep track of Events, you need to have the kernel initialize it on start-up. To do this implement an event_init function and then register it as an init function to be invoked on start-up, using the pure_initcall macro:
// NOTE: this is NOT a system call it is a function called by 
//       the kernel during its boot sequence to initialize state
//       you need for your event implementation  
static int  __init event_init() {
  // code to initialize Event data structures and other state
}

// to register the event_init function to be called during boot sequence:
pure_initcall(event_init);

Dynamic Memory Allocation in the Kernel

Although the Event Table itself should be statically allocated, there may be some Event state that needs to be dynamically allocated due to it depending on the set of processes currently using an Event.

kmalloc and kfree are kernel-level routines for dynamic allocation of kernel memory. Use GFP_KERNEL as the priority value passed to kmalloc.

Memory leaks in the kernel are very, very bad; make sure that if you add a call kmalloc you add a corresponding call to kfree somewhere. Also, it is good programming practice to set a variable's value to NULL after freeing the space to which it points: free(ptr); ptr = 0; .

Event System call

You will implement the following system calls for performing operations on Events (use the SYSCALL_DEVINEX macros to generate the function definitions though):
  1. long event_open(int *id): if *id==0, then create a new Event, and set the id to the new Event's ID. If *id is positive, open an existing Event with a matching id value. The function returns 0 on success, or an error value on failure. Return different error values to indicate different types of failure (you may use error values already defined in error.h, for example: return -ESRCH;). Remember, that you need to verify that values passed by reference to your system calls refer to readable/writable memory before you read/write to it (see project 2 if you forgot how to do this). A process can only open the same event 1 time.

  2. long event_close(int id): closes the corresponding Event. The last close to an Event, destroys the Event (you could re-use its id for a subsequent event_open call that is passed a value of 0). Only processes that have the Event open can close the Event. Also, the last close should fail if there are waiters on the Event. The function returns 0 on success, or an error value on error.

  3. long event_wait(int id): blocks the calling process on the Event with a matching id (again the caller must have opened the Event before it can wait or signal or close it). The function returns 0 on success, or an error value on error.

  4. long event_signal(int id): wakes up all processes blocked on the event with the matching id. The function returns 0 on success, or an error value on error. Again, only processes that have opened the Event can signal on it.

  5. long print_event_table(): a debugging routine to print out the contents of the event table (printk). For open events, include the process ids of every process that has the event open, and print out the pid of all processes waiting on each event. returns 0 on success.

Blocking and Unblocking a Process (task)

You are required to implement blocking processes on Events and waking-up process waiting on Events in the following way (don't use other higher-level kernel functions that you may find):

You will use wait_queue_head_t data structure(s) to block processes that call event_wait on an Event.

To block a process on an Event, you need to:

  1. set the process' state (in its task_struct) to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE (you should use TASK_INTERRUPTIBLE for this assignment, which means that if the process receives a signal (like SIGKILL), it will wake up before event_signal is called on the Event on which it is blocked.
  2. add the process to a wait queue associated with the Event
  3. call the kernel routine schedule() to de-schedule the process and schedule another process from the READY Q to run.

To wake-up a process on an Event, you need to:

  1. remove the process from its waitQ
  2. set its state so that it will get put back on the readyQ (see code in wait.h that does this)
  3. do a bunch of other stuff
  4. call schedule() to add it to the readyQ
OR you may use the kernel's wake_up functions (in wait.h that calls function in sched.c) to do much of this for you, but make sure you understand what this function is doing...just use the wake_up functions don't try to write your own.

Wait Queues

You should use the linux kernel wait queue data structures for keeping track of all processes waiting on an Event. Your code will need to include linux/wait.h and linux/sched.h. The wait queue data structure uses Linux's doubly linked list implementation (in linux/list.h), and consists of a wait_queue_head struct that contains a list of wait_queue_t structs (one for each task waiting on the wait queue).

Your solution should NOT make calls to the higher level functions interruptible_sleep_on, or sleep_on. Instead, you will implement a solution that explicitly adds a process to a waitqueue and blocks it using the 3 steps listed above. You may, however, use wake_up to wake up all waiters on a wait queue.

The wait queue data structure is a bit tricky to figure out. I suggest looking at existing code the initializes, adds and removes items from a wait queue, and draw some pictures and step through the code for init_waitqueue_head, init_waitqueue_item, add_wait_queue, and remove_wait_queue on your picture.

To iterate through the elements on a wait queue, you can use the list_for_each_safe macros, and the list_entry macro to get a pointer to the next wait_queue_t element in the list. Here are some example code fragments using wait_queues (note: there are other ways to declare and initialize wait queues, and how you do so depends on the context in which you are using them, this is just one example):

  // declare and init a wait Q
  static DECLARE_WAIT_QUEUE_HEAD(my_wait_Q);  

  // declare and initialize a wait_queue_t element named "wait_entry"
  struct task_struct *curtask = get_current();
  DECLARE_WAITQUEUE(wait_entry, curtask);   

  // there are functions to add and remove elements from a wait Q:
  add_wait_queue(&my_wait_Q, &wait_entry);
  remove_wait_queue(&my_wait_Q, &wait_entry);

  // you can traverse the elements of a wait queue using the underlying
  // list implementation used by the wait queue (see wait.h and list.h )
  struct list_head *tmp, *next;	
  list_for_each_safe(tmp, next, &(my_wait_Q.task_list)) {
      wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
      ...
  }
The best way to figure out how to use wait queues is to look at kernel code that uses them (and look for different types of uses), and look through the wait.h file for macros, type defs, and functions you can use. As a rule of thumb, avoid using anything starting with an underscore.

Synchronization Primitives in the kernel

You will likely need to add synchronization code to some or all of your system call implementations. Think about the fact that multiple processes can be simultaneously running in kernel mode in any of your system call functions. Are there Event data structures or state to which you need to ensure atomic access? If so, then those parts should be in a critical section. Look in linux/semaphore.h for the semaphore implementation. Here is some example code using semaphores:
 
    static DECLARE_MUTEX(event_mutex); // declare a MUTEX semaphore initialized to 1
    static DECLARE_MUTEX_LOCKED(event_mutex); // declare a MUTEX semaphore initialized to 0 

    if(down_interruptible(&event_mutex)) {
          // process woke up due to signal...semaphore was NOT acquired	
    }  
    // Critical Section Code (i.e. access shared data structures)
	  // make sure this process doesn't block while it holds the semaphore
    up(&event_mutex)) {
You may not, however, use semaphores as the blocking and unblocking machanism for blocking and ublocking processes that call event_wait and event_signal.

Other requirements

Your Event implementation should be in a new source file that you add to the kernel.

Your system calls should return error values as they did in the previous assignment (for example, return -ESRCH). It is fine to use existing error values defined in error.h that are close to but perhaps not exactly what you want for your errors. You may add new error codes if you'd like, but it is not required. User level programs should call perror to print out an error message whenever a call to one of your system calls returns an error.

Tips for Getting Started
This project will require a lot of reading of Linux source code to get an idea of what you need to do and how to do it. Your solution will likely be about 300-500 lines of code (including your well written comments). Start by examining Linux source code that does something similar to what you need to do (block a process on an Event and wake up all processes waiting on the Event when an signal on an Event occurs) I'd suggest looking at some of the source files code in ipc/ and in sched.c. In addition, you will want to look at implementations of some of the functions you will use in your solution such as the functions that add to, remove from, and init a wait_queue.

As you design your solution, think about what a process is (or can be) doing while another process is modifying shared state associated with an Event. For example: is it possible for process i to do a wait on Event j, process k to do a signal on Event j, and process i to do another wait on Event j before it has been removed from Event j's wait queue (add itself to the wait Q more than one time)? can a process exit while it is on an Event wait queue? when does a process' call to schedule() return (what is true about the process at this point)?

I suggest incrementally implementing and testing functionality. Start with event_init, then event_open and test that a process can successfully open several Events, and that you are correctly detecting error conditions. As you go, implement partally functionality of event_print to print out information about the Event functionality you have implemented.

You will likely want to add a fair amount of debugging code as you implement and test. The output from printk statements is written to files: /var/log/messages and /var/log/kern.log. It can also be echoed to the console by sudo dmesg -n 7

Use good modular design...remember you can have regular kernel-level functions that your system calls call.

Testing
Implement a test application(s) that you can use to test all your system calls. Make sure to test all the different conditions of multiple and single processes synchronizing on Events (including handling of error conditions). Here are some cases to consider (this is not an exhaustive list): Keep in mind that your implementation of Event does not have to prevent user-level programs from using Events in a bad way. For example, if two processes open the same Event to synchronize their actions, and they both do an event_wait on the Event, they will both block indefinitely (deadlock). This is not a problem with your implementation, but a problem with the user-level program's use of Events. There may, of course, be hanging behavior that is caused by bugs in your implementation, and these you must fix. As you debug, it is important to be careful about what you are testing, so that if your program hangs you know whether it is due to a bug in your Event implementation or a bug in your test program.

When you call system calls from your test program, get and check the return value. If the return value is -1, you should call perror to print out an error specific message.

Again, use information in /proc to help you debug your program's correctness.

What to Handin
Submit the following using cs45handin Only one of you or your partner should submit your tar file via cs45handin. If you accidentally both submit it, send me email right away letting me know which of the two solutions I should keep and which I should discard.

Create a single tar file with the following and submit it using cs45handin:

  1. README file containing the following information:
    1. Your name and your partner's name
    2. The number of late days you used on this assignment
    3. The number of late days you have used so far
    4. A brief (1 paragraph) description of how you implemented the Event synchronization primitive. Include information about any design choices you made and why you made them.
    5. A list of the source file that you modified or added (and are submitting) please list these as full path names. For example:
       
       Files we added
       -------------- 
       # implementation of my new sys calls
       /usr/src/linux/kernel/mynewfile.c  
       ...	
       Files we modified
       ----------------- 
       # added a call to my function foo in function blah:
       /usr/src/linux/kernel/existinfile.c 
       ...
      
    6. The full path, and machine name, to the .deb packages with your lab3 kernel image and headers files. For example:
       @garlic:/local/me_and_pal/linux-headers-2.6.32.44-lab3_1.0_amd64.deb  
       @garlic:/local/me_and_pal/linux-image-2.6.32.44-lab3_1.0_amd64.deb  
      
      Immediately before, or after, running cs45handin, rebuild the kernel_image and kernel_header packages (the Option 1 build) so that they contain your latest lab3 updates (in case you have been doing a bzImage build). Then DO NOT MODIFY AFTER THE DUE DATE.

  2. Copies of the test program(s) you wrote including Makefile(s). These should be well commented including descriptions the functionality that you are testing at different points in your program, and including a description of how to run your test program. You may additionally submit an output script of your test program annotated with comments describing what you are testing at each part (it is not necessary that you submit this, and only do so if you think that it helps to explain how your test program works).

  3. Copies of every kernel source and header files that you modified or added for this lab. I only want the files you modified or added, do not submit a tar file containing the entire kernel source tree.


Demo and Preparing for Demo
After submitting your project, you and your partner will sign-up for a 20 minute demo slot. You should demonstrate all completely working functionality of your system calls as well as your handling of error conditions. I will definitely want to see cases where both single and multiple processes are waiting on an Event.

I recommend writing a simple, generic, interactive test program for your demo. It would provide a menu of system call options, read in the user's next option (and any additional input needed for that option), and then makes a call the user's selected system call. After each such selection, you can then show me the results of that particular system call by using information from /proc, ps, top, etc. or by calling print_event_table. This is as opposed to writing a test program that contains all the calls you want to demo and just streams through them. The idea is to be able to stop after each system call and show me what happened as a result of that call, demonstrating that the call did what it is supposed to do. And, then show me the results of another system call, and so on.

A simple test program with a menu of system call options that a user can select any one after another, can be used to show any sequence of system calls: it should be easy to write and extremely flexible for showing all kinds of different scenerios.

The print_event_table system call will be useful for showing the effects of the other system calls. However, you may also want to run a version of your kernel with some additional debugging printks if it helps you to demonstrate that your solution is really blocking processes when it should be and really waking them up when it should be or any other effects of operations on Events.

Links to Useful Resources