CS 31 Lab 8

1. Due Date

Command-line parser (part 1): Due by 11:59 pm, Tuesday, Nov. 16, 2021

Completed shell (parts 1 and 2): Due by 11:59 pm, Tuesday, Nov. 23, 2021

Your lab partner for Lab 8 is listed here: Lab 8 partners

Our guidelines for working with partners: working with partners, etiquette and expectations

2. Lab Goals:

Demystify how a Unix shell program works by writing one, including the following features:
1. Parsing string command line arguments into an argv list in dynamic memory.
2. Executing commands that run in the foreground and the background.
3. Executing previous commands by interacting with the shell history (using the !num syntax.)
Learn how to create and reap processes with the fork, execvp, waitpid system calls.
Interact with signals and write signal handler code.
Gain more expertise with gdb and valgrind for debugging C programs.

3. Lab Overview

You will implement a shell, which is the program you interact with on the command line of a terminal window. A shell operates by doing the following things:

Print a prompt and wait for the user to type in a command.
Read in the command string entered by the user.
Parse the command line string into an argv list.
If the command (first item in the parsed argv list) is a built-in shell command, the shell will handle it on its own (without forking a child process).
Otherwise, if it’s not a built-in command, fork a child process to execute the command and wait for it to finish (unless the command is to be run in the background, then the shell doesn’t wait for the child to exit).
Repeat until the user enters the built-in command exit to exit the shell program.

4. Part 1: Command line parsing library.

For the first stage of your shell, you’ll implement a parsecmd library that contains functions to parse a command line string into its individual command line arguments, and construct an argv list of strings from the command line args. Your library functions can then be used by other programs by using #include for your parsecmd.h file and linking in your parsecmd.o binary on the gcc command line:

$ gcc -g -o tester tester.c parsemd.o

(This is handled for you by the provided Makefile.)

Your library will export one function, parse_cmd_dynamic, though you’re encouraged to create helper functions to aid your implementation of it. Helper functions should not be exported via the parsecmd.h header file.

The parse_cmd_dynamic function will take in a command line string and turn it into an argv list (an array of strings, one per command line argument) that’s suitable for passing to a newly-starting program. The function will also test for the presence of an ampersand (&), which indicates that the command should be run in the background. For example, if the user enters the command line string:

cat foo.tex

The string cat foo.tex will be passed to parse_cmd_dynamic, which will then produce an argv array that looks like:

argv [0] ---->"cat"
argv [1] ---->"foo.tex"
argv [2] ----|  (NULL)

Note that there will be a newline character after the foo.tex and before the final null-terminator character as a result of the user hitting the [enter] key. You don’t want to store the newline character in your argv array.

This operation is known as tokenizing the string into individual tokens, each of which is separated by white space. Given a command line string as input, your implementation of the parse_cmd_dynamic function should dynamically allocate and return the argv array of strings, one for each command line token. It takes the following parameters:

/*
 * parse_cmd_dynamic - Parse the passed command line into an `argv` array.
 *
 *    cmdline: The command line string entered at the shell prompt
 *             (const means that this function cannot modify cmdline).
 *             Your code should NOT attempt to modify these characters!
 *
 *         bg: A pointer to an integer, whose value your code should set.
 *             The value pointed to by bg should be 1 if the command is to be
 *             run in the background, otherwise set it to 0.
 *
 *    returns: A dynamically allocated array of strings, each element
 *             stores a string corresponding to a command line argument.
 *             (Note: the caller is responsible for freeing the returned
 *             `argv` list, not your parse_cmd_dynamic function).
 */
char **parse_cmd_dynamic(const char *cmdline, int *bg);

To produce a dynamic argv array, your implementation must first determine how many tokens are in the command line string. After that, it should malloc EXACTLY the right number of (char *) argv buckets, one for each of the tokens plus one extra bucket at the end for NULL, which indicates the end of the tokens. Then, for each token, it should malloc exactly enough space to store the string corresponding to the that token (don’t forget one extra byte for the null-terminator \0 character).

For example, if parameter cmdline is set as follows:

"    ls   -l   /usr/bin   "   (quotation marks not part of the string)

The function should allocate space for an array of four character pointers (char * 's). The first should have three bytes (characters) allocated to it for storing l, s, and \0. The second should allocate three bytes to store -l\0, and the third should hold /usr/bin\0. The final pointer should be NULL.

// Declare a local var to store dynamically allocated args array of strings.
char **args;

// After allocating memory and tokenizing, it should look like:

args --------->[0]-----> "ls"           # don't forget the \0
               [1]-----> "-l"           # at the end of each
               [2]-----> "/usr/bin"     # of these strings
               [3]-----|  (NULL)

To help test parse_cmd_dynamic 's behavior and convince yourself of its correctness, you’ll write a few test cases as input to your executable ./tester file.

4.1. Requirements

Your implementation of parse_cmd_dynamic should behave according to the specifications described above.
You may not change the prototype (argument types or return type) of parse_cmd_dynamic, and your solution should NOT use the strtok function from the string.h library.
You may assume that if an ampersand (&) appears in the command line, indicating that the command is to be run in the background, that it will appear at the end of the line. That means you can ignore any/all characters that appear after an & (except the '\0' character).
- For this lab, an ampersand is never considered to be a token or part of a token; it’s just a way for the user to express that the program should be run in the background.
- Ampersands should NOT appear in the argv array that parse_cmd_dynamic produces.
Your implementation should make no attempt to modify the input command line argument string. It should be treated as a constant. You ARE allowed to make a copy of the string (e.g. with strdup) and modify that however you’d like.
Your function should work for command line strings entered with any amount of whitespace between command line tokens (assuming there’s at least one whitespace character between them). For example, these inputs should yield identical argv lists returned by your function. (Recall the '&' character does not end up in your argv array.)
```
cat foo.txt  blah.txt
cat foo.txt  blah.txt      &
cat foo.txt  blah.txt&
             cat          foo.txt           blah.txt
```
You need to TEST that your code works for command lines with any amount of whitespace between arguments!
You should fill in the missing TODO in tester.c such that, if we run your tester, it works correctly and there are no valgrind issues (like memory leaks, since there are no calls to free in the starter code).
For full credit, your solution should be well-commented, it should not use global variables, it should demonstrate good design, and it should be free of valgrind errors.

4.2. Tips

Test your code in small increments. It’s much easier to localize a bug when you’ve only changed a few lines.
In implementing parse_cmd_dynamic, you will likely want to make more than one pass through the characters in the input string.
When working with pointers, and in particular double pointers, it’s often very useful to draw some pictures about what’s pointing to what, noting the types of variables. Don’t shy away from using the whiteboard or scratch paper!

4.3. Submitting Part 1

Please remove any debugging output prior to submitting!

$ git add tester.c parsecmd.h parsecmd.c
$ git commit -m "Lab 8 part 1 completed"
$ git push

5. Part 2: Building a shell

If you haven’t yet, be sure to read through the Week 11 in-lab exercises before beginning on Part 2.

Now that we can tokenize command line strings, let’s put together the rest of the pieces for executing user commands. Your shell should be able to:

Run executables in the foreground
Run executables in the background
Support the built-in commands exit and history
Recall previous commands using the !num syntax

Do not begin coding right away! Read through this section of the lab writeup in its entirety first. There are lots of hints in the "Tips" section, so be sure to consider those before beginning to code.

5.1. Specifications

5.1.1. Built-in commands

Your shell should respond to the following three built-in commands on its own. It should not fork off a child to handle these!

exit: Terminate the shell program. You can print out a goodbye message, if you’d like.
history: Print a list of the user’s MAXHIST most recently entered command lines. (Note: blank lines should not be added to the history, but all other command lines should.)
!num (where num is an actual number, e.g., !5): Re-execute a previous command from the history.
- The previous command could be a run-in-the-foreground, run-in-the-background, or a built-in command that your shell should execute appropriately.
- The command line retrieved from the history list should be added to the history list. That is, executing !5 should not put !5 in the history list; instead, a copy of the command line associated with command ID 5 from the history list should be added to the history list. See the sample output for some examples of history and !num commands.

For each of the three built-in commands, as long as the first argument matches the built-in command, you should run the built-in command. You do not need to check if there are extraneous arguments after the built-in command. For example, exit now will trigger the exit built-in command, and history -a will trigger the history built-in command.

5.1.2. Implementation details: `history` and `!num`

Your shell program should keep a list of the MAXHIST most recently entered command lines by the user. The starter code for cs31shell.c contains the line #define MAXHIST 10: use MAXHIST in your code, not 10, when referring to the size of the history.

The built-in history command should print out the history list in order from first (oldest) to last (most recently entered) command. For each element in the list, it should print the command ID and its corresponding command line string. The command ID is an ever-increasing number, and each command should increment the command ID by one. Use an unsigned int to store this value (don’t worry about executing 4 billion commands in an attempt to overflow this value).

Your history list should be implemented as circular queue of history structs: a circular array of MAXHIST buckets where new commands are added to one end and old ones removed from the other end.

Users can request the execution of a previous command, as stored in the history list, by executing the built-in !num, where "num" is a number corresponding to a history command ID. Upon receiving such a command:

Search your history list for a command with a matching command ID. Remember that the command ID is not the position in the history list, it is the unique number of the command in your shell’s execution history (i.e. !5 is the 5th command run by your shell, !34 is the 34th command run by your shell).
If a matching command ID is not found, print out an error message.
Otherwise, use the command line from the matching history command ID, and re-execute it (this command now also becomes the most recent command to add to your history list).

If the command from the history list is not a built-in command, then your shell should run it just like it does any foreground or background program: parse its command line into argv list, fork/execvp and determine whether you should waitpid or not.

If the command from the history list is a built-in command (which could only be the history command given our specification), then it should execute it directly just like any built-in command.

5.1.3. Running commands in the foreground

When a command is run in the foreground, for example:

cs31shell> ./sleeper 2

Your shell program should fork() a child process to execute sleeper and then wait until the child process exits before proceeding. You can accomplish this by calling waitpid in the parent (your shell) by passing in the pid of the child process (the return value of fork()).

5.1.4. Running commands in the background

When a command is run in the background, for example:

cs31shell> ./sleeper 3 &

Your shell program should fork() a child process to execute sleeper, but it should NOT wait for the child to exit. Instead, after forking the child process, it should immediately return to execution step 1 of its main control flow (print out the prompt and read in the next command line). The child process will execute the command concurrently while the parent shell handles other command(s).

Your shell must still reap background child processes after they exit (reap its zombie children), so you can’t just forget about them! When a child that was run in the background exits, your shell program will receive a SIGCHLD signal. You should install a SIGCHLD handler that will call waitpid() to reap the exited child process(es).

Look at the lecture slides and at 13.4.1 Signals of the textbook for how to reap background processes by registering a signal handler on SIGCHLD. The signals example from Week 11 in-lab exercises may also be helpful.

Your shell should be able to run any number of processes in the background, so if you type in quick succession:

cs31shell> ./sleeper &
cs31shell> ./sleeper &
cs31shell> ./sleeper &
cs31shell> ps

The ps program output should list all three sleeper child processes.

5.2. Requirements

Your shell should support running programs in the foreground (e.g. ls -l)
Your shell should support running programs in the background (e.g. ./sleeper &)
Your shell should support the built-in command exit to terminate.
Your shell should support the built-in command history that keeps a list of the MAXHIST most recently entered command lines entered by the user. Use a constant definition for MAXHIST, and submit your solution with it set to 10, but try your shell out with other sizes too.
Your shell should support running commands from the history using !num syntax, where num is the command ID of a command from your command history (e.g. !33 should execute the command with command ID 33 from your history list). If a matching command num is not found the current history list, then your shell should print out and error message (command not found), otherwise the command line from the matching command on the history list should be executed (again).
Use the execvp version of exec for this assignment and waitpid instead of wait. See the "Tips" section below for examples.
You need to add a signal handler for SIGCHLD signals so that you can reap exited child processes that are run in the background. You should not leave any long-term zombies!
Whenever your code calls a library or system call function that returns a value, you should have code that checks the return value and handles error values.` You can call exit for unrecoverable errors, but print out an error message first (printf or perror for system call error return values).
The only global variables allowed are those associated with the history list and its state. All other program variables should be declared locally and passed to functions that use them.
For full credit, your shell should use good modular design, be well-commented and free of valgrind errors. The main function should probably not be much longer than that in the starting point code. Think about breaking your program’s functionality into distinct functions.

5.3. Sample Output

The following sample output shows how the shell and built-in history features should work.

5.4. Tips

The headline feature of the lab is the ability of your shell to run executables in the foreground and background. However, by far, the hardest part of the lab is implementing the code necessary to recall previous commands.

Implement and test incrementally (and run valgrind as you go). Here is one suggestion for an order to implement:

Add a call to your parse_cmd_dynamic function from part 1 to parse the input line into argv strings.
Handle the built-in exit command and be sure to free all dynamically allocated memory.
Handle the built-in history command and be sure to add history to your history. The history list is implemented as a circular queue. The operations on your circular queue are slightly different when the queue is not yet full (when your shell starts up) versus when it is full and you need to replace the oldest entry each time a new command is entered. Think about all state you will need to keep to keep track of the first element in the list (for printing out), the next insertion spot, and the end of the list.
Handle the !num syntax to recall previous commands. It will be challeging to test this thoroughly at this point since the only command that can be in your history at this point is history. Recall that num is a command ID, which is increasing as your shell runs commands: it is NOT the bucket index into the history list.
For all other commands, print out a message (e.g. Running in the foreground: ls -l, or Running in the background: ./sleeper 5 &) and add the command to your history. Do not attempt to actually fork off a new process to run the command yet.
Thoroughly test your exit, history and !num commands. You should be able to get your program to produce the following sample starting output.
Continue to use valgrind to be sure that there are no memory leaks!
Only after you have implemented the steps above, tested them thoroughly, and verified there are no memory leaks, should you go on to the next steps.
Add support for running commands in the foreground (the parent process, the shell, waits for the child pid that it forks off to execvp the command).
Add support for running commands in the background (the parent process, the shell, does NOT wait for the child pid that it forks off to run the command). After forking off a child to run the command, the shell program should go back to its main loop of printing out a prompt and waiting for the user to enter the next command.

You will need to add a signal handler on SIGCHLD so that when the process that is running in the background terminates, the shell reaps it. Use waitpid to reap all child processes. Use the sleeper program to test
```
cs31shell> ./sleeper &
cs31shell> ./sleeper 2 &
cs31shell> ps w
```

Here are other useful Tips:

The maximum length of a command line is defined in parsecmd.h. You can use the MAXLINE constant in your program.
Use the functions execvp(), waitpid(), and signal() for executing programs, waiting on children, and registering signal handlers. Note that the first argument to waitpid is the process id (pid) of the process you’d like to wait for, but if you pass in an argument of -1, it will wait for ANY reapable child process. This is useful inside your SIGCHLD handler, where you won’t know which child (or children) exited. You can pass it WNOHANG as the third parameter to prevent it from blocking when there are no children to reap.
You can call fflush(stdout); after any calls to printf to ensure the printf output is written immediately to the terminal. Otherwise, the C standard I/O library might buffer it for a short while.
Remember that if you dynamically allocate space for a string (using malloc), you need to allocate a space at the end for the terminating null character (\0), and that you need to explicitly free the space when you are done using it. Since your parsing library is allocating memory for the argv list, it’s up to your shell to free that memory when it’s done with it.
There is some starter code for building a circular queue of integers in the in-lab exercises. You should attempt to implement that first. Once you get that working you can go ahead and implement your circular queue of strings in your shell program. If you get stuck on the circular queue part, start with a simpler version, where the first history list element is always in bucket 0 and the last in bucket MAXHIST-1, and then shift values over as you add a new command to the last bucket each time. Then, come back to this later and try to turn this into it a circular queue (which, for large MAXHIST values, is much more efficient than having to shift values over to maintain the queue ordering).
For executing !num commands, you can use the parsecmd library to create the argv list from the command string in the history. If you have used good modular design so far, you should be able to just make calls to existing functions to execute the command.
You know how to define and use structs in C, and you know how to use strings in C. See past weekly lab code, your lab solutions, and the C documentation at the bottom of the lab writeup.

Remember that you can compare individual char values directly, but you need to use strcmp to compare two string values:

  char str[20];
  strcpy(str, "hello");
  if(str[2] == 'x') {
    printf("x in 2\n");
  }
  if(strcmp(str, "yo ho ho") == 0) {
    printf("str seems like a pirate\n");
  }

You’ll need to clean up your zombie children. A correct shell program should reap all exited children and not leave around zombie children after it exits. However, in its development stages, you may find that it creates zombie processes that it never reaps. You should check for, and kill, all zombies after running your shell program.

To do this, after running your shell program, run ps in bash to see if you have left any un-reaped zombie children (or any extra instances of your shell program still running):
```
$ ps
  233  pts/7   00:00:01  sleeper <zombie>
```
If so, send a kill signal to get the zombie to die, using either kill -9 or pkill -9:
```
$ kill -9 233       # kills the process with pid 233
$ pkill -9 sleeper  # kills all sleeper processes
```
When in doubt about what your shell should do, try running the command in the bash shell (a standard system terminal) and see what it does. For example, here is a way to see how bash executes the history built-in command: example of history and !num in bash.

6. Submitting