This lab has two parts:
Command-line parser (part 1): Due by 11:59 pm, Thursday, November 16, 2023
Completed shell (parts 1 and 2): Due by 11:59 pm, Thursday, November 30, 2023
Your lab partner for Lab 8 is listed here: Lab 8 partners
Our guidelines for working with partners: working with partners, etiquette and expectations.
1. Lab Goals
-
Demystify how a Unix shell program works by writing one, including the following features:
-
Parsing string command line arguments into an
argv
list in dynamic memory. -
Executing commands that run in the foreground and the background.
-
Executing built-in commands
exit
,cd
, andhistory
-
-
Learn how to create and reap processes with the
fork
,execvp
,waitpid
system calls. -
Interact with signals and write signal handler code.
-
Gain more expertise with
gdb
andvalgrind
for debugging C programs.
2. Lab Overview
You will implement a shell, which is the program you interact with on the command line of a terminal window. A shell operates by doing the following things:
-
Print a prompt and wait for the user to type in a command.
-
Read in the command string entered by the user.
-
Parse the command line string into an
argv
list. -
If the command (first item in the parsed
argv
list) is a built-in shell command, the shell will handle it on its own (without forking a child process). -
Otherwise (if it’s not a built-in command) fork a child process to execute the command.
-
If the command was run in the foreground, the shell waits for the child process to finish.
-
If the command was run in the background, the shell does not wait .
-
-
Repeat starting at step 1 until the user enters the built-in command
exit
to exit the shell program.
3. Useful manual pages
4. Part 1: Command-line parsing library
For the first stage of your shell, you’ll implement a parsecmd
library that contains functions to parse a command line string into its individual command line arguments and construct an argv
list of strings from the command line arguments. Your library functions can then be used by other programs by using #include
for your parsecmd.h
file and linking in your parsecmd.o
binary on the gcc
command line:
$ gcc -g -o tester tester.c parsecmd.o
(This is handled for you by the provided Makefile
.)
Your library will export one function, parse_cmd_dynamic
, though you’re encouraged to create helper functions to aid your implementation of it. Helper functions should not be exported via the parsecmd.h
header file.
The prototype for the parse_cmd_dynamic
function is shown below:
char **parse_cmd_dynamic(const char *cmdline, int *bg);
From the prototype, we can see that the parse_cmd_dynamic
function takes two parameters:
-
a string representing the command line, and
-
a pointer to an
int
related to running commands in the background, which we’ll describe shortly.
The parse_cmd_dynamic
function should create an array with the same format as the argv
array passed into main
: an array of strings, one per command line argument, with a NULL
pointer as the last element of the array. As such, this return value is suitable for passing to a newly-starting program.
The function will also test for the presence of an ampersand (&
), which indicates that the command should be run in the background. It should modify the int
pointed to by bg
to be 1 if the ampersand is present (the command is to be run in the background), and it should be 0 if the ampersand is not present (the command should run in the foreground).
Whether the ampersand is present or not, you must set the value that |
4.1. Example
For example, if the user enters the command line string:
cat foo.txt
The string "cat foo.txt\n"
will be passed to parse_cmd_dynamic
, which will then produce an argv
array that looks like:
argv[0] ----> "cat" argv[1] ----> "foo.txt" argv[2] ----| (NULL)
Notice that there is a newline character ('\n'
) after the foo.txt
and before the final null-terminator character as a result of the user hitting the [enter] key. You don’t want to store the newline character in your argv
array.
4.2. Details
Building the argv
array is known as tokenizing the string into individual tokens, each of which is separated by whitespace. Given a command line string as input, your implementation of the parse_cmd_dynamic
function should dynamically allocate and return the argv
array of strings, one for each command line token. It takes the following parameters:
/*
* parse_cmd_dynamic - Parse the passed command line into an `argv` array.
*
* cmdline: The command line string entered at the shell prompt
* (const means that this function cannot modify cmdline).
* Your code should NOT attempt to modify these characters!
*
* bg: A pointer to an integer, whose value your code should set.
* The value pointed to by bg should be 1 if the command is to be
* run in the background, otherwise set it to 0.
*
* returns: A dynamically allocated array of strings, each element
* stores a string corresponding to a command line argument.
* (Note: the caller is responsible for freeing the returned
* `argv` list, not your parse_cmd_dynamic function).
*
* note: If the user does not type anything except whitespace,
* or if they only enter whitespace and an ampersand(&), this
* function should return a pointer to an array with one element: NULL.
*/
char **parse_cmd_dynamic(const char *cmdline, int *bg);
To produce a dynamic argv
array, your implementation must:
-
Determine how many tokens are in the command line string.
-
After that, it should
malloc
EXACTLY the right number of(char *) argv
buckets, one for each of the tokens plus one extra bucket at the end forNULL
, which indicates the end of the tokens. -
Then, for each token, it should
malloc
exactly enough space to store the string corresponding to that token (don’t forget one extra byte for the null-terminator\0
character).
The isspace
function (in the ctype.h
library) can help you identify whitespace characters (spaces, tabs, newlines, etc.) in the input string. Use the man isspace
command for more information on isspace
.
4.3. Tokenization example
For example, if parameter cmdline
is set as follows:
" ls -l /usr/bin \n" (quotation marks not part of the input string)
The function should allocate space for an array of four character pointers (char *
's):
-
The first pointer should point to three bytes (characters) allocated to it (using
malloc
) for storingl
,s
, and\0
. -
The second should point to memory (3 bytes) allocated to store
-l\0
-
The third should point to memory allocated to store
/usr/bin\0
. -
The final pointer should be
NULL
.
You should not store the '\n'
.
// Declare a local var to store dynamically allocated argv array of strings. char **argv; // After allocating memory and tokenizing, it should look like: argv ------>[0]-----> "ls" # make sure that each of the strings [1]-----> "-l" # is null-terminated with \0 [2]-----> "/usr/bin" # and [3]-----| (NULL) # that you store NULL as the last element
To help test parse_cmd_dynamic
's behavior and convince yourself of its correctness, you’ll use
the code in the tester.c
file (which you will compile and run as ./tester
). Note that although the
tester
code should work as-is, you need to add code to free
all of the memory you allocated. See the TODO
in
the tester.c
file.
4.4. Requirements
-
Your implementation of
parse_cmd_dynamic
should behave according to the specifications described above. -
You may not change the prototype (argument types or return type) of
parse_cmd_dynamic
, and your solution should NOT use thestrtok
function from thestring.h
library. -
You may assume that if an ampersand (
&
) appears in the command line, indicating that the command is to be run in the background, that it will appear at the end of the line. That means you can ignore any/all characters that appear after an ampersand (&
) except the'\0'
character.-
For this lab, an ampersand is never considered to be a token or part of a token; it’s just a way for the user to express that the program should be run in the background.
-
Ampersands should NOT appear in the
argv
array thatparse_cmd_dynamic
produces.
-
-
Your implementation should make no attempt to modify the input command line argument string. It should be treated as a constant. You ARE allowed to make a copy of the string (e.g. with
strdup
) and modify that however you’d like. -
Your function should work for command line strings entered with any amount of whitespace between command line tokens (assuming there’s at least one whitespace character between them). For example, all of the following inputs should yield identical
argv
lists returned by your function. (Recall the'&'
character does not end up in yourargv
array.)cat foo.txt blah.txt cat foo.txt blah.txt & cat foo.txt blah.txt& cat foo.txt blah.txt
You need to TEST that your code works for command lines with any amount of whitespace between arguments!
-
You should fill in the missing
TODO
intester.c
such that, if we run your tester, it works correctly and there are novalgrind
issues (like memory leaks, since there are no calls tofree
in the starter code). -
For full credit, your solution should be well-commented, it should not use global variables, it should demonstrate good design, and it should be free of
valgrind
errors.
4.5. Tips
-
Test your code in small increments. It’s much easier to localize a bug when you’ve only changed a few lines.
-
In implementing
parse_cmd_dynamic
, you will likely want to make more than one pass through the characters in the input string. -
When working with pointers, and in particular double pointers, it’s often very useful to draw some pictures about what’s pointing to what, noting the types of variables. Don’t shy away from using the whiteboard or scratch paper!
-
It’s fine to make multiple passes through the
cmdline
input string. For example, you may want to initially move through the string once to count how many tokens there will be. Then, you have the information you need to allocate theargv
array, and you can make a second pass through the string to copy the contents of each token. -
While parsing the string, there are some common operations that you can outsource to helper function. Doing so will make parsing the string much easier, as you can test the helper function individually. For example, you might want functions for things like:
-
count the number of tokens
-
find the start of the next token in the string
-
copy a token
-
4.6. Submitting Part 1
Please remove any debugging output prior to submitting!
$ git add tester.c parsecmd.h parsecmd.c
$ git commit -m "Lab 8 part 1 completed"
$ git push
5. Part 2: Building a shell
If you haven’t yet, be sure to read through the Week 10 in-lab exercises before beginning on Part 2.
Now that we can tokenize command line strings, let’s put together the rest of the pieces for executing user commands. Your shell should be able to:
-
Run executables in the foreground
-
Run executables in the background
-
Support the built-in commands
exit
,history
, andcd>
Do not begin coding right away! Read through this section of the lab writeup in its entirety first. There are lots of hints in the "Tips" section, so be sure to consider those before beginning to code.
5.1. Running commands in the foreground
When a command is run in the foreground, for example:
cs31shell> ./sleeper 2
Your shell program should fork()
a child process. The child process will execute sleeper
(using execvp
). The parent process, your shell, will wait until the child process exits before proceeding. You can accomplish this by calling waitpid
in the parent process and passing in the pid
of the child process, which you got back as the return value of fork()
.
5.2. Running commands in the background
When a command is run in the background, for example:
cs31shell> ./sleeper 3 &
Your shell program should fork()
a child process. The child process will execute sleeper
(using execvp
). But the parent process, your shell, should not wait for the child to exit.
Instead, after forking the child process, it should immediately return to execution step 1 of its main control flow (print a prompt and wait for the user to type in a command). The
child process will execute the command concurrently while the parent shell
handles other command(s).
Your shell must still reap background child processes after they exit (reap its
zombie children), so you can’t just forget about them! When a child that was
run in the background exits, your shell program will receive a SIGCHLD
signal. You should install a SIGCHLD
handler that will call waitpid()
to
reap the exited child process(es).
Look at the lecture slides and at
13.4.1 Signals of the textbook for how to reap background processes by registering
a signal handler on |
Your shell should be able to run any number of processes in the background, so if you type in quick succession:
cs31shell> ./sleeper &
cs31shell> ./sleeper &
cs31shell> ./sleeper &
cs31shell> ps
The ps
program output should list all three sleeper
child processes.
5.3. Built-in commands
Your shell should respond to the following two built-in commands on its own. It should not fork off a child to handle these!
-
exit
: Terminate the shell program. You can print out a goodbye message, if you’d like. -
cd <path>
: Change to directory specified by<path>
. -
cd
: Without a<path>
argument, the built-incd
should change back to your home directory. -
history
: Display the most recently executed commands, in the order they were executed.
For these built-in commands, as long as the first argument matches the built-in
command, you should attempt to execute the built-in command. If there are extraneous arguments,
you should print an error message. For example, if the user types exit now
, you should print a
suitable error message and not exit.
5.3.1. Using cd
when given a path
If the user types cd <path>
, your shell should not fork a new process. Instead, you should use the C function chdir()
to change the working directory to the specified <path>
. Read the chdir
man page to familiarize yourself with how that function works and be sure you handle the return value of chdir
in the case where it does not succeed (e.g., because the user typed an invalid path).
Assuming you can execute commands in the foreground, you can execute the pwd
command to print the current working directory which will let you test if cd
is working. The following example assumes
your username is bkernig1
:
cs31shell> pwd
/home/bkernig1
cs31shell> cd cs31
cs31shell> pwd
/home/bkernig1/cs31
cs31shell> cd weeklylab/week10
cs31shell> pwd
/home/bkernig1/cs31/weeklylab/week10
cs31shell> cd baddir
cd: Cannot change directory
5.3.2. Using cd
without a path (switch to home directory)
There’s another feature of cd
that you may or may not be aware of. You can test it out in the real shell first so you can see how it works, if you’d like.
-
If you just type
cd
, with no<path>
following,cd
will change to your home directory. For example, if your username isbkernig1
, then typingcd
will change your directory to be/home/bkernig1/
.
cs31shell> pwd
/home/bkernig1
cs31shell> cd cs31
cs31shell> pwd
/home/bkernig1/cs31
cs31shell> cd
cs31shell> pwd
/home/bkernig1
To implement changing to your home directory when typing cd
, you need to know
where your home directory is. The path of your home directory is stored in an
environment variable called HOME
. C provides the getenv()
function that
allows you to read the value of environment variables. Familiarize yourself
with the getenv
function from the
getenv
man page. Here is
a small piece of C code demonstrating the usage of getenv
:
// This is an example, not a recommendation for what you want
// in your code. Adapt this example for use in your program.
char *path = getenv("HOME");
if (path != NULL) {
printf("Your home directory is %s\n", path);
}
5.3.3. Displaying recent commands with history
Your shell program should keep a list of the MAXHIST
most recently entered
command lines by the user. The starter code for cs31shell.c
contains the line
#define MAXHIST 10
, so you should use MAXHIST
, not 10, when referring to
the size of the history.
The built-in history
command should print out the history list in order from
first (oldest) to last (most recently entered) command. For each element in the
list, it should print the command ID and its corresponding command line string.
The command ID is an ever-increasing number, and each command should increment
the command ID by one. Use an unsigned int to store this value (don’t worry
about executing 4 billion commands in an attempt to overflow this value).
Your history list should be implemented as circular queue of history structs: a
circular array of MAXHIST
buckets where new commands are added to one end and
old ones removed from the other end.
5.4. Requirements
-
Your shell should support running programs in the foreground (e.g.
ls -l
) -
Your shell should support running programs in the background (e.g.
./sleeper &
) -
Your shell should support the built-in command
exit
to terminate. -
Your shell should support the built-in command
cd
to change directories with the standardcd <path>
usage and with thecd
usage to change to your home directory. -
Use the
execvp
version of exec for this assignment andwaitpid
instead ofwait
. See the "Tips" section below for examples. -
You need to add a signal handler for
SIGCHLD
signals so that you can reap exited child processes that are run in the background. You should not leave any long-term zombies! -
Whenever your code calls a library or system call function that returns a value, you should have code that checks the return value and handles error values. You can call
exit
for unrecoverable errors, but print out an error message first (printf
orperror
for system call error return values). -
The only global variables allowed are those associated with the history list and its state. All other program variables should be declared locally and passed to functions that use them.
-
Your shell should support the built-in command
history
that keeps a list of theMAXHIST
most recently entered command lines entered by the user. Use a constant definition forMAXHIST
, and submit your solution with it set to 10, but try your shell out with other sizes too. -
For full credit, your shell should use good modular design, be well-commented and free of
valgrind
errors. The main function should probably not be much longer than that in the starting point code. Think about breaking your program’s functionality into distinct functions.
5.5. Sample Output
The following sample output shows how the shell should generally work.
Additional sample output for history demonstrates the history command’s behavior.
5.6. Tips
The headline feature of the lab is the ability of your shell to run executables in the foreground and background. However, implementing cd
and history
can also be challenging.
Implement and test incrementally (and run valgrind
as you go). Here is one suggestion for an order to implement:
-
Add a call to your
parse_cmd_dynamic
function from part 1 to parse the input line intoargv
strings. -
Handle the built-in
exit
command and be sure to free all dynamically allocated memory. -
For all other commands, print out a message (e.g.
Running in the foreground: ls -l
, orRunning in the background: ./sleeper 5 &
). Do not attempt to actuallyfork
off a new process to run the command yet. -
Test your program, including
exit
. You should be able to get your program to produce the following sample starting output. -
Continue to use
valgrind
to be sure that there are no memory errors or leaks! -
Only after you have implemented the steps above, tested them thoroughly, and verified there are no memory errors, should you go on to the next steps.
-
Add support for running commands in the foreground (the parent process, the shell, waits for the child
pid
that it forks off toexecvp
the command). -
Add support for running commands in the background (the parent process, the shell, does NOT wait for the child
pid
that it forks off to run the command). After forking off a child to run the command, the shell program should go back to its main loop of printing out a prompt and waiting for the user to enter the next command.You will need to add a signal handler on
SIGCHLD
so that when the process that is running in the background terminates, the shell reaps it. Usewaitpid
to reap all child processes. Use the sleeper program to testcs31shell> ./sleeper & cs31shell> ./sleeper 2 & cs31shell> ps w
-
Handle the built-in
cd
command. Start by implementing thecd <path>
variant. Then add functionality for thecd
feature that returns you to your home directory. -
Add support for tracking history in a circular queue and print the numbered queue contents, in order, when executing the built-in
history
command. The history list is implemented as a circular queue. The operations on your circular queue are slightly different when the queue is not yet full (when your shell starts up) versus when it is full and you need to replace the oldest entry each time a new command is entered. Think about all state you will need to keep to keep track of the first element in the list (for printing out), the next insertion spot, and the end of the list.
Here are other useful Tips:
-
The maximum length of a command line is defined in
parsecmd.h
. You can use theMAXLINE
constant in your program. -
Use the functions
execvp()
,waitpid()
, andsignal()
for executing programs, waiting on children, and registering signal handlers. Note that the first argument towaitpid
is the process id (pid
) of the process you’d like to wait for, but if you pass in an argument of-1
, it will wait for ANY reapable child process. This is useful inside yourSIGCHLD
handler, where you won’t know which child (or children) exited. You can pass itWNOHANG
as the third parameter to prevent it from blocking when there are no children to reap. -
You can call
fflush(stdout);
after any calls toprintf
to ensure theprintf
output is written immediately to the terminal. Otherwise, the C standard I/O library might buffer it for a short while. -
Remember that if you dynamically allocate space for a string (using
malloc
), you need to allocate a space at the end for the terminating null character (\0
), and that you need to explicitlyfree
the space when you are done using it. Since your parsing library is allocating memory for theargv
list, it’s up to your shell to free that memory when it’s done with it. -
Remember that you can compare individual char values directly, but you need to use
strcmp
(orstrncmp
) to compare two string values:char str[20]; strcpy(str, "hello"); if(str[2] == 'x') { printf("x is in position 2\n"); } if(strcmp(str, "yo ho ho") == 0) { printf("strings are equal\n"); }
-
You’ll need to clean up your zombie children. A correct shell program should reap all exited children and not leave around zombie children after it exits. However, in its development stages, you may find that it creates zombie processes that it never reaps. You should check for, and kill, all zombies after running your shell program.
To do this, after running your shell program, run
ps
inbash
to see if you have left any un-reaped zombie children (or any extra instances of your shell program still running):$ ps 233 pts/7 00:00:01 sleeper <zombie>
If so, send a
kill
signal to get the zombie to die, using eitherkill -9
orpkill -9
:$ kill -9 233 # kills the process with pid 233 $ pkill -9 sleeper # kills all sleeper processes
-
There is some starter code for building a circular queue of integers in the in-lab exercises. You should attempt to implement that first. Once you get that working you can go ahead and implement your circular queue of strings in your shell program. If you get stuck on the circular queue part, start with a simpler version, where the first history list element is always in bucket 0 and the last in bucket MAXHIST-1, and then shift values over as you add a new command to the last bucket each time. Then, come back to this later and try to turn this into it a circular queue (which, for large MAXHIST values, is much more efficient than having to shift values over to maintain the queue ordering).
-
When in doubt about what your shell should do, try running the command in the bash shell (a standard system terminal) and see what it does. This is a good way to test that your implementation of the builtin
cd
function is working properly.
6. Submitting
Please remove any debugging output prior to submitting!
$ git add tester.c parsecmd.h parsecmd.c cs31shell.c
$ git commit -m "Lab 8 completed"
$ git push
7. Handy Resources
-
EdSTEM for questions and answers about lab assignment
-
C Debugging: gdb Guide, Valgrind Guide, Chapter 3 of the textbook
-
C Programming: C programming resources, C code style guide
-
Week 10 in-lab examples: using a C library, signals and signal handlers, circular arrays
-
Textbook Chapters: 2.9.5: Using C libraries, 13.4.1 Signals
-
Misc: Some Useful Unix Commands, vi (and vim) quick reference, make and makefiles man and Manual pages