Lab 8 Goals:
- Demystify how a Unix shell program works by writing one, including the
following features:
- Parsing string command line arguments into an argv list in dynamic
memory.
- Executing commands that run in the foreground and the background.
- Executing previous commands by interacting with the shell history (using
the !num syntax.)
- Learn how to create and reap processes with the fork, execvp,
waitpid system calls.
- Interact with signals and write signal handler code.
- Gain more expertise with gdb and valgrind for debugging C programs.
Lab Description
You will implement a shell, which is the program you interact with on the
command line of a terminal window. A shell operates by doing the following
things:
- Print a prompt and wait for the user to type in a command.
- Read in the command string entered by the user.
- Parse the command line string into an argv list.
- If the command (first item in the parsed argv list) is a built-in shell
command, the shell will handle it on its own (without forking a child
process).
- Otherwise, if it's not a built-in command, fork a child process to execute
the command and wait for it to finish (unless the command is to be run in the
background, then the shell doesn't wait for the child to exit).
- Repeat until the user enters the built-in command exit to exit the
shell program.
Part 1: Command line parsing library.
For the first stage of your shell, you'll implement a parsecmd library that
contains functions to parse a command line string into its individual command
line arguments, and construct an argv list of strings from the command line
args. Your library functions can then be used by other programs by #including
your parsecmd.h file and linking in your parsecmd.o binary on
the gcc command line:
$ gcc -g -o tester tester.c parsemd.o
Your library will export one function, parse_cmd_dynamic, though
you're encouraged to create helper functions to aid your implementation of it.
Helper functions should not be exported via the parsecmd.h header
file.
The parse_cmd_dynamic function will take in a command line string
and turn it into an argv list (an array of strings, one per command line
argument) that's suitable for passing to a newly-starting program. The
function will also test for the presence of an ampersand (&), which
indicates that the command should be run in the background. For example, if
the user enters the command line string:
$ cat foo.tex
The string "cat foo.tex" will be passed to parse_cmd_dynamic,
which will then produce an argv array that looks like:
argv [0] ---->"cat"
argv [1] ---->"foo.tex"
argv [2] ----| (NULL)
Note that there will be a newline character after the foo.tex and before the
final null-terminator character as a result of the user hitting the [enter]
key.
This operation is known as tokenizing the string into individual
tokens, each of which is separated by white space. Given a command line string
as input, your implementation of the parse_cmd_dynamic function should
dynamically allocate and return the argv array of strings, one for each command
line token. It takes the following parameters:
/*
* parse_cmd_dynamic - Parse the passed command line into an argv array.
*
* cmdline: The command line string entered at the shell prompt
* (const means that this function cannot modify cmdline).
* Your code should NOT attempt to modify these characters!
*
* bg: A pointer to an integer, whose value your code should set.
* The value pointed to by bg should be 1 if the command is to be
* run in the background, otherwise set it to 0.
*
* returns: A dynamically allocated array of strings, each element
* stores a string corresponding to a command line argument.
* (Note: the caller is responsible for freeing the returned
* argv list, not your parse_cmd_dynamic function).
*/
char **parse_cmd_dynamic(const char *cmdline, int *bg);
To produce a dynamic argv array, your implementation must first determine
how many tokens are in the command line string. After that, it should malloc
EXACTLY the right number of (char *) argv buckets, one for each of the tokens
plus one extra bucket at the end for NULL, which indicates the end of the
tokens. Then, for each token, it should malloc exactly enough space to store
the string corresponding to the that token (don't forget one extra byte for the
null-terminator '\0' character).
For example, if the command line string is:
" ls -l -a "
The function should allocate space for an array of four character pointers
(char *'s). The first should have three bytes (characters) allocated to it for
storing 'l', 's', and '\0'. The second should allocate three bytes to store
"-l\0", and the third should hold "-a\0". The final pointer should be
NULL.
// Declare a local var to store dynamically allocated args array of strings.
char **args;
// After allocating memory and tokenizing, it should look like:
args --------->[0]-----> "ls"
[1]-----> "-l"
[2]-----> "-a"
[3]-----| (NULL)
To help test parse_cmd_dynamic's behavior and convince yourself of
its correctness, you'll write a few test cases in the provided tester.c
file.
Requirements
Tips
- Test your code in small increments. It's much easier to localize a bug
when you've only changed a few lines.
- In implementing parse_cmd_dynamic, you will likely want to
make more than one pass through the characters in the input string.
- When working with pointers, and in particular double pointers, it's often
very useful to draw some pictures about what's pointing to what, noting the
types of variables. Don't shy away from using the whiteboard or scratch
paper!
Part 2: Building a shell.
Now that we can tokenize command line strings, let's put together the rest
of the pieces for executing user commands. Your shell should support the
following features:
-
Running commands in the foreground.
When a command is run in the foreground, for example:
cs31shell> ./sleeper 2
Your shell program should fork() a child process to execute
sleeper and then wait until the child process exits before proceeding.
You can accomplish this by calling waitpid in the parent (your shell)
by passing in the pid of the child process (the return value of
fork()).
-
Running commands in the background.
When a command is run in the background, for example:
cs31shell> ./sleeper 3 &
Your shell program should fork() a child process to execute
sleeper, but it should NOT wait for the child to exit. Instead, after
forking the child process, it should immediately return to step 1 (print out
the prompt and read in the next command line). The child process will execute
the command concurrently while the parent shell handles other command(s).
Your shell must still reap background processes after they exit, so you
can't just forget about them! When a child that was run in the background
exits, your shell program will receive a SIGCHLD signal. You should install a
SIGCHLD handler that will call waitpid() to reap the exited child
process(es).
Your shell should be able to run any number of processes in the background,
so if you type in quick succession:
cs31shell> ./sleeper &
cs31shell> ./sleeper &
cs31shell> ./sleeper &
cs31shell> ps
The ps program output should list all three sleeper child processes.
- Built-in commands.
Your shell should respond to the following three built-in commands on its
own. It should not fork off a child to handle these!
- exit: Terminate the shell program. You can print out a goodbye
message, if you'd like.
- history: Print a list of the user's 10 more recently entered
command lines. (Note: blank lines should not be added to the history.)
- !num (where num is an actual number, e.g., !5): Re-execute a
previous command from the history. The previous command could be a
run-in-the-foreground, run-in-the-background, or a built-in command that your
shell should execute appropriately. The command line should be added into
the history list (i.e. executing !5 should not put !5 in the history
list, instead a copy of the command line associated with command ID 5
from the history list should be added to the history list). See the sample
output below for some examples of history and !num commands.
- History.
Your shell program should keep a list of the 10 most recently entered
command lines by the user (use a #define constant named MAXHIST whose value is
10). The built-in history command should print out the history list in
order from first (oldest) to last (most recently entered) command. For each
element in the list, it should print the command ID and its corresponding
command line string. The command ID is an ever-increasing number, and each
command should increment the command ID by one. I suggest using an unsigned
int to store this value (don't worry, I'm not going to try executing 4 billion
commands in an attempt to overflow this value).
Your history list should be implemented as circular queue of history
structs: a circular array of MAXHIST buckets where new commands are added
to one end and old ones removed from the other end.
Users can request the execution of a previous command, as stored in the
history list, by executing the built-in !num, where "num" is a number
corresponding to a history command ID. Upon receiving such a command:
- Search your history list for a command with a matching command ID.
Remember that the command ID is not the position in the history list, it is
the unique number of the command in your shell's execution history (i.e. !5
is the 5th command run by your shell, !34 is the 34th command run by your
shell).
- If a matching command ID is not found, print out an error message.
- Otherwise, use the command line from the matching history command ID, and
re-execute it (this command now also becomes the most recent command to add
to your history list).
If the command from the history list is not a built-in command, then your
shell should run it just like it does any foreground or background program
(parse its command line into argv list, fork-execvp and waitpid or not).
If the command from the history list is a built-in command (which could
only be the history command), then it should execute it directly just
like any built-in command.
Requirements
- Your shell should support running programs in the foreground (e.g.
ls -l)
- Your shell should support running programs in the background (e.g.
./sleeper &)
- Your shell should support the built-in command exit to
terminate.
- Your shell should support the built-in command history that keeps
a list of the MAXHIST most recently entered command lines entered by the user.
Use a constant definition for MAXHIST, and submit your solution with it set to
10, but try your shell out with other sizes too.
- Your shell should support running commands from the history using
!num syntax, where num is the command ID of a command from
your command history (e.g. !33 should execute the command with command
ID 33 from your history list). If a matching command num is not found the
current history list, then your shell should print out and error message
(command not found), otherwise the command line from the matching command on
the history list should be executed (again).
- Use the execvp version of exec for this assignment and
waitpid instead of wait. See the "Tips" section below for
examples.
- You need to add a signal handler for SIGCHLD signals so that you
can reap exited child processes that are run in the background. You should not
leave any long-term zombies!
- Whenever your code calls a library or system call function that returns a
value, you should have code that checks the return value and handles error
values. You can call exit for unrecoverable errors, but print
out an error message first (printf or perror for system call error return
values).
- The only global variables allowed are those associated with
the history list and its state. All other program variables should be
declared locally and passed to functions that use them.
- For full credit, your shell should use good modular design, be
well-commented and free of valgrind errors. The main function should probably
not be much longer than that in the starting point code. Think about breaking
your program's functionality into distinct functions.
Example Output
It may be helpful for you to take a look at Tia's sample
output, particularly to see how the built-in history features work.
Tips
- Implement and test incrementally (and run valgrind as you go). Here
is one suggestion for an order to implement:
- Add a call to your library from part 1 to parse the input line into argv
strings.
- Add support for the built-in command exit.
- Add support for running commands in the foreground (the parent
process, the shell, waits for the child pid that it forks off to exec
the command).
- Add support for running commands in the background (the parent
process, the shell, does NOT wait for the child pid that it forks off to run
the command). After forking off a child to run the command, the shell
program should go back to its main loop of printing out a prompt and
waiting for the user to enter the next command. You will need to add
a signal handler on SIGCHLD so that when the process that is running
in the background terminates, the shell reaps it. Use waitpid to
reap all child processes. Use the sleeper program to test:
cs31shell> ./sleeper &
cs31shell> ./sleeper 2 &
cs31shell> ps w
- Add support for the history list (implemented as a circular queue). The
operations on your circular queue are slightly different when the queue is
not yet full from when it is full and you need to replace the oldest entry
each time a new command is entered. Think about all state you will need to
keep to keep track of the first element in the list (for printing out), the
next insertion spot, and the end of the list.
- Add support for !num built-in command that will run command num from your
history list. num is a command ID, which is increasing as your shell runs
commands. It is NOT the bucket index into the history list.
- The maximum length of a command line is defined in parsecmd.h. You can use
the MAXLINE constant in your program.
- Use the functions execvp(), waitpid(), and signal() for executing
programs, waiting on children, and registering signal handlers. Note that the
first argument to waitpid is the process id (PID) of the process you'd
like to wait for, but if you pass in an argument of -1, it will wait for ANY
reapable child process. This is useful inside your SIGCHLD handler, where you
won't know which child (or children) exited. You can pass it WNOHANG as the
third parameter to prevent it from blocking when there are no children to
reap.
- Remember that if you dynamically allocate space for a string (using
malloc), you need to allocate a space at the end for the terminating null
character ('\0'), and that you need to explicitly free the space when you are
done using it (call free). Since your parsing library is allocating memory for
the argv list, it's up to your shell to free that memory when it's done with
it.
- You can call fflush(stdout); after any calls to printf to
ensure the printf output is written immediately to the terminal. Otherwise,
the C standard I/O library might buffer it for a short while.
- When in doubt about what your shell should do, try running the command in
the bash shell (a standard system terminal) and see what it does.
Submitting
Please remove any debugging output prior to submitting.
To submit your code, simply commit your changes locally using git
add and git commit. Then run git push while in your lab
directory. Only one partner needs to run the final push, but make sure both
partners have pulled and merged each others changes. See the section on Using a shared repo on the git help page.