CS 45 — Lab 1: Shell Redux
Checkpoint: Friday, January 31 (3-minute demos during lab)
Due: Thursday, February 6 @ 11:59 PM
1. Overview
For this lab, we’ll be revisiting your CS 31 shell assignment to add some new features like I/O redirection, pipes, and reading from the PATH environment variable. Rather than providing starter code, you’ll start from a previous implementation that meets the CS 31 requirements (e.g., your old submission).
1.1. Goals
- Implement inter-process communication using pipes and manipulate process file descriptor resources.
- Read from and make use of environment variables.
- Work with a variety of system calls and adopt best practices for their use.
- Practice/refresh C programming skills, particularly memory management.
1.2. Handy References
2. Requirements
2.1. Parser
Similar to CS 31, you should implement a command line parser as a separate
library that your shell will #include. The parser should build an ARGV array
of tokens and identify any non-token elements of a command line. Note that you
WILL need a more complex parser than you used for CS 31 to identify commands
that are separated by |'s and to record I/O redirections without returning them
as ARGV tokens.
For example, if the user inputs:
ls -l 2> error.txt | sort &
Your parser should identify that there are two commands, separated by a pipe, which should be run in the background:

- ARGV array: ["ls", "-l", NULL]. Meaning: redirect standard error to file error.txt, pipe standard out to command #2.
- ARGV array: ["sort", NULL]. Meaning: pipe standard in from command #1.

Note that the 2> error.txt, |, and & portions of the command are instructions to the shell and are NOT command ARGV tokens.
To simplify parsing, you may assume that nothing other than white space (spaces, newlines, etc.) will appear after an ampersand (&), there will be at most one ampersand, and that ampersands will not appear for any reason other than to specify background tasks.
2.2. Command Execution
Your shell should be able to execute simple commands in the foreground and
background by reading from the PATH environment variable and using execv(),
which expects a full path to the program in its first argument. If the user's
command starts with /, ./, or ../ (e.g., "/bin/ls"), you can pass it to execv
as-is. Otherwise (e.g., "ls"), you need to search the PATH environment
variable to find and construct the full path. A call to getenv("PATH") will
return to you a string of directories separated by ':' characters. You should
look through those directories, in order, until you find a matching program
name. You can use the access() function to check if a path contains an
executable program (e.g., access("/bin/ls", X_OK)).
You should NOT modify the string result of getenv() directly; it points into the process's environment. Make your own copy (e.g., with strdup()) before taking it apart with something like strtok().
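To make the search concrete, here is a minimal sketch of one way to do the PATH lookup described above. The function name find_in_path, the buffer size, and the choice to return a heap-allocated string are my own assumptions, not assignment requirements:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns a heap-allocated full path for program,
 * or NULL if no directory on the PATH contains an executable match. */
char *find_in_path(const char *program) {
    char *path_env = getenv("PATH");
    if (path_env == NULL) {
        return NULL;
    }

    /* Copy the string first: we must not modify getenv()'s result,
     * and strtok() writes '\0' characters into its input. */
    char *path_copy = strdup(path_env);
    if (path_copy == NULL) {
        return NULL;
    }

    char *result = NULL;
    for (char *dir = strtok(path_copy, ":"); dir != NULL;
         dir = strtok(NULL, ":")) {
        char candidate[4096];
        snprintf(candidate, sizeof(candidate), "%s/%s", dir, program);
        if (access(candidate, X_OK) == 0) {
            result = strdup(candidate);  /* first match wins */
            break;
        }
    }

    free(path_copy);
    return result;
}

The caller would pass the result to execv() and free() it afterward.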
2.3. Built-in Commands: cd and exit
If the user executes "cd path", your shell should change the working directory
to the specified path using chdir(). You can execute the pwd command to print
the current working directory (to test if cd is working). Your shell should
also terminate if the user inputs exit.
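A rough sketch of builtin handling under these requirements might look like the following; the function name, its return convention, and the assumption that cd always arrives in the "cd path" form are my own:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if argv named a builtin that was
 * handled here, 0 if the shell should fork/exec the command instead. */
int handle_builtin(char **argv) {
    if (strcmp(argv[0], "exit") == 0) {
        exit(EXIT_SUCCESS);           /* terminate the shell */
    }
    if (strcmp(argv[0], "cd") == 0) {
        if (chdir(argv[1]) < 0) {     /* assumes "cd path" form */
            perror("cd");             /* report, but keep running */
        }
        return 1;
    }
    return 0;
}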
2.4. Built-in Commands: history and !num
Your shell should keep a history of the most recent commands the user has
entered using a circular queue (same as CS 31). If the user enters a command
of "history", you should print the stored history of commands, each preceded
by a number. If the user enters a command of !num (where num is an integer),
it should re-execute the command that corresponds to that number.
Note that if the user inputs a !num command, the history should record the command that gets re-executed, NOT the "!num" itself. For example:

> ls
Makefile cs45shell cs45shell.c parsecmd.c parsecmd.h parsecmd.o sleeper sleeper.c tester tester.c
> history
0 ls
1 history
> !0
Makefile cs45shell cs45shell.c parsecmd.c parsecmd.h parsecmd.o sleeper sleeper.c tester tester.c
> history
0 ls
1 history
2 ls          <-- This is NOT "!0"
3 history
You may assume that built-in commands (history, !num, exit, and cd) will NOT
appear in the same command as a pipe or an I/O redirect. That is, despite real
shells handling commands like "history | grep …", your shell will not be asked
to mix built-in commands with those features.
If the user enters a command that uses I/O redirection or pipes, the entire command should be saved in the history (the same as it was typed).
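As a refresher on the circular queue mentioned above, here is one possible sketch of the bookkeeping. The names, the capacity, and the strdup-based storage are my own choices, not a required design:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXHIST 10                /* capacity is an arbitrary choice */

static char *history[MAXHIST];    /* heap-allocated copies of command lines */
static int hist_start = 0;        /* index of the oldest entry */
static int hist_count = 0;        /* number of stored entries */
static int hist_next_id = 0;      /* id that the next entry will receive */

void add_history(const char *cmdline) {
    int slot = (hist_start + hist_count) % MAXHIST;
    if (hist_count == MAXHIST) {  /* queue full: evict the oldest entry */
        free(history[hist_start]);
        hist_start = (hist_start + 1) % MAXHIST;
        hist_count--;
    }
    history[slot] = strdup(cmdline);
    hist_count++;
    hist_next_id++;
}

void print_history(void) {
    for (int i = 0; i < hist_count; i++) {
        int slot = (hist_start + i) % MAXHIST;
        printf("%d %s\n", hist_next_id - hist_count + i, history[slot]);
    }
}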
2.5. Piping
Your shell should support piping the output (standard out) from one command to the input (standard in) of another. A pipe is an inter-process communication (IPC) mechanism for two Unix processes running on the same machine. It's a one-way communication channel between the two processes (one process always writes to one end of the pipe, the other process always reads from the other end of the pipe). The interface to a pipe is like a temporary file with two open ends: writes to one end by one process can be read from the other end by another process. However, unlike a regular file, a read from a pipe removes the data that was written by the other process.
The pipe() system call will create a pipe when given an array of two integers.
It creates two file descriptors: one for the read end of the pipe (array
element 0) and one for the write end of the pipe (array element 1).
Child processes inherit file descriptors from their parent, so you'll want to
create a pipe before calling fork() so that any newly-created processes will
get a copy of the pipe when they're forked. After the fork, a child can set up
its file descriptors before calling execv().
Placing a pipe between two commands causes the output of the first to feed into the input of the second. For example:
ls | sort
Says to take the output from ls and, rather than printing it, send it as input to sort, which will sort it and then print it.
cat file.txt | grep blah
Says to take the output from cat (the contents of file.txt) and send it to the input of grep, which searches for all lines containing the string "blah". This is a silly way to run grep, since grep can read files on its own without cat, but it works well for testing pipes.
You are not required to support multiple pipes in a single command line, although I would encourage you to think about how you might make that happen (it’s not that different).
2.5.1. Creating a Pipe
Calling the pipe()
system call creates two new file descriptors for a
process: a read-end and a write-end. Any data written to the write-end will be
available for reading from the read-end. When all of the FDs representing
the write-end of a pipe are closed, any attempt to read from the pipe will
return end-of-file (EOF) to the reader, signifying that no more data is coming.
When working with pipes in your shell, it’s important to close any parts of the pipe that you aren’t using. For example, if a child process is planning to write into a pipe, it should close the read-end, since the read-end isn’t useful to it. Similarly, a pipe reader should close the write-end. Your parent shell will also need to close both ends after it has forked all of the child processes that need access to a shared pipe. If you don’t close all the write ends of a pipe, your shell will hang because the pipe’s reader will never get an EOF!
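Putting the pieces of this section together, a two-command foreground pipeline might be wired up as sketched below. The function name and the assumption that full paths and ARGV arrays arrive pre-built from your parser are mine:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of running "cmd1 | cmd2" in the foreground. */
void run_pipeline(char *path1, char **argv1, char *path2, char **argv2) {
    int fds[2];
    if (pipe(fds) < 0) {            /* fds[0]: read end, fds[1]: write end */
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    pid_t left = fork();
    if (left < 0) { perror("fork"); exit(EXIT_FAILURE); }
    if (left == 0) {                /* writer: stdout goes into the pipe */
        close(fds[0]);              /* close the end this child won't use */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execv(path1, argv1);
        perror("execv");
        exit(EXIT_FAILURE);
    }

    pid_t right = fork();
    if (right < 0) { perror("fork"); exit(EXIT_FAILURE); }
    if (right == 0) {               /* reader: stdin comes from the pipe */
        close(fds[1]);
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        execv(path2, argv2);
        perror("execv");
        exit(EXIT_FAILURE);
    }

    close(fds[0]);                  /* the parent must close both ends,  */
    close(fds[1]);                  /* or the reader never sees EOF      */
    waitpid(left, NULL, 0);
    waitpid(right, NULL, 0);
}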
2.6. I/O Redirection
Your shell should support I/O redirection, whereby files replace a process's
standard in (0), standard out (1), or standard error (2) file descriptor
streams. The user can ask to replace standard in by supplying "< filename".
For example:
grep blah < input_file.txt 1> output_file.txt
Says to use the file "input_file.txt" as file descriptor 0 rather than reading from terminal input and the file "output_file.txt" as file descriptor 1 rather than printing to standard out.
For output streams, the user must specify which I/O stream(s) to replace:

- 1> filename redirects the standard out stream to a file.
- 2> filename redirects the standard error stream to a file.
The user may also choose to redirect both streams, either to separate files or
to the same one. To implement this functionality, most of the complexity is in
the parser, since you'll need to recognize that an I/O redirect is happening
and treat the next token as the file name rather than an ARGV token. After
parsing is complete and you've created a child process, you can use open() and
dup2() to place the file at the appropriate file descriptor.
When opening files for writing, pay attention to the flags and mode you pass
to open() (see the Tips & FAQ section below for details).
Note that most shells will assume you mean "1>" if you just say ">", but you are not required to do that. For output streams, you may assume there will be an explicit 1 or 2 preceding the ">". You may also assume that the user won’t give you nonsense numbers (e.g., 3>).
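For instance, a child process might install one redirect with a helper like the following sketch; the function name and error-handling choices are my own:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: called in the child after fork() and before
 * execv(). fd is 0, 1, or 2; filename comes from the parsed command. */
void redirect(int fd, const char *filename) {
    int flags = (fd == 0) ? O_RDONLY : (O_WRONLY | O_CREAT | O_TRUNC);
    int file = open(filename, flags, S_IRUSR | S_IWUSR);
    if (file < 0) {
        perror("open");
        exit(EXIT_FAILURE);    /* the child exits; the shell keeps going */
    }
    if (dup2(file, fd) < 0) {  /* the file now sits at descriptor fd */
        perror("dup2");
        exit(EXIT_FAILURE);
    }
    close(file);               /* the original descriptor is no longer needed */
}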
2.7. Error Reporting
Your shell should report errors using perror() and continue executing whenever
possible. If the main shell process (parent) makes a system call that fails
(e.g., signal, fork, pipe), it's fine to report the error and terminate, since
you can't really recover from those. For minor errors, though (e.g., exec
fails because the user's command is invalid), the child process should
terminate, but the shell should continue.
Always always always check the return value of any system call you make!
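The pattern might look like this sketch, where a fork() failure kills the shell but an execv() failure only kills the child (the function name is illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: run one foreground command that the parser resolved. */
void run_command(char *full_path, char **argv) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");          /* fatal: the shell itself can't recover */
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        execv(full_path, argv);  /* only returns if it failed */
        perror("execv");         /* minor: terminate the child only */
        exit(EXIT_FAILURE);
    }
    if (waitpid(pid, NULL, 0) < 0) {
        perror("waitpid");
    }
}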
2.8. Cleanup
Your shell should reap all child processes when they terminate, close all file
descriptors it's no longer using, and get a clean bill of health from
valgrind: no invalid memory accesses, uninitialized variables, or memory
leaks.
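For background children, one common approach (an assumption here, not a requirement) is to poll with WNOHANG so the shell never blocks while reaping:

#include <sys/types.h>
#include <sys/wait.h>

/* Reap any finished children without blocking; could be called from
 * the shell's main loop or from a SIGCHLD handler. */
void reap_children(void) {
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* each successful call reaps one terminated child */
    }
}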
3. Checkpoint
For the checkpoint, you should have a running shell program that can execute
simple commands in the foreground and background and that implements the
built-in commands cd and exit. You also should have support for parsing:

- Commands with I/O re-direction.
- Commands with a pipe.
You will demo your checkpoint to me during your lab section on Friday, Jan 31. Prior to your demo, create a list of a few simple commands that you can use to show that your shell meets the checkpoint requirements (you can just cut and paste commands from your test suite during your demo). You only get a few minutes for your demo, so please plan it before lab. I also encourage you to go beyond just the checkpoint requirements in the first week, but I only need to see the required checkpoint functionality during your short demo.
4. Examples & Testing
4.1. Parser
During the lab meeting, I suggested a struct that looks like:
struct cmd {
    char **cmd1_args;
    char **cmd2_args;
    char *cmd1_fds[3];
    char *cmd2_fds[3];
};
This struct is just a suggestion, and it resembles what I chose to do in my implementation. You’re welcome to add extra fields or change the way you record parsing information. Ultimately, what you need to know is:
- How many commands are there: just one, or two commands separated by a pipe?
- For each command, which file descriptors are we asking to redirect, and what are the names of the files we're redirecting to/from?
- For each command, what is the ARGV array? That is, the same thing your CS 31 shell's parser was returning: an array whose first element is the name of the program to execute, the middle elements are the arguments to that program, and the last element is NULL to signify the end.
- Should we be running in the foreground or background?
If you find it helpful, it would be reasonable for you to add a flag to your
struct that you can set to 0/1 depending on whether or not you find a pipe.
Alternatively, you can simply set cmd2_args to NULL if you don't find a pipe.
It doesn't really matter how you signal it, as long as you know what you've
chosen so that your shell can check for it later. You may also prefer to add a
background flag to the struct rather than passing it in as a special argument.
It doesn't matter to me, and you have some flexibility in the design.
As far as the checkpoint goes, I’ll suggest something that will be useful for your debugging and for convincing me that your parser works correctly: add a function that prints a summary of the contents of your struct, whatever it happens to contain. I’ll give you some examples of what that might look like using the example struct above:
- Plain command: ls -l

cmd1_args: ["ls", "-l", NULL]   /* three-element dynamically-allocated array */
cmd2_args: NULL                 /* There is no second command here. */
cmd1_fds[0]: NULL               /* With no I/O redirects, all descriptors are NULL */
cmd1_fds[1]: NULL
cmd1_fds[2]: NULL
cmd2_fds[0]: NULL
cmd2_fds[1]: NULL
cmd2_fds[2]: NULL

- Pipe only command: ls | sort

cmd1_args: ["ls", NULL]         /* two-element dynamically-allocated array */
cmd2_args: ["sort", NULL]       /* two-element dynamically-allocated array */
cmd1_fds[0]: NULL               /* With no I/O redirects, all descriptors are NULL */
cmd1_fds[1]: NULL
cmd1_fds[2]: NULL
cmd2_fds[0]: NULL
cmd2_fds[1]: NULL
cmd2_fds[2]: NULL

- Redirects only: ls -l 1> out.txt 2> error.txt

cmd1_args: ["ls", "-l", NULL]   /* three-element dynamically-allocated array */
cmd2_args: NULL                 /* There is no second command here. */
cmd1_fds[0]: NULL
cmd1_fds[1]: "out.txt"
cmd1_fds[2]: "error.txt"
cmd2_fds[0]: NULL               /* There is no second command here. */
cmd2_fds[1]: NULL
cmd2_fds[2]: NULL

- Combined: grep -i blah < input.txt | sort 1> output.txt

cmd1_args: ["grep", "-i", "blah", NULL]  /* four-element dynamically-allocated array */
cmd2_args: ["sort", NULL]                /* two-element dynamically-allocated array */
cmd1_fds[0]: "input.txt"
cmd1_fds[1]: NULL
cmd1_fds[2]: NULL
cmd2_fds[0]: NULL
cmd2_fds[1]: "output.txt"
cmd2_fds[2]: NULL
Note: you can replace "blah" with the string you’d like to search for in the text.
You don’t need to worry about printing the "dynamically-allocated array" stuff, I just added that to make it more clear what those things are (ARGV arrays). If you can print your structs this way, it will make it MUCH easier for you to debug your parser and convince yourself/me that it’s working correctly.
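One way (my own sketch, assuming the field names from the example struct above) to produce that kind of output:

#include <stdio.h>

/* Debugging aid: dump the first command's fields; assumes the struct
 * cmd defined earlier in this writeup. */
void print_cmd(struct cmd *c) {
    printf("cmd1_args:");
    for (int i = 0; c->cmd1_args[i] != NULL; i++) {
        printf(" \"%s\"", c->cmd1_args[i]);
    }
    printf("\n");
    for (int fd = 0; fd < 3; fd++) {
        printf("cmd1_fds[%d]: %s\n", fd,
               c->cmd1_fds[fd] != NULL ? c->cmd1_fds[fd] : "NULL");
    }
    /* ...and the same again for cmd2_args and cmd2_fds */
}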
4.2. Shell Testing
You may find it helpful to test with a program that writes to both stdout and stderr (already included in your repo). You can compile it with:
gcc -Wall -o standard_out_error standard_out_error.c
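The real standard_out_error.c is already in your repo, but a program in the same spirit could be as simple as this guess (not the actual file):

#include <stdio.h>

int main(void) {
    for (int i = 0; i < 10; i++) {
        fprintf(stdout, "standard out: line %d\n", i);
        fprintf(stderr, "standard error: line %d\n", i);
    }
    return 0;
}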
If you run it normally (with no pipes or redirects), you'll see that all of
the output goes to your terminal, regardless of which stream is specified in
fprintf. For more interesting tests, you can do things like:
./standard_out_error 1> out.txt 2> err.txt
The above should give you two files, each containing only one of the output streams.
You can use pipes too:
./standard_out_error 2> err.txt | grep 5
The above should only print standard out’s line 5 to the terminal (due to grep’s text search behavior), but all of standard error should be captured to the file.
Note that you can also ask for things that don’t really make sense:
./standard_out_error 1> out.txt 2> err.txt | grep 5
It’s not well-defined what should happen in the above case, since you’re giving two conflicting instructions about what to do with the standard out stream. This example is a user error and is not a good test case.
5. Tips & FAQ
- Spend a little time thinking about the design requirements before you dive into coding. In particular, what sort of helper functions might you want to have available? In the parser, for example, I found it very helpful to have a function to get the start of the next token and the token's length. Designing with abstract functions like that in mind and then implementing them as small modules later is typically MUCH easier than trying to build one massive chunk of code all at once.
- Run valgrind as you go, rather than waiting until the end. It will help you identify problems sooner!
- If a system call fails, the perror() function will typically tell you why, in a nice, human-readable way. Take advantage of it, and don't assume why a system call might be failing!
- In C, you can only return one value from a function, but you can pass in as many pointers as you want. You can use pointer arguments to simulate multiple return values by having a function dereference an input pointer to set its value for the caller.
- Your parser will need to distinguish between command line tokens that belong in the ARGV array vs. meta characters that are telling your shell what to do (e.g., |, &, and I/O redirects). I would suggest dealing with those special meta characters (and their arguments in the case of I/O redirection) first. Then, once you've learned what you need to know from them, you can overwrite them to be blank spaces and use the old code you had from CS 31 to parse ARGV lists. In the case of pipes, you might consider overwriting '|' with a null terminator ('\0') so that it splits the string into two halves, each representing one of the two commands on either side of the pipe. Then, you might find it easier to re-use your existing CS 31 parser on each sub-command (see the short illustration after this list).
- When opening files for I/O redirection, the flags and mode parameters are important. For input redirects, the file is read-only, so you can set flags to O_RDONLY and omit mode. For output redirects, you'll need to OR together several flags: O_WRONLY | O_CREAT | O_TRUNC. This collection says to open the file for writing only, create a new file if it doesn't exist, and truncate it (overwrite it) if it does exist. You'll also need to set a mode when writing: S_IRUSR | S_IWUSR. Setting the mode that way will grant you (the owner of the file) read and write permissions. For more details, run man 2 open.
- If you end up with processes that aren't going away, you can terminate them with the kill or pkill command. kill expects a process ID, whereas pkill searches for a process by program name. If need be, you can pass the -9 argument to send SIGKILL rather than the default SIGTERM.
- If you're adapting your CS 31 solution and you're told that important functions or constants aren't defined, make sure that you have all the necessary header files included. The CS 45 shell requires a few extra headers. The full list I used is:
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "parsecmd.h"
6. Submitting
Please remove any excessive debugging output prior to submitting.
To submit your code, commit your changes locally using git add and git commit.
Then run git push while in your lab directory.