This lab should be done with your lab partner.
Lab 8 Goals:
- writing and using a library in C, .h and .c files
- more practice with pointers in C
- practice with C strings and char types, and with using the
string and ctype library functions
- more practice with dynamic memory allocation
- gain expertise in gdb and valgrind for debugging C programs
Lab 8 Introduction
In this assignment you and your partner will implement a C library
(the .c and .h parts) and code to test the functions in your
library. You will be implementing the parsecmd library, one
function of which you used in your shell lab program. Your
compiled library (parsecmd.o) should be usable in place of the one
I gave you in lab 7, and should link into your cs31shell
executable file.
First, both you and your partner should run update31 to
grab some starting point code.
$ update31
$ cd cs31/labs/08
$ pwd
/home/your_user_name/cs31/labs/08
$ ls
Makefile parsecmd.c parsecmd.h tester.c
Starting Point Code:
- parsecmd.h: contains the header
file for your library (you likely will not need to add anything to
this file).
It contains the function prototype for two functions you
will implement:
// This is the function you used in the shell lab to convert
// the command line string into an array of strings: one per command
// line argument. This version uses a fixed-size max length cmdline
// and argv list.
int parse_cmd(const char *cmdline, char **argv);
// This is a slightly different version of a command line parsing
// function: it dynamically allocates space for the argv list of
// strings that it returns to the caller. The bg value is now
// passed-by-reference and this function sets it to either 0 or 1
// depending on whether the command line has a & in it or not
char **parse_cmd_dynamic(const char *cmdline, int *bg);
More details about each of these are described in the "Details and
Requirements" section below.
- parsecmd.c: contains the implementation of the parsecmd
library. The two library function definitions
and any internal type definitions and static globals that
you need should be added to this file. This file should not contain
a main function. In addition to the two library functions, add
static helper functions as needed
for good modular design.
- tester.c: a test program for your parsecmd library.
Add code here to test the functionality of your library. Make sure to
test all library functions. Again, add helper
functions to make this code manageable.
There is likely not a lot of code that you need to add to this file.
But, note the TODO comments in this file for some places
where you will need to add code and uncomment code to fully test your
library.
Project Details, Requirements and Hints
You will implement a parsecmd library that contains functions to
parse a command line string into its individual command line arguments,
and construct an argv list of strings from the command line args.
Your library functions can then be used by other programs by
#including your parsecmd.h file and linking in your parsecmd.o
binary on the gcc command line:
gcc -g -o tester tester.c parsecmd.o
Information on building and using C libraries
Read over the "CREATING AND USING YOUR OWN LIBRARY CODE" section of the
following (this is also available off my C help pages):
Building and Using libraries in C. This gives an introduction
to writing .h files, implementing library code .c files, and compiling
and linking library code into C application code that uses the library.
For this assignment, you will build your library code as a single object file
(.o file). The Makefile provided with the starting point code
already does this for you.
Using the parsecmd library (ex. tester.c)
The parsecmd.h file contains the interface to your library. Applications
using the parsecmd library should #include it:
#include "parsecmd.h"
parsecmd.h contains two function prototypes for the functions you
will implement in parsecmd.c.
Implementing the parsecmd library (parsecmd.c)
Both functions in the parsecmd library take in a command line string
(like in the shell lab), and parse it into an argv list (an array
of strings, one per command line argument). They both test for
an ampersand in the command line indicating a run in the background
command, and "return" a value indicating if the command should be
run in the background or not. For example, if the user enters the
following command line string:
$ cat foo.tex
These functions will be passed the string:
"cat foo.tex\n"
And will parse the command line string into the argv array:
argv [0] ---->"cat"
argv [1] ---->"foo.tex"
argv [2] ----| (NULL)
The main difference between the two functions is that the first
uses a single statically declared char array that holds every
argv[i] string, while the second function dynamically
allocates space for both the argv array and for each command line
argument string.
The parse_cmd function
/*
* parse_cmd - Parse the command line and build the argv array.
* cmdline: the command line string entered at the shell prompt
* (const means that the function will not modify the cmdline string)
* argv: an array of size MAXARGS of char *
* parse_cmd will initialize its contents from the passed
* cmdline string.
* returns: non-zero if the command line includes &, to
* run in the background, or zero if not
*/
int parse_cmd(const char *cmdline, char *argv[]);
This function will initialize the passed argv array to point into
substrings that it creates in a global char buffer (initialized to
a copy of the passed command line string). The buffer is already
declared as a static global char array in parsecmd.c:
static char cmdline_copy[MAXLINE];
The parse_cmd function will:
- make a copy of the cmdline string in its copy buffer
- process its copy of the string to find tokens, modifying
the cmdline_copy buffer to create
substrings for each token.
A token is a sequence of non-white space chars, each separated
by at least one whitespace character (or by & ).
Tokens should not include &, which has special meaning in
command lines.
- assign each argv[i] bucket to point to its corresponding
substring token in the buffer. Remember that a NULL value
in an argv[i] bucket is used to signify the end of the list
of argv strings.
For example, if the command line entered is the following:
  ls  -l  -a &
The command line string associated with this entered line is:
"  ls  -l  -a &\n"
The copy of it in the cmdline_copy buffer looks like:
cmdline_copy 0 | ' ' |
1 | ' ' |
2 | 'l' |
3 | 's' |
4 | ' ' |
5 | ' ' |
6 | '-' |
7 | 'l' |
8 | ' ' |
9 | ' ' |
10 | '-' |
11 | 'a' |
12 | ' ' |
13 | '&' |
14 | '\n'|
15 | '\0'|
Your function will TOKENIZE this string and set each argv array
bucket to point into
the start of its associated token string in the char buffer
(cmdline_copy array):
                            0     1     2     3
                         -------------------------
                   argv  |  *  |  *  |  *  |  *  |
                         ---|-----|-----|-----|---
cmdline_copy  0 | ' ' |     |     |     |     |
              1 | ' ' |     |     |     |     |
              2 | 'l' |<----+     |     |   -----
              3 | 's' |           |     |  (NULL)
              4 | '\0'|           |     |
              5 | ' ' |           |     |
              6 | '-' |<----------+     |
              7 | 'l' |                 |
              8 | '\0'|                 |
              9 | ' ' |                 |
             10 | '-' |<----------------+
             11 | 'a' |
             12 | '\0'|
             13 | '&' |
             14 | '\n'|
             15 | '\0'|
Note the changes to the cmdline_copy string contents and
the assignment of argv bucket values into different starting
points in the char buffer.
Printing out the argv strings in order will list the tokens:
ls
-l
-a
The function should return 1 if there is an ampersand in the command
line or 0 otherwise (so, 1 in the above example).
The parse_cmd_dynamic function
There are two main problems with the previous function:
- It assumes fixed-size max values for the command line string
and the argv list. If a user enters a longer command line string than
MAXLINE or with more than MAXARGS, bad memory access errors will ensue.
- It uses a single global character buffer into which the tokenized
version of the command line string is parsed. This means that the
caller has to use the argv return strings before another call to
the parse_cmd function is made (since it will overwrite the buffer
with the new command line string that it tokenizes). For use in
the shell program this version is okay (do you understand why?), but
it limits the "general purpose-ness" of this function.
The parse_cmd_dynamic function solves these two problems by dynamically
allocating and returning the argv array of strings, one for each
command line argument.
/*
* parse_cmd_dynamic - parse the passed command line into an argv array
*
* cmdline: the command line string entered at the shell prompt
* (const means that this function cannot modify cmdline)
* bg: sets the value pointed to by bg to 1 if command line is run in
* background, 0 otherwise (a pass-by-reference parameter)
*
* returns: a dynamically allocated array of strings, each element
* stores a string corresponding to a command line argument
* (the caller is responsible for freeing the returned
* argv list).
*/
char **parse_cmd_dynamic(const char *cmdline, int *bg);
This function will find tokens much like the previous version. However,
it must also determine how many tokens are in the cmdline string,
malloc EXACTLY the right number of argv buckets for the particular
cmdline string (remember an extra bucket at the end for NULL), and
then for each token it will malloc up exactly enough space for
a char array to store the string corresponding to a command line argument
(remember an extra bucket for the terminating '\0' character).
For example, if the cmdline string is:
" ls -l -a \n"
This function will malloc up an argv array of four char * values
(three tokens plus the terminating NULL),
and then malloc up three arrays of char values, one for each
command line string (each of exactly the right size to store the string):
// local var to store dynamically allocated args array of strings
char **args;
args --------->[0]-----> "ls"
[1]-----> "-l"
[2]-----> "-a"
[3]-----| (NULL)
Your function cannot modify the cmdline string that is passed in to it.
But, you may malloc up space for a local copy of the cmdline string
to tokenize if this helps. If you do this, however, your function
must free this copy before it returns. The returned args list should
not point into this copy like the parse_cmd function does; instead, each
command line argument should be malloced up separately as a distinct
string of exactly the correct size.
This function is more complicated to implement and will likely
require more than a single pass through the chars
of the command line string.
Requirements
- Your two functions should meet the specifications described
above.
- You may only have the single global variable already defined
for you in parsecmd.c. All other variables should be local, and
values should be passed to functions.
- You may not change any of the function prototypes in the
parsecmd library. Your library code must work with our test code that
makes calls to these functions as they are defined above.
You really should not need to make any changes to the .h file.
- You should use good modular code. The two library functions should
not be static, but you can add helper functions that are private to the
.c file, and thus should be declared static.
- All system and function calls that return values should have
code that detects and handles error return values.
- Your functions should work for command lines entered with
any amount of whitespace between command line options (but there
should be at least one whitespace char between each). For example,
all these should yield identical argv lists returned by your functions:
cat foo.txt blah.txt &
cat foo.txt blah.txt&
cat    foo.txt     blah.txt      &
TEST that your code works for command lines with any amount of
whitespace between command line arguments.
- Your code should be well commented. See my C style guide for
examples of what this means.
- Your code should be free of valgrind errors.
You will need to add code to tester.c to free the space
allocated and returned by the dynamic version of the function.
Any other space you malloc internally in your library functions
(that it does not explicitly return to the caller), should be freed
by them.
Useful C functions and Hints
- Implement and test incrementally! Start with the parse_cmd function
first before trying parse_cmd_dynamic. And break its functionality into
parts that you implement and test incrementally.
Use valgrind as you go to catch
memory access errors as you make them.
- Review strings, char, and pointers in C.
Here are some
C programming references. See my "char in C", "strings in C", and "pointers in C"
in particular.
- Use string library and ctype functions (see my string and char
documentation for some examples, and look at their man pages for
how to call and use). Some that may be useful include:
strlen, strcpy, strchr, strstr, isspace
Here is an example of using strstr and modifying a string to create
a substring:
// needs <stdio.h>, <stdlib.h>, and <string.h>
char *ptr, *str;
str = malloc(sizeof(char)*64);
if(!str) { exit(1); }
strcpy(str, "hello there, how are you?");
ptr = strstr(str, "how"); // points at the 'h' of "how", or NULL
if(ptr) {
printf("%s\n", ptr); // prints: how are you?
ptr[3] = '\0'; // end the string right after "how"
printf("%s\n", ptr); // prints: how
} else {
printf("no how in str\n");
}
free(str);
strstr may or may not be useful in this assignment, but you will need
to create token strings in a way that has some similarities to this
example.
- Command lines with ampersands in the middle can be handled like bash
handles them (bash ignores everything after the &):
"hello there & how are you?"
gets parsed into an argv list as:
argv[0]---->"hello"
argv[1]---->"there"
argv[2]----| (NULL)
- You do not need to implement a solution using pointer arithmetic, but
if you'd like to use it, look at the pointer arithmetic examples from
this week's lab for some examples.
- Use gdb (or ddd) and valgrind. Here are some
C debugging guides.
- Writing this type of string processing code can be very tricky.
Use the debugger to help you see what your code is doing. Stepping
through individual C statement execution using next may
be helpful. If you do this and want to see the results of instructions
on program variables, you can use the display command to
get gdb to automatically print out values every time it gains control.
Here is an example of printing out three variables (ptr, i, buffer):
(gdb) display ptr
(gdb) display i
(gdb) display buffer
- Think very carefully about type. Draw some pictures to help
you figure out what you need to access, and what type it is.
Sample Output
Here is Sample Output from a run
of my solution. Notice how it handles whitespace chars and parsing commands
with & in them. Also note that each argv string is printed between
# characters so that you can see if you are incorrectly including
any whitespace characters in an argument string result.
Submit
Once you are satisfied with your solution, hand it in by typing
handin31 at the unix prompt.
Only one of you or your partner should run handin31 to submit your
joint solution. If you accidentally both run it, send me email right
away letting me know which of the two solutions I should keep and which
I should discard (you don't want the grader to just guess which joint
solution to grade).
You may run handin31 as many times as you like, and only the
most recent submission will be recorded. This is useful if you realize,
after handing in some programs, that you'd like to make a few more
changes to them.