1. Due Date

  • Due 11:59 PM, Tuesday, Nov. 26

Your partner for this lab is: Lab 8 Partners

2. Lab Goals:

  • Learn how to implement a C library (.h and .c files), and then use your library in a program.

  • Learn more about C strings and char types, and use and C ctype and string library functions in programs.

  • Gain more expertise with dynamic memory allocation and pointers in C.

  • Further your mastery of using gdb and valgrind for debugging C programs.

3. Lab Overview

In this assignment you and your partner will implement a C library and write code to test the functions in your library. You will implement two functions from the parsecmd library. The first is the function you used in your shell program to construct the argv array from a command line string:

// Parses a commad line into an argv array
// The function initializes argv bucket values to individual
// tokens from the cmdline string.
// The argv array of char * is passed in by the caller.
// This version doesn't do dynamic memory allocation...no freeing after use.
int parse_cmd(const char *cmdline, char *argv[], int *bg);

The second is a function with similar functionality, but it dynamically allocates the argv array (and dynamically allocates each C string pointed to by argv[i] elements) and returns it to the caller.

// Parses a commad line into an argv array
// Returns argv array as a dynamically allocated array of char * (type char **).
// The function initializes the returned array's bucket values to individual
// tokens from the cmdline string (each token in a dynamically alloc'ed string)
// The caller is responsible for freeing memory space of the returned array.
char **parse_cmd_dynamic(const char *cmdline, int *bg);

You will use a test program that includes and links in your library to test your library’s functionality and correctness.

Your library, once implemented, could be linked into your shell program (modify the LIBDIR variable in its Makefile to do this) and your shell should work identically to when it linked in my implementation of the parscmd library! You are not required to do this step, but you could try it out.

4. Lab Starting Point Code

4.1. Getting Your Lab 8 Lab Repo

Both you and your partner should clone your Lab repo into your cs31/labs subdirectory:

  1. get your Lab ssh-URL from the CS31 git org. The repository to clone is named Lab08-user1-user2, where user1 and user2 are the user names of you and your Lab partner.

  2. cd into your cs31/labs subdirectory:

    $ cd ~/cs31/labs
    $ pwd
  3. clone your repo

    $ git clone [your Lab08-user1-user2 url]
    $ cd Lab08-user1-user2
    $ ls
    Makefile README.md cs31shell.c  parsecmd.h sleeper.c

4.2. Starting Point Files:

The files included in your repo are:

  • Makefile: builds the library .o file and a tester program executable file

  • README.md: some notes to you

  • parsecmd.h: the header file for the parsecmd library. It contains the interface to the library (the function prototypes and any other definitions the library exports, like #define constants). You do not need to modify this file.

  • parsecmd.c: the implementation part of the parsecmd library. Your implementations of the two library functions, parse_cmd and parse_cmd_dynamic, go in here.

  • tester.c: a test program or your parsecmd library. Add code here to test the functionality of your library. Make sure to test both library functions. Again, add helper functions to make this code manageable. There is likely not a lot of code that you need to add to this file. But, note the TODO comments in this file for some places where you will need to add code and uncomment some of the starting point code in this file to fully test your library.

  • design_worksheet: start your lab assignment by filling in this worksheet with the design of each parse_cmd library function before you implement it. You should bring this filled out worksheet with you to the ninja session, and to office hours and other times you are seeking help on your lab assignment. To print a copy of this worksheet to a CS lab printer, use the lpr command: lpr design_worksheet. See the "Getting Started" tips in Section 7 for information about how and when to use this worksheet.

5. Lab Details

You will implement the parsecmd library that contains two functions (parse_cmd and parse_cmd_dynamic) that parse a command line string into its individual command line arguments and construct an argv list of strings from the command line args.

Your library functions can be used by other programs that #include "parsecmd.h" and link in a binary version of your library’s implementation (parsecmd.o). For example:

gcc -g -o tester tester.c parsecmd.o

The Makefile included with the lab assignment builds parsecmd.o and links it into the tester program. You should use the tester program to test the correctness of your library functions. You will need to add more testing code to tester.c to fully test your library functions.

5.1. The interface to the parsecmd library (parsecmd.h)

The parsecmd.h file contains the interface to the library you will implement. Applications using the parsecmd library should #include it:

#include "parsecmd.h"

You do not need to modify this file but you should use some of the constants it defines in your library implementation code. Also, read the comments in parsecmd.h to ensure that you implement the two library functions correctly (i.e., your functions do the interface says that they do).

5.2. Implementing the parsecmd library (parsecmd.c)

Both functions in the parsecmd library take in a command line string (like in the shell lab), and parse it into a argv list (an array of strings, one per command line argument). They both test for an ampersand in the command line, which, when present, indicates a background command. For example, if the user enters the follow command line string:

cat foo.tex

These functions will be passed the string:

"cat foo.tex\n"

And will parse the command line string into the argv array:

argv [0] ---->"cat"
argv [1] ---->"foo.tex"
argv [2] ----|  (NULL)

The main difference between the two functions is that the first uses a single statically declared char array, while the second dynamically allocates space for both the argv array and the strings each of its buckets points to.

5.2.1. The parse_cmd function

/*
 * parse_cmd - Parse the command line and build the argv array.
 *
 *    cmdline: the command line string entered at the shell prompt
 *    argv:  an array of size MAXARGS of char *
 *           parse_cmd will initialize its contents from the passed
 *           cmdline string.
 *           The caller should pass in a variable declared as:
 *              char *argv[MAXARGS];
 *              (ex) int bg = parse_cmd(commandLine, argv);
 *
 *           argv will be filled with one string per command line
 *           argument.  The first argv[i] value following the last
 *           command line string will be NULL.  For example:
 *              ls -l     argv[0] will be "ls"  argv[1] will be "-l"
 *                        argv[2] will be NULL
 *          for an empty command line, argv[0] will be set to NULL
 *    bg:   pointer to an int that will be set to 1 if the last
 *          argument is '&', 0 otherwise. So bg will be set to 1
 *          if the command is meant to be run in the background.
 *
 *    returns: -2 if cmdline is NULL
 *             -1 if the command line is empty
 *              0 to indicate success
 */
int parse_cmd(const char *cmdline, char *argv[], int *bg);

This function will initialize the passed-in argv array to point to substrings that it creates in a global char array (initialized this global char array to a copy of the passed-in command line string). This global array is already declared in parsecmd.c:

static char cmdline_copy[MAXLINE];

The parse_cmd function will:

  1. make a copy of the cmdline string in its cmdline_copy array

  2. process its copy of the string to find tokens, modifying cmdline_copy to create substrings for each token. A token is a sequence of non-whitespace chars, each separated by at least one whitespace character (or by &). Tokens should not include &, which has special meaning in command lines.

  3. assign each argv[i] bucket to point to its corresponding substring token in cmdline_copy. Remember that a NULL value in an argv[i] bucket is used to signify the end of the list of argv strings.

For example, if the command line entered is the following (note the user entered a few extra spaces in this command line, and $ is the shell prompt):

$   ls  -1  -a &

The command line string associated with this entered line is (note the spaces in the string):

"  ls  -l  -a &\n"

And the copy of it in cmdline_copy looks like:

cmdline_copy 0 | ' ' |
             1 | ' ' |
             2 | 'l' |
             3 | 's' |
             4 | ' ' |
             5 | ' ' |
             6 | '-' |
             7 | 'l' |
             8 | ' ' |
             9 | ' ' |
            10 | '-' |
            11 | 'a' |
            12 | ' ' |
            13 | '&' |
            14 | '\n'|
            15 | '\0'|

The parse_cmd function tokenizes the cmdline_copy string and sets each argv array bucket to point to the start of its associated token string in cmdline_copy. In memory this would look something like:

                                0     1     2     3
                             ------------------------
                        argv |  *  |  *  |  *  |  * |
                             ---|-----|-----|-----|--
cmdline_copy 0 | ' ' |          |     |     |     |
             1 | ' ' |          |     |     |     |
             2 | 'l' |<----------     |     |    ----
             3 | 's' |                |     |    (NULL)
             4 | '\0'|                |     |
             5 | ' ' |                |     |
             6 | '-' |<----------------     |
             7 | 'l' |                      |
             8 | '\0'|                      |
             9 | ' ' |                      |
            10 | '-' |<-----------------------
            11 | 'a' |
            12 | '\0'|
            13 | '&' |
            14 | '\n'|
            15 | '\0'|

Note the changes to the cmdline_copy string contents and the assignment of argv bucket values into different starting points in the char array.

In parsing the cmdline_copy string, the function should also notice that & appears in the command line, and it should set what bg points to to 1 to indicate this, otherwise it will set it to 0.

5.2.2. The parse_cmd_dynamic function

There are two main problems with the previous function:

  1. The user is limited to command line strings that are at most MAXLINE characters and have at most MAXARGS arguments.

  2. The function uses a single global character array. This means that the caller has to use the argv return strings before another call to the parse_cmd function is made (since it will overwrite cmdline_copy with the new command line string that it tokenizes). For use in the shell program this version is okay (think about why this is the case), but it limits the "general purpose-ness" of this function.

The parse_cmd_dynamic function solves these two problems by dynamically allocating and returning the argv array of strings, one string for each command line argument.

/*
 * parse_cmd_dynamic - parse the passed command line into an argv array
 *
 *    cmdline: the command line string entered at the shell prompt
 *         bg: will set value pointed to 1 if command line is run in
 *             background, 0 otherwise (a pass-by-reference parameter)
 *
 *    returns: a dynamically allocated array of strings, exactly one
 *             bucket value per argument in the passed command line
 *             the last bucket value is set to NULL to signify the
 *             end of the list of argument values.
 *             or NULL on an error
 *
 *             The caller is responsible for freeing the returned argv array.
 */
char **parse_cmd_dynamic(const char *cmdline, int *bg);

This function will find tokens much like the previous version does. However, it must also determine how many tokens are in the cmdline string and malloc EXACTLY the right number of argv buckets for the particular cmdline string (remember an extra bucket at the end for NULL). For each token, it must malloc exactly enough space for a char array to store the token as a string (remember an extra bucket for the terminating '\0' character).

For example, if the cmdline string is:

"   ls   -l   -a   \n"

This function will malloc an argv array of 4 char * values, and then malloc three arrays of char values, one for each command line string, the base address of each stored in a bucket of the argv array. For the example command above, the returned array would look something like this:

// local var to store dynamically allocated args array of strings
char **argv;


argv --------->[0]-----> "ls"
               [1]-----> "-l"
               [2]-----> "-a"
               [3]-----|  (NULL)

Your function cannot modify the cmdline string that is passed in to it. But you may malloc space for a local copy of the cmdline string to tokenizes if this helps. This may allow you to reuse some of the code you wrote for the parse_cmd function. If you use this approach, your function must free this copy before it returns; the returned args list should not point into this copy like with parse_cmd. Instead each command line argument should be malloced separately as a distinct string of exactly the correct size.

parse_cmd_dynamic is more complicated to implement than parse_cmd and will likely require more than a single pass through the chars of the command line string. Start with the design of it using the design_worksheet (see Section 4.2).

5.3. Example Output

Here is output from a run of my program: sample output.

6. Lab Requirements

  1. Your two functions should meet the specifications described above.

  2. You may only use the single global variable that is already defined for you in parsecmd.c. All other variables should be local, and values should be passed to functions.

  3. You may not change any of the function prototypes in the parsecmd library (and don’t change the .h file). Your library code must work with our test code that makes calls to these functions as they are defined above.

  4. You should use good modular code. The two library functions should not be static, but you can add helper functions that are private to the .c file, and thus should be declared static (they are only in scope inside the parsecmd.c file…​they are private functions in parsecmd.c).

  5. All system and function calls that return values should have code that detects and handles error return values.

  6. Your functions should work for command lines entered with any amount of whitespace between command line options (but there should be at least one whitespace char between each). For example, all these should yield identical argv lists:

    cat foo.txt  blah.txt      &
    cat foo.txt  blah.txt&
                 cat          foo.txt           blah.txt          &

    TEST that your code works for command lines with any amount of whitespace between command line arguments

  7. Your code should be well commented. See my C style guide for examples of what this means.

  8. Your code should be free of valgrind errors. You will need to add code to tester.c to free the space allocated and returned by the dynamic version of the function. Any other space you malloc internally in your library functions (that it does not explicitly return to the caller) should be freed before the function returns. Still Reachable is okay.

  9. You may not use the string library strtok or strtok_r functions in your solution.

7. Tips

  • Getting started:

    1. Start by reviewing Section 2.9.6 of the textbook, and/or the "CREATING AND USING YOUR OWN LIBRARY CODE" section of the following: Building and Using libraries in C, and the weekly lab library example.

    2. Next, print out the design_worksheet from your repo and follow the directions for outlining the design of your solution of each libary function before you start implementing it. See Section 4.2 for more details and directions.

    3. After filling out the worksheet with your design for parse_cmd, implement and test it.

    4. Next, fill out the the design_worksheet to design the parse_cmd_dynamic function.

    5. Then impement and test it.

  • Implement and test incrementally! Break up the functionality of a function into parts that you implement and test incrementally. Use valgrind as you go to catch memory access errors as you make them.

  • Review strings, char, and pointers in C. Chapter 2 of the textbook contains sections on this material. And see other references listed in Section 9.

  • Remember if you dynamically allocate space for a string (using malloc), you need to allocate a space at the end for the terminating null character ('\0'), and that you need to explicitly free the space when you are done using it (call free).

  • Use string library and ctype functions. (For more info see the textbook, my string and char documentation off my help pages, and the man pages of individual functions.) Some that may be useful include:

    strlen, strcpy, strchr, strstr, isspace

    Here is an example of using strstr and modifying a string to create a substring:

      int i;
      char *ptr, *str;
      str = malloc(sizeof(char)*64);
      if(!str) { exit(1); }
      ptr = strcpy(str, "hello there, how are you?");
      if(!ptr) { exit(1); }
      ptr = strstr(str, "how");
      if(ptr) {
         printf("%s\n", ptr);  // prints: how are you?
         ptr[3] = '\0';
         printf("%s\n", ptr);  // prints: how
      } else {
        printf("no how in str\n");
      }

    strstr may or may not be useful in this assignment, but you will need to create token strings in a way that has some similarities to this example.

  • Remember, you cannot directly compare two strings using == or other similar operators. Instead, you need to use strcmp.

  • You can directly compare the value of two char values using == and other relational operators. For example:

    char *str;
    ...
    if(str[i] == 'a') { ... }
  • Command lines with ampersands in the middle can be handled like bash handles them (bash ignores everything after the &):

    "hello there &amp; how are you?"

    gets parsed into an argv list that looks like this:

    argv[0]---->"hello"
    argv[1]---->"there"
    argv[2]----|  (NULL)
  • Use gdb (or ddd) and valgrind as you incrementally implement and test.

  • Writing string processing code can often times be tricky. Use the debugger to help you see what your code is doing. It may be helpful to step through each C statement using next If you do this and want to see the results of instructions on program variables you can use the display command to get gdb to automatically print out values every time it gains control. Here is an example of displaying two variables (ptr and i):

    (gdb) display ptr
    (gdb) display i
  • Think very carefully about type. Draw pictures to help you figure out what values you need to access and what their types are.

8. Submitting

Here are the commands to submit your solution (from one of you or your partner’s cs31/labs/Lab08-user1-user2 directory:

$ git add *.c
$ git commit -m "Lab 8 completed"
$ git push

If you have added other test programs for testing your shell, please add those too (just their .c file not executables).

If you have difficulty pushing your changes, see the "Troubleshooting" section and "can’t push" sections at the end of the Using Git for CS31 Labs page. And for more information and help with using git, see the git help page.

9. Handy Resources