This lab is designed to give you some practice writing and debugging multi-threaded pthreads programs, using synchronization primitives, and experience designing and running scalability experiments.
You'll implement a parallel version of Conway's Game of Life using your sequential solution as a starting point for this lab. See the lab 6 assignment to remind yourself about the sequential version of the program and the rules of the game. You will evaluate the scalability of your implementation as you increase the problem size and the number of threads.
Your lab 9 solution will use the same method as lab 6 for initializing a dynamically allocated board from an input file given as a command line argument, and will have the same output mode options. You should probably begin lab 9 by first fixing any errors in your lab 6 sequential solution.
Your parallel version of the game of life should be structured as follows:
You will need to think about where and how to synchronize thread actions to remove race-conditions. For example, you need to ensure that no thread starts the next round of play until all threads are done with the current round, otherwise threads will not read consistent values for the given round. If printing is enabled, you need to ensure that the printing thread prints a consistent version of the game board. You will also need to ensure that threads do not race to update the global live cell counter.
Since synchronization variables are intended to be shared among threads, you may declare them as globals if you find that to be helpful.
Your program will take the same two command line arguments from lab 6, plus three new command line arguments:
$ ./gol usage: ./gol infile.txt output_mode[0,1,2] num_threads partition[0,1] print_partition[0,1]
Here are some example command lines:
# run with config values read from file1.txt, do not print the board after each round # create 8 threads, use row-wise partitioning, print per-thread partitioning details: ./gol file1.txt 0 8 0 1 # run with config file file2.txt, print the board after each round, create # 15 threads, use column-wise partitioning, don't print per-thread partitioning: ./gol file2.txt 1 15 1 0
The print per-thread board allocation command line option will help you debug the grid cell allocation scheme to ensure that you are correctly allocating rows or columns across the worker threads.
Your program should handle badly formed command lines and check for bad values entered (e.g., a negative number of threads). It should print out an error message and exit for badly formed command lines and handle most bad input values similarly.
You will implement two different ways of partitioning the computation: one partitions the board's rows across threads, the other partitions the board's columns across threads. As an example, suppose that the board is 8x8, and that there are 4 threads. Here is how the threads (with logical tids 0-3) would be assigned to the grid using row and column partitioning, respectively:
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3
When the number of threads does not evenly divide the dimension by which you are partitioning, divide the rows (or columns) up so that there is at most a difference of 1 assigned row (or column) between any two threads. For example, for an 8x7 board and 3 threads, you would partition rows and columns like this among the 3 threads (with tids 0-2):
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2
In this partitioning scheme, a single row (or column) is assigned to exactly one thread; a single row (or column) is never split between two or more threads. Thus, in the above example threads 0 and 1 have one more row each of work to do than thread 2 in the row-wise partitioning, and thread 0 has one more column of work to do than threads 1 and 2 in the column-wise partitioning.
When the 5th command line option is non-zero, your program should print thread partitioning information. Each thread should print:
The threads will run in different orders, so you may see each thread's output in any order, which is fine. If you'd like, you can try to avoid some interleaving of individual threads output by calling fflush after printf:
printf("tid %d my values ...\n", mytid, ...); fflush(stdout); // force the printf output to be printed to the terminal
Here are some example runs with different numbers of threads partitioning a 100x100 grid. The total number of rows and columns per thread are shown in parentheses (your output does not need to be identical to this, but it must include all the required parts):
# 9 threads, column-wise partitioning ./gol big 0 9 1 1 tid 0: rows: 0:99 (100) cols: 0:11 (12) tid 1: rows: 0:99 (100) cols: 12:22 (11) tid 2: rows: 0:99 (100) cols: 23:33 (11) tid 4: rows: 0:99 (100) cols: 45:55 (11) tid 3: rows: 0:99 (100) cols: 34:44 (11) tid 5: rows: 0:99 (100) cols: 56:66 (11) tid 6: rows: 0:99 (100) cols: 67:77 (11) tid 7: rows: 0:99 (100) cols: 78:88 (11) tid 8: rows: 0:99 (100) cols: 89:99 (11) # 6 threads, row-wise partitioning ./gol big 0 6 0 1 tid 0: rows: 0:16 (17) cols: 0:99 (100) tid 1: rows: 17:33 (17) cols: 0:99 (100) tid 3: rows: 51:67 (17) cols: 0:99 (100) tid 2: rows: 34:50 (17) cols: 0:99 (100) tid 4: rows: 68:83 (16) cols: 0:99 (100) tid 5: rows: 84:99 (16) cols: 0:99 (100) # 17 threads, row-wise partitioning ./gol big 0 17 0 1 tid 0: rows: 0: 5 (6) cols: 0:99 (100) tid 1: rows: 6:11 (6) cols: 0:99 (100) tid 3: rows: 18:23 (6) cols: 0:99 (100) tid 2: rows: 12:17 (6) cols: 0:99 (100) tid 4: rows: 24:29 (6) cols: 0:99 (100) tid 5: rows: 30:35 (6) cols: 0:99 (100) tid 6: rows: 36:41 (6) cols: 0:99 (100) tid 7: rows: 42:47 (6) cols: 0:99 (100) tid 10: rows: 60:65 (6) cols: 0:99 (100) tid 11: rows: 66:71 (6) cols: 0:99 (100) tid 12: rows: 72:77 (6) cols: 0:99 (100) tid 13: rows: 78:83 (6) cols: 0:99 (100) tid 8: rows: 48:53 (6) cols: 0:99 (100) tid 15: rows: 90:94 (5) cols: 0:99 (100) tid 14: rows: 84:89 (6) cols: 0:99 (100) tid 9: rows: 54:59 (6) cols: 0:99 (100) tid 16: rows: 95:99 (5) cols: 0:99 (100)
Note that for the run with 17 threads, there are enough threads to see "out of order" execution due to the exact scheduling of threads on the CPUs. That's fine, we're at the mercy of the OS scheduler!
In addition to printing out per-thread board partitioning information to verify correct allocation, you should also ensure that your parallel version simulates GOL correctly. Run your parallel version on some of the same input files as your sequential one with printing enabled. The results should be identical.
If you see odd output, either your parallelization of GOL is incorrect, or you are missing synchronization, or both. You can remove the call to system("clear") to see the history of each round to help you debug incorrect GOL computation (of course run for a small number of iterations).
Like lab 6, we'll be using a visualization library to display the board. This time, the visualization should also be helpful for determining which thread is responsible for each region of the board by coloring them differently. I tried to indicate exactly what you need to do to make the graphics library work in the starter code's TODO comments. Please let me know if I can clarify what needs to happen to support graphics. This component of the lab is new!
In a nutshell, when the output mode is OUTPUT_VISI, the graphics library needs:
if (buff != NULL) { int buff_index = (data->rows - (i + 1)) * data->cols + j; if (next_board[i * data->cols + j]) { /* Live cells get the color using this thread's ID as the index */ buff[buff_index] = colors[data->id]; } else { /* Dead cells are blank. */ buff[buff_index] = c3_black; } }To get the pretty rainbow background, you can reverse the assignments above.
big_buff = malloc(...); if (big_buff == NULL) { perror("mallocing big buffer failed"); }See the man page for perror for more information and for the include files you need to add to your program to use it.
// declare a global pthread_barrier_t static pthread_barrier_t barrier; // initialize it in a function (e.g., main) before using it: if(pthread_barrier_init(&barrier, NULL, num_threads)){ perror("pthread_barrier_init"); exit(1); } // threads than can call pthread_barrier_wait to synchronize: ret = pthread_barrier_wait(&barrier); if(ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD) { perror("pthread_barrier_wait"); exit(1); }
Please remove any debugging output prior to submitting.
To submit your code, simply commit your changes locally using git add and git commit. Then run git push while in your lab directory. Only one partner needs to run the final push, but make sure both partners have pulled and merged each others changes. See the section on Using a shared repo on the git help page.