This lab should be done with your lab partner.
First, both you and your partner should run update31 to create the handin directory structure for lab 8, then you should copy your lab 5 solution into your lab8 subdirectory:
$ update31 $ cd cs31/labs/08 $ pwd /home/your_user_name/cs31/labs/08 $ cp ../05/gol.c . $ cp ../05/Makefile . $ cp ../05/*.txt . $ ls Makefile gol.c oscillator.txtYou will start with your lab 5 solution to the Game of Life program, modifying it to implement a parallel version of GOL using pthreads. To link in the pthread library, to your Makefile, add the following to the end of the command for building the gol executable:
-pthread # for example, you may have something that looks like this: gol: gol.c gcc -g -Wall -o gol gol.c -pthread
You will implement a parallel version of Conway's Game of Life using your sequential solution as a starting point for this lab. See the lab 5 assignment to remind yourself about the sequential version of the program, and about the rules of the game. You will evaluate the scalability of your implementation as you increase the problem size and the number of threads.
Your lab 8 solution will use the same method as lab 5 for initializing a dynamically allocated board from an input file given as a command line argument, and will have the same print or not-print option after every iteration.
You should begin lab 8 by first fixing any errors in your lab 5 sequential solution that you copied over into your labs/08 subdirectory.
A run with no printing enabled should only printout the final timing result of the gettimeofday timer and should not include any calls to usleep in the run.
$ ./gol usage: ./gol infile.txt print[0:1] ntids partition[0:1] print_alloc[0:1]Here are some example command lines:
# run with config values read from file1.txt, do not print the board after each round # create 8 threads, use row-wise partitioning, print per-thread partitioning details: ./gol file1.txt 0 8 0 1 # run with config file file2.txt, print the board after each round, create # 15 threads, use column-wise partitioning, don't print per-thread partitioning: ./gol file2.txt 1 15 1 0
The print per-thread board allocation command line option will help you debug the grid cell allocation scheme to ensure that you are correctly allocating rows or columns across the worker threads.
Your program should handle badly formed command lines and check for bad values entered (like a negative number of threads). It should print out an error message and exit for badly formed command lines and handle most bad input values similarly.
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3When the number of threads does not evenly divide the dimension by which you are partitioning, divide the rows (or columns) up so that there is at most a difference of 1 assigned row (or column) between any two threads. For example, for an 8x7 board and 3 threads, you would partition rows and columns like this among the 3 threads (with tids 0-2):
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2In this partitioning scheme, a single row (or column) is assigned to exactly one thread; a single row (or column) is never split between two or more threads. Thus, in the above example threads 0 and 1 have one more row each of work to do than thread 2 in the row-wise partitioning, and thread 0 has one more column of work to do than threads 1 and 2 in the column-wise partitioning.
printf("tid %d my values ...\n", mytid, ...); fflush(stdout); // force the printf output to be printed to the terminalHere are some example runs of my program with different numbers of threads partitioning a 100x100 grid. The total number of rows and columns per thread are shown in parentheses (your output does not need to be identical to mine but must include all required parts):
# 9 threads, column-wise partitioning % gol big 0 9 1 1 tid 0: rows: 0:99 (100) cols: 0:11 (12) tid 1: rows: 0:99 (100) cols: 12:22 (11) tid 2: rows: 0:99 (100) cols: 23:33 (11) tid 4: rows: 0:99 (100) cols: 45:55 (11) tid 3: rows: 0:99 (100) cols: 34:44 (11) tid 5: rows: 0:99 (100) cols: 56:66 (11) tid 6: rows: 0:99 (100) cols: 67:77 (11) tid 7: rows: 0:99 (100) cols: 78:88 (11) tid 8: rows: 0:99 (100) cols: 89:99 (11) # 6 threads, row-wise partitioning % gol big 0 6 0 1 tid 0: rows: 0:16 (17) cols: 0:99 (100) tid 1: rows: 17:33 (17) cols: 0:99 (100) tid 3: rows: 51:67 (17) cols: 0:99 (100) tid 2: rows: 34:50 (17) cols: 0:99 (100) tid 4: rows: 68:83 (16) cols: 0:99 (100) tid 5: rows: 84:99 (16) cols: 0:99 (100) # 17 threads, row-wise partitioning % gol big 0 17 0 1 tid 0: rows: 0: 5 ( 6) cols: 0:99 (100) tid 1: rows: 6:11 ( 6) cols: 0:99 (100) tid 3: rows: 18:23 ( 6) cols: 0:99 (100) tid 2: rows: 12:17 ( 6) cols: 0:99 (100) tid 4: rows: 24:29 ( 6) cols: 0:99 (100) tid 5: rows: 30:35 ( 6) cols: 0:99 (100) tid 6: rows: 36:41 ( 6) cols: 0:99 (100) tid 7: rows: 42:47 ( 6) cols: 0:99 (100) tid 10: rows: 60:65 ( 6) cols: 0:99 (100) tid 11: rows: 66:71 ( 6) cols: 0:99 (100) tid 12: rows: 72:77 ( 6) cols: 0:99 (100) tid 13: rows: 78:83 ( 6) cols: 0:99 (100) tid 8: rows: 48:53 ( 6) cols: 0:99 (100) tid 15: rows: 90:94 ( 5) cols: 0:99 (100) tid 14: rows: 84:89 ( 6) cols: 0:99 (100) tid 9: rows: 54:59 ( 6) cols: 0:99 (100) tid 16: rows: 95:99 ( 5) cols: 0:99 (100)Note that for the 17 thread run there are enough threads to see "out of order" execution due to the exact scheduling of threads on the cpus.
$ ./gol oscillator.txt 1 4 0 0 start board: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @ @ @ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - end board: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @ - - - - - - - - - - @ - - - - - - - - - - @ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - total time for 21 iterations of 11x11 is 5.493663 secsIf you see odd output, either your parallelization of GOL is incorrect, or you are missing synchronization, or both. You can remove the call to system("clear") to see the history of each round to help you debug incorrect GOL computation (of course run for a small number of iterations).
In addition to the parallelization and correctness requirements described above, you solution must:
You can write up your report as an ascii file using vim (don't wrap lines), or you can try using latex (details below). Your report should be no more than 3 pages long, so think about quality over quantity in presenting results, and be concise yet complete in your explanations. One page in an ascii file is 59 lines (CNTRL-L is a page break if you want to explicitly add one in). If you use ascii, then present your results in tables something similar to this:
Total Execution Time for different number of threads and board sizes num tids | 500x500 | 1000X1000 | 2000x2000 | ... ----------------------------------------------------------- 1 | x secs | x secs | x secs | 2 | x secs | x secs | x secs | 4 | x secs | x secs | x secs | ...
As you design experiments, you will want to vary the grid size and the number of threads to obtain different data points:
As you run experiments, make sure you are doing so in a way that doesn't interfere with others (see the Useful Unix Utilities section below for some tips about how to ensure this). Also, remember to run without any of the printing options enabled (and make sure you are not including usleep calls in timed runs (usleep should only be called from within printing enabled path in your code).
You should run multiple runs of each experiment (i.e. don't do only a single run of 16 threads for 512x512 and 10000 iterations). The purpose of multiple runs is to determine how consistent your runs are (look at the standard deviation across runs), and to also see if you have some odd outliers (which may indicate that something else running on the computer interfered with this run, and that you should discard this result).
You should be careful to control the experimental runs as much as possible. If other people using the machine you are running experiments on, their programs can interfere with your results. Also, run all experiments on the same machine. You can see what is running on a machine:
The following machines are not in the main or overflow labs, and thus are more likely to be available for extended periods of time, so try one of these first:
mushroom avocado carrot savory parsleyAnd, avoid using main and overflow lab machines during classes.
As much as possible, it is good to run all your experiments on the same machine, or at least on identical machines.
Finally, be aware of other groups wanting to run experiments on the 8 and 16 node machines. So, please don't run experiments on a machine for hours and hours or days and days. And, logout when you are not using a machine to run experiments. If you run who and someone is logged in, run top to see if they are actually running on the machine before deciding it is okay for you to use.
In the Hints section below there is information about utilities for running
experiments.
big_buff = malloc( ...); if(!big_buff) { perror("mallocing big buffer failed\n"); }See the man page for perror for more information and for the include files you need to add to your program.
// declare a global pthread_barrier_t var static pthread_barrier_t barrier; // initialize it somewhere before using it: if(pthread_barrier_init(&barrier, NULL, num_threads)){ perror("Error: with pthread barrier init\n"); exit(1); } // threads than can call pthread_barrier_wait to synchronize: ret = pthread_barrier_wait(&barrier); if(ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD) { perror("Error: can't wait on pthread barrier\n"); exit(1); }
$ cat /proc/cpuinfo $ cat /proc/meminfo
#!/bin/bash for f in infile_500_500.txt infile_1000_500.txt infile_2000_500.txt do for((p=0; p <= 1; p++)) do for ((t=1; t <= 64; t*=2)) do echo "" echo "$f $t $p 0" for((i=0; i < 5; i+=1)) do time ./gol $f 0 $t $p 0 done done done doneSee bash shell programing links for more information.
I'd run this bash script inside a script session to capture all its output to a file, and if it is long running, I may also run within a screen session (then I can start it late at night, wake up the next morning and see what happened).
$ script results Script started, file is results % ./run ... % exit Script done, file is results $ dos2unix -f results $ dos2unix -f results
To print a ascii file on our system:
lpr REPORTTo print a pdf file, do so from within xpdf (lpr will print out pages and pages of raw pdf encoding):
xpdf report.pdf # then choose the print option under the File menuIf you accidentally print a .pdf usin lpr, you can kill the print job:
lprmMake sure both you and your partner's names are on the report
Submit all your source code and Makefile and your report in your labs/08/ subdirectory. If your report is complete at this time, you can include it in this directory (you still need to submit to me in person a printout of your report). If you use vim for your report, put it in a file named REPORT. If you use latex, include both the REPORT.tex file and any image files, and the REPORT.pdf file.
Once you are satisfied with your solution, hand it in by typing handin31 at the unix prompt.
Only one of you or your partner should run handin31 to submit your joint solutions If you accidentally both run it, send me email right away letting me know which of the two solutions I should keep and which I should discard (you don't want the grader to just guess which joint solution to grade).
You may run handin31 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize, after handing in some programs, that you'd like to make a few more changes to them.