This is a two week lab, but you should strive to get it done early to give yourself more time to study for the final exam.
This lab will be done with the same partner with whom you worked for the Game of Life lab.
Here is the partner list for Lab Partners
Expectations for Working with Partners
cd ~/cs31/labs
git clone [your_Lab10_URL]Then cd into your Lab10-you-partner subdirectory.
Makefile QUESTIONNAIRE big.txt square100.txt testedges.txtIf this didn't work, or for more detailed instructions see the the Using Git page.
/home/your_user_name/cs31/labs/Lab10-you-partner $ cp ../Lab07-you-partner/gol.c . $ cp ../Lab07-you-partner/*.txt . $ ls Makefile QUESTIONNAIRE gol.c and some .txt filesWith the lab 10 starting point code, I've given you a Makefile and some .txt input files. You should create more .txt input files to test large and long-running GOL sequences, with the computation split over multiple threads. The smaller inputs and the printing command line options are useful when you are testing for correctness.
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <pthread.h>
You will implement a parallel version of Conway's Game of Life using your sequential solution as a starting point for this lab. See the lab 7 assignment to remind yourself about the sequential version of the program, and about the rules of the game.
Your lab 10 solution will use the same method as lab 7 for initializing a dynamically allocated board from an input file given as a command line argument, and will have the same print or not-print option after every iteration. It will distribute board cells among the parallel threads in either a row-wise or column-wise fashion.
You will evaluate the scalability of your implementation as you increase the problem size and the number of threads.
You should begin lab 10 by first fixing any errors in your lab 7
sequential solution that you copied over into your labs/10 subdirectory.
Don't worry about getting
the torus correct (the edge cells) if you did not get that correct
in lab 7. As long as you have a correct GOL implementation for the middle (non-edge) cells, that is enough to move on to starting to parallelize your GOL
implementation.
A run with no printing enabled should only print out the final timing result of the gettimeofday timer and should not include any calls to usleep in the run.
$ ./gol usage: ./gol infile.txt print[0:1] ntids partition[0:1] print_alloc[0:1]Here are some example command lines:
# run with config values read from file1.txt, do not print the board after each round # create 8 threads, use row-wise partitioning, print per-thread partitioning details: ./gol file1.txt 0 8 0 1 # run with config file file2.txt, print the board after each round, create # 15 threads, use column-wise partitioning, don't print per-thread partitioning: ./gol file2.txt 1 15 1 0
The print per-thread board partioning command line option will help you debug the grid cell partioning scheme to ensure that you are correctly allocating rows or columns across the worker threads.
Your program should handle badly formed command lines and check for bad values entered (like zero or a negative integer for the number of threads). It should print out an error message and exit for badly formed command lines and handle most bad input values similarly.
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3When the number of threads does not evenly divide the dimension by which you are partitioning, divide the rows (or columns) up so that there is at most a difference of 1 assigned row (or column) between any two threads. For example, for an 8x7 board and 3 threads, you would partition rows and columns like this among the 3 threads (with tids 0-2):
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2In this partitioning scheme, a single row (or column) is assigned to exactly one thread; a single row (or column) is never split between two or more threads. Thus, in the above example threads 0 and 1 have one more row each of work to do than thread 2 in the row-wise partitioning, and thread 0 has one more column of work to do than threads 1 and 2 in the column-wise partitioning.
printf("tid %d my values ...\n", mytid, ...); fflush(stdout); // force the printf output to be printed to the terminal
# 9 threads, column-wise partitioning % gol square100.txt 0 9 1 1 tid 0: rows: 0:99 (100) cols: 0:11 (12) tid 1: rows: 0:99 (100) cols: 12:22 (11) tid 2: rows: 0:99 (100) cols: 23:33 (11) tid 4: rows: 0:99 (100) cols: 45:55 (11) tid 3: rows: 0:99 (100) cols: 34:44 (11) tid 5: rows: 0:99 (100) cols: 56:66 (11) tid 6: rows: 0:99 (100) cols: 67:77 (11) tid 7: rows: 0:99 (100) cols: 78:88 (11) tid 8: rows: 0:99 (100) cols: 89:99 (11) # 6 threads, row-wise partitioning % gol square100.txt 0 6 0 1 tid 0: rows: 0:16 (17) cols: 0:99 (100) tid 1: rows: 17:33 (17) cols: 0:99 (100) tid 3: rows: 51:67 (17) cols: 0:99 (100) tid 2: rows: 34:50 (17) cols: 0:99 (100) tid 4: rows: 68:83 (16) cols: 0:99 (100) tid 5: rows: 84:99 (16) cols: 0:99 (100) # 17 threads, row-wise partitioning % gol square100.txt 0 17 0 1 tid 0: rows: 0: 5 ( 6) cols: 0:99 (100) tid 1: rows: 6:11 ( 6) cols: 0:99 (100) tid 3: rows: 18:23 ( 6) cols: 0:99 (100) tid 2: rows: 12:17 ( 6) cols: 0:99 (100) tid 4: rows: 24:29 ( 6) cols: 0:99 (100) tid 5: rows: 30:35 ( 6) cols: 0:99 (100) tid 6: rows: 36:41 ( 6) cols: 0:99 (100) tid 7: rows: 42:47 ( 6) cols: 0:99 (100) tid 10: rows: 60:65 ( 6) cols: 0:99 (100) tid 11: rows: 66:71 ( 6) cols: 0:99 (100) tid 12: rows: 72:77 ( 6) cols: 0:99 (100) tid 13: rows: 78:83 ( 6) cols: 0:99 (100) tid 8: rows: 48:53 ( 6) cols: 0:99 (100) tid 15: rows: 90:94 ( 5) cols: 0:99 (100) tid 14: rows: 84:89 ( 6) cols: 0:99 (100) tid 9: rows: 54:59 ( 6) cols: 0:99 (100) tid 16: rows: 95:99 ( 5) cols: 0:99 (100)Note that for the 17 thread run there are enough threads to see "out of order" execution due to the exact scheduling of threads on the CPUs.
$ ./gol test_oscillator.txt 1 4 0 0 start board: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @ @ @ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - number of live cells: 3 end board: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @ - - - - - - - - - - @ - - - - - - - - - - @ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - number of live cells: 3 total time for 21 iterations of 11x11 is 5.493663 secsIf you see odd output, either your parallelization of GOL is incorrect, or you are missing synchronization, or both. You can remove the call to system("clear") to see the history of each round to help you debug incorrect GOL computations.
In addition to the parallelization, partitioning, command line options, and correctness requirements for pthreads GOL described above, your solution must:
The one exception is that if you did not correctly get the torus part of gol to work (your corner or edge cells are incorrect), you can parallelize a gol solution with just all non-edge cells correct.
big_buff = malloc( ...); if(!big_buff) { perror("mallocing big buffer failed\n"); fflush(sterr); exit(1); }See the man page for perror for more information and for the include files you need to add to your program.
Think about what shared state is updated by more than one thread. Think about guarantees you need for correctness and how concurrent thread execution may or may not meet these guarantees.
Look at the lecture notes and the week12 Weekly lab synch.c example for examples of using some pthread synchronization primitives. Here are some code snippets for using the barrier synchronization primitive:
// declare a global pthread_barrier_t var static pthread_barrier_t barrier; // initialize it somewhere before using it: if(pthread_barrier_init(&barrier, NULL, num_threads)){ perror("Error: with pthread barrier init\n"); exit(1); } // threads can then call pthread_barrier_wait to synchronize: ret = pthread_barrier_wait(&barrier); if(ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD) { perror("Error: can't wait on pthread barrier\n"); exit(1); }
Any values that threads need should be passed to them via the (void *arg) parameter. Look at the parallel max code from class for an example of how to do this.
With every lab assignment is a file named QUESTIONNAIRE for you to fill out and submit with your lab solution. In this file you will answer some questions about the lab assignment. You should fill this out and submit it with your lab solution.
From one of your local repos (in your ~you/cs31/labs/Lab10-partner1-partner2 subdirectory):
git push
git add gol.c git add Makefile git add QUESTIONNAIRE git add *.txt # add any test files you created too git commit git pushAnother likely source of a failed push is that your partner pushed, and you have not pulled their changes. Do a git pull. Compile and test that your code still works. Then you can add, commit, and push.
If that doesn't work, take a look at the "Troubleshooting" section of the Using git page. You may need to pull and merge some changes from master into your local. If so, this indicates that your partner pushed changes that you have not yet merged into your local. Anytime you pull into your local, you need to check that the result is that your code still compiles and runs before submitting.