1. Results/Measures
You should do timed runs of different experiments, with multiple runs of each experiment. Present results as the average over runs, and note standard deviations. See the Lab 1 assignment for more details.
Here are some measures that you may want to use to present results:
-
Average Total Runtime.
-
Speed-up:
Speed-up = (Sequential Time) / (Parallel Time)
-
Efficiency:
Efficiency = Speed-up / P, where P is the number of cores or threads
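As a quick illustration of the two formulas above, here is a minimal bash sketch. The function names speedup and efficiency are made up for this example, and the times (in seconds) are invented numbers, not measurements.

```shell
#!/bin/bash
# Hypothetical helpers: bash has no floating point, so awk does the arithmetic.

# speedup SEQ_TIME PAR_TIME
speedup() {
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p }'
}

# efficiency SEQ_TIME PAR_TIME P   (P = number of cores or threads)
efficiency() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN { printf "%.2f\n", (s / p) / n }'
}

# Made-up example: 120 s sequential, 20 s parallel on 8 threads
speedup 120.0 20.0        # prints 6.00
efficiency 120.0 20.0 8   # prints 0.75
```

An efficiency of 1.0 would mean perfect linear speed-up; here 8 threads give a speed-up of only 6, so efficiency is 0.75.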
2. Examples of useful tools
Here is some example code you can copy over and try out with these tools (either cd into this directory to try it out, or make a copy in your cs87 subdirectory):
cd cs87
cp -r ~newhall/public/cs87/experiment_tools .
You copied over some example scripts that may be useful in helping
you to write similar scripts for running experiments:
-
run.sh
: example bash script for running a bunch of experiments
with different input parameters. To run:
./run.sh
-
run_outfile.sh
: example bash script for running a bunch of experiments with different parameters and capturing all output to a file. Note the bash command syntax for running time matrixmult
and redirecting its output to a file (&>> appends stdout and stderr to the specified output file). To run:
# output will go to file named "myoutputfile" in your home directory
./run_outfile.sh ~/myoutputfile
# or to default file name "output" in the current directory
# you can't write into my directory so it will only work if you copied
# this script over into your subdirectory
./run_outfile.sh
-
killmytests.sh
: an example script to kill all your experiments. Always:
-
first pkill -9 the run script (run_outfile.sh in this example)
-
then pkill -9 your program executable (matrixmult in this example)
This script is very useful if you want to stop all your experiments from running. In particular, if you want to stop them in the middle of the night, you can schedule a cron job to run this script.
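To make the &>> redirection and the kill order concrete, here is a rough sketch. It is not the contents of the scripts you copied over: run_and_log and kill_my_tests are hypothetical names, and the sketch assumes bash 4 or later (for &>>) and that your run script and program are named as in the example above.

```shell
#!/bin/bash
# Hypothetical helpers, not the contents of run_outfile.sh or killmytests.sh.

# run_and_log OUTFILE CMD [ARGS...]
# Append a label line, then CMD's stdout, stderr, and timing, to OUTFILE.
run_and_log() {
    local out="$1"
    shift
    echo "=== $*" >> "$out"
    # bash's time keyword reports on stderr; wrapping the command in { }
    # lets &>> capture the timing along with the program's own output
    { time "$@"; } &>> "$out"
}

# kill_my_tests: stop the driver script first so it cannot launch the
# next run, then stop the program it was running.
kill_my_tests() {
    pkill -9 -u "$(id -un)" run_outfile.sh
    pkill -9 -u "$(id -un)" matrixmult
}
```

For example, run_and_log ~/myoutputfile ./matrixmult would append one labeled, timed run to ~/myoutputfile. If you killed matrixmult first instead, the run script would just start the next experiment in its loop.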
3. Tools/Utilities
3.1. how to find an "idle" machine to use
You can run who to see who is logged in, and top -H or htop
to see what is running on a machine. In htop, if you hit the F6
key you can sort results by different columns. My help pages
describe some examples of how to configure what data top shows
(htop is similarly configurable). See the man pages for both.
You want to run your experiments on a machine that is idle so that other processes running on the machine do not interfere with your results. If no one is logged in, the machine may be idle, but someone could be running jobs in a screen session. Also, there may be people logged in who are not actively running anything on the system (they forgot to log out when they left the lab, for example).
Let top run for a minute or so to be certain a machine is
idle before firing off a lot of experiments. Also, if you
use machines reserved for our class, then you should be able
to find an idle one (please share these machines).
See the Lab 1 page for more information about this.
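If you want a quick first check before watching top, something like the following sketch can help. It assumes a Linux machine (it reads /proc/loadavg), the helper name load_below is made up, and the 0.5 threshold is an arbitrary choice, not a class rule. It does not replace watching top for a minute.

```shell
#!/bin/bash
# load_below LOAD THRESHOLD: succeed if LOAD < THRESHOLD (hypothetical helper)
load_below() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l < t) }'
}

if [ -r /proc/loadavg ]; then
    load=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average
    users=$(who | wc -l | tr -d ' ')      # number of login sessions
    if load_below "$load" 0.5; then       # 0.5 is an arbitrary threshold
        echo "load $load, $users login(s): looks idle"
    else
        echo "load $load, $users login(s): busy, try another machine"
    fi
fi
```

A low load average with zero logins is a good sign, but a job inside someone's detached screen session only shows up in the load once it is actually computing.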
3.2. screen
screen is useful for logging in, starting something running,
and then logging out while it runs: whatever you run in
a screen session stays running when you log out.
Here are the steps for using screen:
-
log in and run screen to start a screen session:
screen
-
start the script you plan to run in this session. I suggest running a bash script of experiments inside a script session (details below), or running a bash script that redirects the output of each run to a file.
-
detach from the screen session by typing:
Ctrl-A d
-
then log out of the computer if you’d like
To reattach to a screen session, log in to the computer
on which you started screen and left your experiments
running, and then run:
screen -r
You can attach to and detach from the same screen session as many times as you’d like.
3.3. script and dos2unix
script captures a terminal session to a file. dos2unix cleans up the resulting file after you quit script. See more details here:
Python is a nice language to use to process the resulting typescript file: pull out timing results for related runs, compute averages and standard deviations, and print the results in a nice form.
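For simple cases you may not even need Python. The sketch below uses awk from bash to compute the mean and sample standard deviation of the "real" times in a captured output file. It assumes the timings come from bash's built-in time with its default format (lines like "real 0m1.234s", tab-separated); summarize_real is a hypothetical name, and you would still need to split the file by experiment yourself (for example, using the lines your echo commands print).

```shell
#!/bin/bash
# summarize_real: read captured `time` output on stdin and report the mean
# and sample standard deviation of the "real" times, in seconds.
summarize_real() {
    awk '
    /^real/ {
        split($2, parts, /[ms]/)       # "0m1.500s" -> parts[1]="0", parts[2]="1.500"
        t = parts[1] * 60 + parts[2]   # minutes and seconds -> seconds
        sum += t; sumsq += t * t; n++
    }
    END {
        if (n == 0) { print "no timings found"; exit 1 }
        mean = sum / n
        var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
        if (var < 0) var = 0           # guard against floating-point round-off
        printf "runs=%d mean=%.3f stddev=%.3f\n", n, mean, sqrt(var)
    }'
}
```

For example, summarize_real < ~/myoutputfile summarizes every timed run in that file. Run dos2unix on a typescript file first, since stray carriage returns can confuse the parsing.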
3.4. bash scripts
Write a bash script to fire off a bunch of experiments. Then just run the bash script and come back later when it is done. It’s good to have some echo commands in your bash script to print out information about particular runs: this will help your post-processing scripts find timing results and compute averages and std dev. The lab01 starting point code includes one example bash script; try it out to see what it does. I also have links to bash programming off my help pages:
When you create a bash script, make sure the file is executable before you try to run it:
vim runexper.sh # or emacs
chmod 700 runexper.sh # set to executable (read/write/execute for you only)
ls -l
./runexper.sh
Also, try running your bash script a few times before starting it up in screen and coming back later: make sure it is doing what you think it is. You can always comment out the call to the gol program in the script to see if it is doing what you want (# is the bash single-line comment):
#!/bin/bash
# n: grid size (256, 512, 1024, 2048); t: number of threads (1, 2, ..., 32)
for ((n=256; n <= 2049; n=n*2))
do
  for ((t=1; t <= 32; t=t*2))
  do
    echo ""
    echo "gol -t $t -n $n -m $n -k 1000"
    # time ./gol -t $t -n $n -m $n -k 1000 -x
  done
done
If I run the above bash script, the calls to echo print out the parameter configurations, and I can check that they are what I expect. Then uncomment the call to gol and run it for real.
In your bash script make sure you run time ./gol … to collect runtimes.
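Here is one possible variation on the script above, offered as a sketch rather than the lab's starting-point script. It repeats each configuration several times (so you can compute averages and standard deviations) and uses a DRY_RUN variable instead of commenting the call in and out. DRY_RUN, RUNS, and run_sweep are names made up for this example; the gol options are the ones used above.

```shell
#!/bin/bash
# DRY_RUN=1 (the default here) only prints each command;
# run with DRY_RUN=0 to actually time gol.
DRY_RUN="${DRY_RUN:-1}"
RUNS=3    # hypothetical number of repetitions per configuration

run_sweep() {
    local n t r
    for ((n = 256; n <= 2048; n *= 2)); do        # grid sizes
        for ((t = 1; t <= 32; t *= 2)); do        # thread counts
            for ((r = 1; r <= RUNS; r++)); do     # repeated runs
                echo "run $r: gol -t $t -n $n -m $n -k 1000"
                if [ "$DRY_RUN" -eq 0 ]; then
                    time ./gol -t "$t" -n "$n" -m "$n" -k 1000 -x
                fi
            done
        done
    done
}

run_sweep
```

With 4 grid sizes, 6 thread counts, and 3 repetitions, the dry run prints 72 lines. Once they look right, run the script with DRY_RUN=0 set in the environment.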
3.5. cron
You can add a cron job to run your script at a particular date and time by editing the crontab file on the machine on which you are running your experiments (e.g., on chervil or one of the other CS87-only machines):
$ ssh chervil
$ crontab -e
Then add a line like this to run the killmytests.sh script at a specific time and date (here, at 8pm (20:00) on January 31). The five fields are minute, hour, day of month, month, and day of week:
0 20 31 1 * /home/newhall/public/cs87/experiment_tools/killmytests.sh
Similarly, you can add a cron job to run your experiments at a specific time (here I’m starting them at 4:05 am on February 3):
5 4 3 2 * /home/newhall/public/cs87/experiment_tools/run_outfile.sh ./mytests
NOTE: after your cron jobs run, please make sure to run crontab -e
again to remove them from the crontab file (so that cron doesn’t
run them every year on this date at this time until we remove your account).
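Working out the five time fields by hand is easy to get wrong, so here is a small sketch that prints a crontab line for a time some number of minutes after a given start. It assumes GNU date (the -d option and the %-M style formats, as on the Linux lab machines); cron_line_at is a hypothetical helper, and you still paste its output into crontab -e yourself.

```shell
#!/bin/bash
# cron_line_at START MINUTES COMMAND...
# Print a crontab line that runs COMMAND at START plus MINUTES minutes.
# START is anything GNU date -d understands, e.g. "now" or "2024-02-01 13:30".
cron_line_at() {
    local base when
    base=$(date -d "$1" +%s) || return 1    # start time as seconds since epoch
    when=$((base + 60 * $2))
    shift 2
    # fields: minute hour day-of-month month day-of-week(* = any)
    echo "$(date -d "@$when" '+%-M %-H %-d %-m') * $*"
}

# Example: a line to start run_outfile.sh two minutes from now
cron_line_at now 2 "$HOME/cs87/experiment_tools/run_outfile.sh" "$HOME/mytests"
```

For instance, cron_line_at "2024-02-01 13:30" 2 /path/to/script prints "32 13 1 2 * /path/to/script" (the path is a placeholder).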
4. Let’s Try some stuff out
Let’s try some of these steps together in the example you copied over.
First let’s try out screen and script:
* ssh into a machine, see if idle
* start screen
* cd to directory containing gol and bash script
* start script
* start bash script to run experiments
* hit return and type exit (to terminate script…good practice)
* detach from screen
* run top -H just to see if program is running
* log out of machine
Then later, ssh back into the machine and re-attach to the screen session.
On a different machine, create a cron job to run a test script and another
to kill the test script and the running test programs.
* run date to get the current time
* run crontab -e and let’s start run_outfile.sh in
2 mins and kill it one minute later. In this example, let’s say it is
Feb. 1st at 1:30pm right now:
$ crontab -e
# start run_outfile.sh (with output file mytests in your home directory)
# at 1:32pm on Feb. 1 (minute:32, hour:13, day:1, month:2)
32 13 1 2 * /home/newhall/public/cs87/experiment_tools/run_outfile.sh ~/mytests
# run killmytests.sh at 1:33pm on Feb. 1
33 13 1 2 * /home/newhall/public/cs87/experiment_tools/killmytests.sh
Now, let’s run top -H or htop and see what happens.
5. Handy Resources
-
Class Piazza page for questions and answers about the assignment
-
Tools for running experiments, off my Help Pages: documentation about useful tools and utilities for running experiments.
-
CS Machine Specs page, from the "cs lab help" link off the main CS page, lists specs for all the CS machines. Sort the Machines Table by #ofCores to find machines with more cores.
-
bash shell programming links