I’m going to show you some tools that may be useful for running Lab 1 experiments for your scalability study.

See Section 5 for more information and links to useful tools and resources for running experiments.

1. Results/Measures

Time each experiment, and run each experiment multiple times. Present results as the average over the runs, and report the standard deviations. See the Lab 1 assignment for more details.
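As a rough sketch of the averaging step, the helper below reads one runtime per line and prints the mean and the sample standard deviation using awk. The name mean_std is hypothetical (it is not part of the lab code), and the sketch assumes the runtimes have already been converted to seconds.

```bash
#!/bin/bash
# mean_std: hypothetical helper. Reads one runtime (in seconds) per line
# on stdin and prints "<mean> <sample std dev>".
mean_std() {
    awk '
        NF { sum += $1; sumsq += $1 * $1; n++ }   # skip blank lines
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean = sum / n
            # sample variance (divide by n-1); taken as 0 for a single run
            var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
            if (var < 0) var = 0    # guard against floating-point round-off
            printf "%.2f %.2f\n", mean, sqrt(var)
        }
    '
}

# Example with made-up runtimes of 10, 12, and 14 seconds
printf '10\n12\n14\n' | mean_std    # prints 12.00 2.00
```

The sample standard deviation (dividing by n-1) is one reasonable choice when you only have a handful of runs; check the assignment if it asks for something specific.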

Here are some measures that you may want to use to present results:

  • Average Total Runtime.

  • Speed-up: Speed-up = (Sequential Time) / (Parallel Time)

  • Efficiency: Efficiency = Speed-up / P, where P is the number of cores or threads
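To make the two formulas concrete, here is a small sketch that computes them with awk (bash itself only does integer arithmetic). The function names and the timing numbers are made up for illustration.

```bash
#!/bin/bash
# speedup SEQ PAR: prints (sequential time) / (parallel time)
speedup() {
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p }'
}

# efficiency SEQ PAR P: prints speed-up divided by P (cores or threads)
efficiency() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN { printf "%.2f\n", (s / p) / n }'
}

# Made-up numbers: 120 s sequential, 20 s in parallel on 8 threads
speedup 120 20        # prints 6.00
efficiency 120 20 8   # prints 0.75
```

An efficiency of 1.00 would mean perfect linear speed-up; here 8 threads give a 6x speed-up, so each thread is about 75% "useful".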

2. Examples of useful tools

Here is some example code you can copy over and try out with these tools (either cd into this directory to try out or you could make a copy in your cs87 subdir):

cd cs87
cp -r ~newhall/public/cs87/experiment_tools .

You copied over some example scripts that may be useful in helping you to write similar scripts for running experiments:

  • run.sh: example bash script for running a bunch of experiments with different input parameters. To run:

      ./run.sh
  • run_outfile.sh: example bash script for running a bunch of experiments with different parameters and capturing all output to a file. Note the bash command syntax for running time matrixmult and redirecting its output to a file (&>>: appends stdout and stderr to the specified output file). To run:

      ./run.sh
      # output will go to file named "myoutputfile" in your home directory
      ./run_outfile.sh  ~/myoutputfile
    
      # or to default file name "output" in the current directory
      # you can't write into my directory so it will only work if you copied
      # this script over into your subdirectory
      ./run_outfile.sh
  • killmytests.sh: an example script to kill all your experiments. Always:

    • first pkill -9 the run script (run_outfile.sh in this example)

    • then pkill -9 your program executable (matrixmult in this example)

      This script is very useful if you want to stop all your experiments from running. In particular, if you want to stop them in the middle of the night, you can schedule a cron job to run this script.
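The output-capturing pattern that run_outfile.sh illustrates can be sketched roughly as follows. This is not the actual contents of run_outfile.sh: run_and_log is a hypothetical helper, and the sketch assumes bash 4 or later (for &>>). The one subtle point is that bash's time keyword prints its report on the shell's stderr, so the timed command is grouped in braces and the whole group is redirected.

```bash
#!/bin/bash
# run_and_log OUTFILE CMD [ARGS...]: hypothetical helper, not the actual
# contents of run_outfile.sh.
run_and_log() {
    local outfile=$1
    shift
    # label the run so post-processing can tell the runs apart
    echo "$*" >> "$outfile"
    # group the timed command in braces so that the report from time is
    # redirected too; &>> appends both stdout and stderr to the file
    { time "$@" ; } &>> "$outfile"
}

# Example with a stand-in command; replace sleep with your own program:
# run_and_log ~/myoutputfile sleep 1
```

Each call appends a label line, the program's output, and the real/user/sys lines from time to the same file, so a long batch of runs ends up in one place.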

3. Tools/Utilities

3.1. how to find an "idle" machine to use

You can run who to see who is logged in, and top -H or htop to see what is running on a machine. In htop, hitting the F6 key lets you sort results by different columns. My help pages describe some examples of how to configure what data top shows (htop is similarly configurable). See the man pages for both.

You want to run your experiments on a machine that is idle so that other processes running on the machine do not interfere with your results. If no one is logged in, the machine may be idle, but someone could be running jobs in a screen session. Also, there may be people logged in who are not actively running anything on the system (they forgot to log out when they left the lab, for example).

Let top run for a minute or so to be certain a machine is idle before firing off a lot of experiments. Also, if you use machines reserved for our class, then you should be able to find an idle one (please share these machines).

See the Lab 1 page for more information about this.
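As a quick first check before watching top, a script can look at the 1-minute load average, which on Linux is the first field of /proc/loadavg (top shows the same numbers in its header). The sketch below is only a heuristic: looks_idle is a hypothetical name, and the 0.5 threshold is an assumption, not a course rule.

```bash
#!/bin/bash
# looks_idle [THRESHOLD] [LOADAVG_FILE]: hypothetical helper that succeeds
# when the 1-minute load average is below THRESHOLD (default 0.5, an
# arbitrary choice). LOADAVG_FILE defaults to /proc/loadavg (Linux only).
looks_idle() {
    local threshold=${1:-0.5} file=${2:-/proc/loadavg}
    # the first field of the first line is the 1-minute load average
    awk -v max="$threshold" 'NR == 1 { ok = ($1 < max) } END { exit !ok }' "$file"
}

# Example use on a lab machine:
# if looks_idle; then echo "load is low"; else echo "machine is busy"; fi
# who    # still check who is logged in
```

A low load average does not replace watching top for a minute; it just tells you quickly which machines are not worth a closer look.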

3.2. screen

screen is useful for logging in, starting something running, and then logging out while it runs: whatever you start inside a screen session stays running after you log out.

Here are the steps to using screen:

  • login and run screen to start a screen session:

    screen
  • start the script you plan to run in this session. I suggest running a bash script of experiments inside a script session (details below), or running a bash script that redirects the output of each run to a file.

  • detach from the screen session by typing: Ctrl-A d

  • then log out of the computer if you’d like

To reattach to a screen session, log in to the computer on which you started the screen session, and then run:

screen -r

You can attach to and detach from the same screen session as many times as you’d like.

3.3. script and dos2unix

script captures a terminal session to a file. dos2unix cleans up the resulting file after you quit script. See more details here:

Python is a nice language to use to process the resulting typescript file: pull out timing results for related runs, compute the average and std dev, and print the results in a nice form.
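If you would rather stay in the shell for the first step, the extraction can also be sketched with awk. The helper below is hypothetical and rests on two assumptions: each run is preceded by an echo line that starts with "gol " (as in the example bash script in Section 3.4), and time prints bash's default format (a line like "real 0m1.234s"). It also strips carriage returns, so it works whether or not dos2unix has been run on the file.

```bash
#!/bin/bash
# extract_times FILE: hypothetical helper. Prints "<label><TAB><seconds>"
# for each timed run found in FILE.
extract_times() {
    awk '
        { gsub(/\r/, "") }                 # drop carriage returns left by script
        /^gol / { label = $0; next }       # remember the most recent echo line
        $1 == "real" {
            split($2, t, "m")              # "1m2.500s" -> t[1]="1", t[2]="2.500s"
            sub(/s$/, "", t[2])            # strip the trailing "s"
            printf "%s\t%.3f\n", label, t[1] * 60 + t[2]
        }
    ' "$1"
}

# Example use on a captured session:
# extract_times typescript
```

The output has one line per run, with the wall-clock time in seconds, which is easy to feed into whatever computes your averages and std dev.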

3.4. bash scripts

Write a bash script to fire off a bunch of experiments. Then just run the bash script and come back later when it is done. It’s good to have some echo commands in your bash script to print out information about particular runs: this will help your post-processing scripts find timing results and compute averages and std dev. The lab01 starting point code includes one example bash script; try it out to see what it does. I also have links to bash programming off my help pages:

When you create a bash script, make the file executable so that you can run it:

vim runexper.sh  # or emacs
chmod u+x runexper.sh  # make it executable (by you)
ls -l
./runexper.sh

Also, try running your bash script a few times before starting it up in screen and coming back later: make sure it is doing what you think it is. You can always comment out the call to the gol program in the script to see if it is doing what you want (# is the bash single line comment):

#!/bin/bash

for ((n=256; n <= 2049; n=n*2))
do
  for ((t=1; t <= 32; t=t*2))
  do
     echo ""
     echo "gol -t $t -n $n -m $n  -k 1000"
#    time ./gol -t $t -n $n -m $n  -k 1000 -x
  done
done

If I run the above bash script, the calls to echo print out the parameter configurations, and I can check that they are what I expect. Then uncomment the call and run.

In your bash script make sure you run time ./gol …​ to collect runtimes.
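Since each experiment should be run several times, one possible extension of the script above adds an inner loop over trials and replaces the comment/uncomment step with a switch. This is only a sketch: TRIALS, DRYRUN, and run_all are names I made up, and the gol options are the ones from the example above.

```bash
#!/bin/bash
# Assumed names (not from the lab code): TRIALS is the number of repeated
# runs per configuration; DRYRUN=1 only prints the commands, DRYRUN=0
# actually runs them.
TRIALS=${TRIALS:-3}
DRYRUN=${DRYRUN:-1}

run_all() {
    local n t r
    for ((n=256; n <= 2048; n=n*2)); do
        for ((t=1; t <= 32; t=t*2)); do
            for ((r=1; r <= TRIALS; r++)); do
                # label line for the post-processing script to find
                echo "gol -t $t -n $n -m $n -k 1000 (trial $r)"
                if [ "$DRYRUN" -eq 0 ]; then
                    time ./gol -t "$t" -n "$n" -m "$n" -k 1000 -x
                fi
            done
        done
    done
}

run_all
```

With the defaults this prints 72 label lines (4 sizes, 6 thread counts, 3 trials) without running anything; start the real runs with DRYRUN=0 once the list looks right.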

3.5. cron

You can add a cron job to run your script at a particular date and time by editing the crontab file on the machine on which you are running your experiments (e.g., on chervil or one of the other CS87-only machines):

  $ ssh chervil
  $ crontab -e

Then add a line like this to run the killmytests.sh script at a specific time and date (here, at 8pm (20:00) on January 31):

  0 20 31 1 * /home/newhall/public/cs87/experiment_tools/killmytests.sh

Similarly, you can add a cron job to run your experiments at a specific time (here I’m starting them at 4:05 am on February 3):

  5 4 3 2 * /home/newhall/public/cs87/experiment_tools/run_outfile.sh ./mytests

NOTE: after your cron jobs have run, please run crontab -e again to remove them from the crontab file (so that cron doesn’t run them every year on this date at this time until we remove your account).
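The five time fields are easy to get in the wrong order (they are minute, hour, day of month, month, day of week). As a sketch, the hypothetical helper below prints a crontab line for a given number of minutes after some base time. It assumes GNU date (the -d option and the %-M style formats), which is what Linux lab machines normally have; it only prints the line, and you still paste it in with crontab -e.

```bash
#!/bin/bash
# cron_line MINUTES COMMAND [BASE]: hypothetical helper that prints a
# crontab entry firing MINUTES after BASE (default: now).
# Assumes GNU date.
cron_line() {
    local minutes=$1 cmd=$2 base=${3:-now}
    local when
    # convert BASE to seconds since the epoch, then add the offset
    when=$(( $(date -d "$base" +%s) + minutes * 60 ))
    # field order: minute hour day-of-month month, then * for day-of-week
    echo "$(date -d "@$when" '+%-M %-H %-d %-m') * $cmd"
}

# Example: an entry that fires 2 minutes from now
# cron_line 2 /home/newhall/public/cs87/experiment_tools/killmytests.sh
```

For instance, with a base time of 8pm on January 31 and an offset of 0 minutes, the time fields come out as "0 20 31 1 *", matching the killmytests.sh entry above.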

4. Let’s try some stuff out

Let’s try some of these steps together in the example you copied over.

First let’s try out screen and script:

  • ssh into a machine and see if it is idle

  • start screen

  • cd to the directory containing gol and the bash script

  • start script

  • start the bash script to run the experiments

  • hit return and type exit (to terminate script… good practice)

  • detach from screen

  • run top -H just to see if the program is running

  • log out of the machine

Then later, ssh back into the machine and reattach to the screen session.

On a different machine, create a cron job to run a test script and another to kill the test script and any running test programs:

  • run date to get the current time

  • run crontab -e, and let’s start run_outfile.sh in 2 minutes and kill it one minute later

In this example, let’s say it is Feb. 1st at 1:30pm right now:

$ crontab -e
# start run_outfile.sh (with output file mytests in your home directory)
# at 1:32pm on Feb. 1  (minute:32, hour:13, day:1, month:2)
32 13 1 2 * /home/newhall/public/cs87/experiment_tools/run_outfile.sh ~/mytests
# run killmytests.sh at 1:33pm on Feb. 1
33 13 1 2 * /home/newhall/public/cs87/experiment_tools/killmytests.sh

Now, let’s run top -H or htop and see what happens.

5. Handy Resources