Today we are going to try out some tools that may be useful for running Lab 1 experiments for your scalability study. See Section 5 for links to more information about these and other tools and resources for running experiments.

1. Results/Measures

You should time several different experiments, and run each experiment multiple times. Present results as the average over runs, and report standard deviations. See the Lab 1 Experiments section for more details.

Here are some measures that you may want to use to present results:

  • Average Total Runtime (and include Standard Deviation too).

  • Speed-up: Speed-up = (Sequential Time) / (Parallel Time)

  • Efficiency: Efficiency = Speed-up / P, where P is the number of cores or threads
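
As a rough sketch of how these measures fit together, here are a few shell functions that compute the mean and standard deviation over a set of run times, plus speed-up and efficiency. The function names and the sample times are made up for illustration, and the standard deviation is the sample (n-1) form, which is an assumption; use the population form if that is what your write-up calls for.

```shell
# Hypothetical helpers (names and sample numbers are illustrative only).

# mean_std t1 t2 ...: print the mean and sample standard deviation
mean_std() {
  printf '%s\n' "$@" | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      mean = sum / n
      var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
      if (var < 0) var = 0          # guard against rounding error
      printf "%.3f %.3f\n", mean, sqrt(var)
    }'
}

# speedup SEQ_TIME PAR_TIME: Speed-up = Sequential Time / Parallel Time
speedup() {
  awk -v s="$1" -v p="$2" 'BEGIN { printf "%.3f\n", s / p }'
}

# efficiency SEQ_TIME PAR_TIME P: Efficiency = Speed-up / P
efficiency() {
  awk -v s="$1" -v p="$2" -v np="$3" 'BEGIN { printf "%.3f\n", (s / p) / np }'
}

mean_std 10.0 12.0 14.0     # three made-up run times; prints 12.000 2.000
speedup 48.0 12.0           # made-up sequential vs. parallel mean; prints 4.000
efficiency 48.0 12.0 8      # same times with P = 8 threads; prints 0.500
```

An efficiency of 1.0 would be perfect linear speed-up, so the made-up 8-thread run above is using its threads at half of ideal.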

2. Examples to try out

Copy over some example scripts that we will use to try out some of these tools, and that may be useful examples for writing and using your own experiment scripts:

cd cs87
cp -r ~newhall/public/cs87/experiment_tools .
cd experiment_tools

ls -l
chmod 700 *.sh     # if need be, set executable permission so you can run them
  • run.sh: example bash script for running a bunch of experiments with different input parameters. To run:

      ./run.sh
  • run_outfile.sh: example bash script for running a bunch of experiments with different parameters and capturing all output to a file. Note the bash command syntax for running time matrixmult and redirecting its output to a file (&>> appends stdout and stderr to the specified output file). To run:

      # output will go to file named "myoutputfile" in your home directory
      ./run_outfile.sh  ~/myoutputfile
    
      # or to default file name "output" in the current directory
      # you can't write into my directory so it will only work if you copied
      # this script over into your subdirectory
      ./run_outfile.sh
  • killmytests.sh: an example script to kill all your experiments. Always:

    • first pkill -9 the run script (run_outfile.sh in this example)

    • then pkill -9 your program executable (matrixmult in this example)

      This script is very useful if you want to stop all your experiments from running. In particular, if you want to stop them in the middle of the night, you can schedule a cron job to run this script.

3. Tools/Utilities

3.1. how to find an "idle" machine to use

You want to run your experiments on a machine that is mostly idle so that other processes running on the machine do not interfere with your results. If no one is logged in, the machine may be idle, but someone could be running jobs in a screen session. Also, there may be people logged in who are not actively running anything on the system (they forgot to log out when they left the lab, for example).

You can run who to see who is logged in, top -H or htop to see what is running on a machine, and smarterSSH to find good machines on which to run experiments. In htop, hit the F6 key to sort results by different columns. My top and htop help page describes some examples of how to configure top to select what data top shows (htop is similarly configurable). See the man pages for top and htop for more options. Let top run for a minute or so to be certain a machine is idle before firing off a lot of experiments. Also, if you use machines reserved for our class, then you should be able to find an idle one (please share these machines).

Here is some more information about finding idle machines
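
As a quick first check before you settle in with top, here is a small sketch that prints who is logged in and the 1-minute load average. It assumes a Linux machine (it reads /proc/loadavg), and the function names and the 0.5 threshold are arbitrary choices for illustration, not a class rule. A low load average only suggests the machine is idle right now, so still watch top for a minute.

```shell
# Hypothetical helpers; the names and the 0.5 threshold are assumptions.

# load_status LOAD THRESHOLD: print "idle" if LOAD < THRESHOLD, else "busy"
load_status() {
  awk -v load="$1" -v max="$2" \
    'BEGIN { if (load + 0 < max + 0) print "idle"; else print "busy" }'
}

# check_host: show logged-in users and the 1-minute load average (Linux only)
check_host() {
  who
  read -r load1 _rest < /proc/loadavg
  echo "1-minute load average: $load1 ($(load_status "$load1" 0.5))"
}

load_status 0.10 0.5    # prints idle
load_status 3.20 0.5    # prints busy
# check_host            # run this after you ssh into a candidate machine
```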

3.2. screen and tmux

screen and tmux are useful for logging in, starting something running, and then logging out while it runs: the processes you start in a tmux or screen session keep running when you log out.

Here are the steps to using screen:

  • log in and run screen to start a screen session:

    screen
  • start the script you plan to run in this session. I suggest running a bash script of experiments inside a script session (details below), or run a bash script that redirects output of each run to a file.

  • detach from the screen session by typing: Ctrl-a d

  • then log out of the computer if you’d like

To re-attach to a screen session, log in to the computer on which you ran screen, started some experiments, and detached, and then run:

screen -r

And you can attach and detach as many times as you’d like from the same screen session.

Here is some more information about tmux. For your use of tmux to run experiments, you likely don’t need to configure tmux with multiple panes.

Make sure to exit your screen and tmux sessions on machines when you are done using them.

3.3. script and dos2unix

script captures a terminal session to a file. dos2unix cleans up the resulting file after you quit script. See more details here:

Python is a nice language to use to process the resulting typescript file: pull out timing results for related runs, compute the average and standard deviation, and print the results in a readable form.
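
If you would rather stay in the shell, the same post-processing can be sketched with awk. This is only an illustration under some assumptions: the file contains a label line starting with "gol " before each run (like the echo lines in the example scripts), bash's time printed lines of the form "real 0m1.234s", and the function name avg_real and all of the sample numbers are made up.

```shell
# avg_real FILE: per label line, report the number of runs plus the mean and
# sample standard deviation of the "real" times (in seconds).
avg_real() {
  awk '
    /^gol / { config = $0 }                 # label echoed before each run
    /^real/ {
      split($2, t, /[ms]/)                  # "1m2.500s" -> t[1]=1, t[2]=2.500
      secs = t[1] * 60 + t[2]
      if (!(config in n)) order[++k] = config
      n[config]++; sum[config] += secs; sq[config] += secs * secs
    }
    END {
      for (i = 1; i <= k; i++) {
        c = order[i]; m = sum[c] / n[c]
        var = (n[c] > 1) ? (sq[c] - n[c] * m * m) / (n[c] - 1) : 0
        if (var < 0) var = 0                # guard against rounding error
        printf "%s: runs=%d mean=%.3f sd=%.3f\n", c, n[c], m, sqrt(var)
      }
    }' "$1"
}

# Made-up sample data standing in for a cleaned-up typescript file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
gol -t 1 -n 256 -m 256 -k 1000
real    0m4.000s
user    0m3.950s
sys     0m0.010s
gol -t 1 -n 256 -m 256 -k 1000
real    0m6.000s
gol -t 2 -n 256 -m 256 -k 1000
real    1m2.500s
EOF
avg_real "$sample"
rm -f "$sample"
```

On the sample data this prints one summary line per distinct label, in the order the labels first appear.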

3.4. bash scripts

Write a bash script to fire off a bunch of experiments. Then just run the bash script and come back later when it is done. It’s good to have some echo commands in your bash script to print out some information about particular runs: this will help your post-processing scripts find timing results and compute averages and std dev. The lab01 starting point code includes one example bash script; try it out to see what it does. I also have links to bash programming off my help pages: bash shell programming

When you create a bash script, make sure the file is executable to run it:

vim runexper.sh        # or some other editor
chmod 700 runexper.sh  # set to executable
ls -l
./runexper.sh

Also, try running your bash script a few times before starting it up in screen and coming back later: make sure it is doing what you think it is. You can always comment out the call to the gol program in the script to see if it is doing what you want (# is the bash single line comment):

#!/bin/bash

for ((n=256; n <= 2049; n=n*2))
do
  for ((t=1; t <= 32; t=t*2))
  do
     echo ""
     echo "gol -t $t -n $n -m $n  -k 1000"
#    time ./gol -t $t -n $n -m $n  -k 1000 -x
  done
done

If I run the above bash script I’ll see all the calls to echo print out parameter configs and see if they are what I expect. Then uncomment and run.

In your bash script make sure you run time ./gol …​ to collect runtimes.
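
Since you need multiple runs of each configuration, one way to extend this kind of script is to add an inner loop over trials. The sketch below is one possible layout, not the lab's provided script: the names run_all, TRIALS, and DRYRUN are made up, and the gol options are copied from the example above. With DRYRUN left at its default of 1 it only prints the labels, so you can check the parameter combinations first.

```shell
#!/bin/bash
# Sketch: run every configuration TRIALS times.
# TRIALS, DRYRUN, and run_all are hypothetical names for this example.
run_all() {
  trials=${TRIALS:-3}     # runs per configuration
  dryrun=${DRYRUN:-1}     # 1: only print labels, 0: really run gol

  for n in 256 512 1024 2048; do
    for t in 1 2 4 8 16 32; do
      for r in $(seq 1 "$trials"); do
        echo ""
        echo "gol -t $t -n $n -m $n -k 1000 (trial $r of $trials)"
        if [ "$dryrun" -eq 0 ]; then
          time ./gol -t "$t" -n "$n" -m "$n" -k 1000 -x
        fi
      done
    done
  done
}

run_all
```

Once the printed labels look right, something like DRYRUN=0 TRIALS=5 ./runexper.sh would do the timed runs (assuming you saved the sketch as runexper.sh).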

3.5. cron

You can add a cron job to run your script at a particular date and time by editing the crontab file on the machine where you are running your experiments (e.g., on chervil or some of the other CS87-only machines):

  $ ssh chervil
  $ crontab -e

Then add a line like this to run the killmytests.sh script at a specific time and date (at 8pm (20:00) on January 31). The five fields are minute, hour, day of month, month, and day of week:

  0 20 31 1 * /home/newhall/public/cs87/experiment_tools/killmytests.sh

Similarly you can add a cron job to run your experiments at a specific time (here I’m starting them at 4:05 am on February 3):

  5 4 3 2 * /home/newhall/public/cs87/experiment_tools/run_outfile.sh ./mytests

NOTE: after your cron jobs run, please make sure to run crontab -e again to remove them from the crontab file (so that cron doesn’t run them every year on this date at this time until we remove your account).

You can run cal and date to list the current date and time.
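
If you don't want to work out the crontab fields by hand, here is a small sketch that prints them for a time two minutes from now. It assumes GNU date (for -d @SECONDS and the %-M style flags that drop zero padding), which is an assumption about the lab machines; the function name cron_fields is made up, and the script path assumes you copied experiment_tools into ~/cs87 as in Section 2.

```shell
# cron_fields EPOCH_SECONDS: print "minute hour day month *" in local time
# (hypothetical helper; relies on GNU date)
cron_fields() {
  date -d "@$1" '+%-M %-H %-d %-m *'
}

now=$(date +%s)          # current time in seconds since the epoch
start=$((now + 120))     # two minutes from now

# Print a line you could paste into crontab -e
echo "$(cron_fields "$start") $HOME/cs87/experiment_tools/run_outfile.sh $HOME/mytests"
```

cron uses the machine's local time, so run this on the same machine whose crontab you are editing.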

4. Let’s Try some stuff out

Let’s try some of these steps together in the example you copied over.

First, let’s try out screen and script:

  • ssh into a machine, see if idle

  • start screen

  • cd to directory containing gol and bash script

  • start script

  • start bash script to run experiments

  • hit return and type exit (to terminate script…​good practice)

  • detach from screen

  • run top -H just to see if program is running

  • log out of machine

Then later, ssh back into the machine and re-attach to the screen session.

On a different machine, create a cron job to run a test script and another to kill my test script and running test programs.

  • run date to get the current time

  • run crontab -e and let’s start run_outfile.sh in 2 minutes and kill it one minute later. In this example, let’s say it is Sept. 14th at 1:30pm right now:

    $ crontab -e
    # start run_outfile.sh (with output file mytests in your home directory)
    # at 1:32pm on Sept. 14  (minute:32, hour:13, day:14, month:9)
    32 13 14 9 * /home/tnas/cs87/experiment_tools/run_outfile.sh ~/mytests
    # run killmytests.sh at 1:33pm on Sept. 14
    33 13 14 9 * /home/tnas/cs87/experiment_tools/killmytests.sh

Now, let’s run top -H or htop and see what happens. You can also run ps --user <yourusername> to list all your running processes on a machine.

5. Handy Resources