/home/newhall/public/openMPI_examples/
The first step is to set up ssh'ing without passwords, so that spawning processes on remote hosts doesn't require a password.
Follow these steps: creating ssh keys. To test that this works, try ssh'ing into a CS machine; it should ask for your passphrase rather than your password. For example:
ssh carrot
Enter pass phrase for ..    # enter your passphrase
After setting up your ssh keys, use ssh-agent to avoid having to type in your passphrase each time you ssh into a machine (and each time mpirun ssh's to the machines in your hostfile to start your MPI processes). Note: you should run these commands once in the terminal (bash session) before you start one or more mpirun commands from that terminal:
ssh-agent bash
ssh-add
Enter pass phrase for ...    # enter your passphrase
Now ssh'ing into other machines should not ask for your passphrase:

carrot$ ssh butter
Welcome to: butter
...
butter$
To run a simple command:
------------------------
mpirun -np 2 --host robin,loon ./executable

For example:

# using mpirun to run the uptime command on two hosts
mpirun -np 2 --host robin,loon uptime

# using mpirun to run an MPI hello world application on two hosts
mpirun -np 2 --host robin,loon ./helloworld
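The ./helloworld in these examples is a regular MPI program. Here is a minimal sketch of what such a program might look like (an illustration only; see the openMPI_examples directory listed at the top of this page for complete example code):

/* helloworld.c: a minimal MPI hello world sketch */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, namelen;
    char hostname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                      /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);        /* total number of processes */
    MPI_Get_processor_name(hostname, &namelen);  /* host this process runs on */

    printf("Hello, world (from %s), I am %d of %d\n", hostname, rank, size);

    MPI_Finalize();                              /* shut down MPI */
    return 0;
}

Compile MPI programs with Open MPI's compiler wrapper before running them with mpirun:

mpicc -o helloworld helloworld.c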
To run a command using a hostfile
----------------------------------
% cat myhosts
# a line starting with # is a comment
# use slots=X, for machines with X processors
basil
rosemary
nutmeg
cinnamon
% mpirun -np 4 --hostfile myhosts ./helloworld
Hello, world (from basil), I am 0 of 4
Hello, world (from nutmeg), I am 2 of 4
Hello, world (from rosemary), I am 1 of 4
Hello, world (from cinnamon), I am 3 of 4

Typically, all MPI processes run the same executable file. However, this is not required. Some programs may be written in a boss-worker style, where one process acts as the boss, handing out work to and coordinating results from the other processes, the workers, which perform the parallel tasks. Other programs may have separate types of tasks that subsets of processes perform. In these cases, a programmer may have a separate executable file for each type of process. To run MPI programs like this, you need to specify how many processes to spawn for each executable file using multiple -np command line options, one per executable file.
To run a boss/worker program
(one process runs the boss executable, others run the worker executable)
------------------------------------------------------------------------
% cat myapp
# boss is the name of the boss executable, worker is the worker executable
-np 1 ./boss
-np 6 ./worker
% mpirun --hostfile myhosts --app myapp
boss: allocating block (0, 0) - (19, 19) to process 1
boss: allocating block (20, 0) - (39, 19) to process 2
boss: allocating block (40, 0) - (59, 19) to process 3
...
boss: allocating block (500, 500) - (511, 511) to process 2
boss: done.
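A boss/worker program like this is just two separate MPI programs that share one MPI_COMM_WORLD. Here is a hypothetical sketch of the basic pattern (boss.c and worker.c are illustrative names; the real example above hands out blocks of work, while this sketch just sends each worker a single integer with MPI_Send/MPI_Recv):

/* boss.c: sketch of a boss process (rank 0 with the app file above) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int size, w, work = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* hand one work item (here just an int) to each worker process */
    for (w = 1; w < size; w++) {
        printf("boss: allocating work item %d to process %d\n", work, w);
        MPI_Send(&work, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        work++;
    }
    printf("boss: done.\n");

    MPI_Finalize();
    return 0;
}

/* worker.c: sketch of a worker process (ranks 1 through size-1) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, work;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* receive one work assignment from the boss (rank 0) and "do" it */
    MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("worker %d: got work item %d\n", rank, work);

    MPI_Finalize();
    return 0;
}

Because mpirun starts all of these processes in a single MPI_COMM_WORLD, with the app file above the boss is rank 0 and the workers are ranks 1 through 6. Compile each executable separately (mpicc -o boss boss.c and mpicc -o worker worker.c).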
cat hostfile
robin
sparrow
lark
loon

cat hostfile1
robin slots=1
sparrow slots=1
lark slots=1
loon slots=1

If robin has 4 cores, then:
# spawns 4 processes on robin
mpirun -np 4 --hostfile hostfile ./helloworld

# spawns 4 processes, 1 each on robin, sparrow, lark and loon:
mpirun -np 4 --hostfile hostfile1 ./helloworld
cat /proc/cpuinfo   # information on every core on the machine
cat /proc/meminfo   # information about total RAM size (and current use)
lscpu               # summary information about the processor
lsmem               # summary information about memory

Also, the following page of the CS help pages lists summary information about machines in our system: CS lab machine info
In /usr/swat/db are files that list all the host names for machines in the different labs. You can use these to create your MPI hostfiles. For example:
cp /usr/swat/db/hosts.bookstore hostfile

Then edit hostfile to remove any hosts you don't want to include.
The individual files listing machines in our 4 labs are:
/usr/swat/db/hosts.256
/usr/swat/db/hosts.bookstore
/usr/swat/db/hosts.mainlab
/usr/swat/db/hosts.overflow
For example:
# to generate a hostfile of 10 good machines on our network:
autoMPIgen -n 10 hostfile

# to generate a hostfile of 10 good machines and include slots=4 with each entry:
autoMPIgen -n 10 -s 4 hostfile

# just list information about the top 40 machines (this doesn't fill hostfile)
autoMPIgen -n 40 -v -i hostfile

# you can also use smarterSSH to list this same information:
smarterSSH -n 40 -v -i
Run autoMPIgen -h to see its command line arguments for further configuration options.
There is also more information about autoMPIgen (and smarterSSH) off
the PeerMon page
As a general practice when debugging parallel programs, debug runs of your program with the fewest number of processes possible (2, if you can).
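The valgrind and gdb examples below both run a program named mpiprog. As a concrete (and entirely hypothetical) stand-in, here is a sketch of a small MPI program with two heap bugs of the kind these tools help you find: it writes one element past the end of a malloc'ed buffer and never frees it:

/* mpiprog.c: hypothetical buggy MPI program (for illustration only) */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, i, n = 16;
    int *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(n * sizeof(int));
    for (i = 0; i <= n; i++) {   /* BUG: <= writes one int past the end of buf */
        buf[i] = rank;
    }
    printf("process %d: filled buffer\n", rank);
    /* BUG: buf is never freed, so valgrind also reports a memory leak */

    MPI_Finalize();
    return 0;
}

Running a program like this under valgrind (as shown below) makes each MPI process report the invalid write and the leak to the terminal.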
To use valgrind, run a command like the following:
mpirun -np 2 --hostfile hostfile valgrind ./mpiprog

This example will spawn two MPI processes, each running mpiprog inside valgrind. This means both processes will print valgrind errors to the terminal.
To use gdb, first create a hostfile that includes only the machine on which you are logged in (this method creates xterm windows, which would otherwise need to be X-forwarded from a remote machine). Then, run a command like the following:
mpirun -np 2 --hostfile hostfile xterm -e gdb ./mpiprog

This spawns two xterm windows, each running a gdb session for one of the two MPI processes in this run. In each gdb session, you can set breakpoints, use the run command to start that process running, and then use other gdb commands to examine the runtime state of these MPI processes.
If your xterm settings make gdb's highlighted output hard to read, you can change your default xterm settings in your .Xdefaults file and then apply your changes by running xrdb -merge ~/.Xdefaults. For example:
# open your .Xdefaults file in an editor (vim, for example):
vim ~/.Xdefaults

! change the background and foreground settings in this file to:
xterm*background: black
xterm*foreground: white

# then run this command to apply them:
xrdb -merge ~/.Xdefaults
Here are some links to my gdb guide and
valgrind guide. Also, Chapter 3 of Dive into Systems contains a more verbose version of much of this content.
#!/bin/bash

if [ "$#" -ne 1 ]
then
  echo "usage: ./checkup.sh hostfilename"
  exit 1
fi

for i in `cat $1`
do
  echo "checking $i"
  ssh $i uptime
done

Then run it on a hostfile to check if the hosts are reachable:
# first make sure the script file is executable
ls -l checkup.sh
# and set permissions if not
chmod 700 checkup.sh

# run to check if hosts in a hostfile are reachable
./checkup.sh hostfile

You do not normally have to do this, but if all the nodes in a hostfile are reachable and you are still having trouble re-running mpirun, you can try running orte-clean to clean up any processes and files left over from a previous run that could be interfering with subsequent runs.
# try this first:
# make sure all your hosts in hostfile are reachable,
# and if not, take them out of your hostfile
./checkup.sh hostfile

# try this second:
# clean up MPI job state on the node from which you ran mpirun
orte-clean --verbose

# try this last:
# clean up MPI job state on all nodes in a hostfile
mpirun --hostfile hostfile orte-clean --verbose
------------------------------------------------------
A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host:

Another transport will be used instead, although this may
result in lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
------------------------------------------------------

You can ignore this, as this is mpirun looking for infiniband NICs and not finding them on our system (it uses Ethernet instead). You can also get rid of these warnings when running mpirun by setting the MCA parameter btl_base_warn_component_unused to 0. One way to do this is on the mpirun command line:
mpirun --mca btl_base_warn_component_unused 0 -np 4 --hostfile ~/hostfile ./mesh_w_buffers

Another way is to add a shell environment variable with this setting and run mpirun without this command line option:
# in bash (use setenv in tcsh):
export OMPI_MCA_btl_base_warn_component_unused=0
mpirun -np 4 --hostfile hostfile ./mesh_w_buffers

Finally, you could add this environment variable setting to your .bashrc file so that it is always set in your bash environment, and then you can run mpirun without seeing this warning. Add this to your ~/.bashrc file:
export OMPI_MCA_btl_base_warn_component_unused=0

Once you set this in your .bashrc, in any new bash shell you can run mpirun and no longer see this warning:
mpirun -np 4 --hostfile hostfile ./mesh_w_buffers

In an older shell, one that predates your change, you can run source ~/.bashrc to update its environment variables from the new .bashrc.
See the MPI FAQ linked to from here for more info about MCA settings: Links to MPI references and tutorials