For the Lab 4 assignment you will be using some XSEDE resources. In order to use these, you need to first create an account on XSEDE. It can take up to a week to enable this account, so please follow the "Get an XSEDE Account" steps today:
1. Get an XSEDE Account
To set up your XSEDE Account:
-
Choose the "Create Account" button to request a new account:
Organization: Swarthmore
Department: Computer Science
Registration Key: pick the first 6 characters of your Swarthmore user name
-
When you have a choice to select a user name, pick your Swarthmore user name (ex. tnewhal1 is mine). If your Swat user name is already in use, pick a different one and let me know what you picked.
It will take a few days, and up to 1 week, for your account to be activated.
2. Setting up Logging into Bridges-2
-
Go to https://portal.xsede.org/, log in with your user name, and request a password.
-
Select My XSEDE→Profile to see the class' XSEDE Resources and your usage information.
Healthy means you can log into that XSEDE system.
-
I don’t think you need to do this step, but if you have trouble logging into Bridges-2, or you want to add multi-factor authentication, do the following: choose "Enroll in Duo" in the upper right, then add a device. If you have a phone you carry in your pocket, use that. If not, contact Andrew Ruether in ITS; he will give you a device to use and help you register it.
Here are more directions: XSEDE MFA with Duo
3. Log into Bridges-2
3.1. The first time you log into Bridges-2
Detailed Bridges-2 connection instructions are here: Bridges-2 User Guide.
-
The first time you log into Bridges-2, you need to set your PSC password through a web interface: https://apr.psc.edu/. Enter your XSEDE user name and your email address. You will receive a security code and can then set or change your PSC password. This password works across all PSC systems, including Bridges-2.
-
Once you have your PSC password, you can log into Bridges-2 using ssh and your XSEDE login name. For example, if my user name is tnewhal1, I’d do this:
ssh tnewhal1@bridges2.psc.edu
-
Optional but recommended: You can fill out a web form to submit a public ssh key to the PSC SSH Key Manager for subsequent logins (you don’t need to do this, but you can then log into Bridges-2 using your key’s passphrase instead of your PSC password).
Upload a public key from your cs account to bridges2.psc.edu (your public key on our system is in ~/.ssh/id_rsa.pub). Just cut and paste it into the PSC SSH Key Manager web form "Submit a new key". Our git guide has some information about generating keys on our system: generating keys
3.2. Logging directly into Bridges-2 subsequent times
We have allocations on Bridges-2 Regular Memory Nodes and GPU Nodes.
After creating your PSC login (and optionally adding your public key(s)), you can directly ssh or scp to bridges2.psc.edu from a cs machine:
ssh tnewhal1@bridges2.psc.edu
Note: if you uploaded your CS public key, use its passphrase; otherwise, use your PSC password.
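If you log in often, an entry in ~/.ssh/config on your cs account can shorten the command. This is only a minimal sketch and not part of the PSC instructions: the alias bridges2 and the helper function print_bridges2_config are names I made up for illustration, and tnewhal1 stands in for your own user name.

```shell
# Print an ssh config stanza for Bridges-2.
# $1 is your XSEDE/PSC user name; the alias "bridges2" is an arbitrary choice.
print_bridges2_config() {
    printf 'Host bridges2\n'
    printf '    HostName bridges2.psc.edu\n'
    printf '    User %s\n' "$1"
}

# Show the stanza; append it to ~/.ssh/config yourself once it looks right.
print_bridges2_config tnewhal1
```

With a stanza like this in ~/.ssh/config, ssh bridges2 connects the same way as ssh tnewhal1@bridges2.psc.edu, and scp accepts the same alias.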
4. Using Bridges-2 and submitting jobs
4.1. Copying files
You can use scp to copy files between your cs account and bridges2.psc.edu. For example, from bridges2.psc.edu I can copy over a file or a whole subdirectory (and just change the source and destination to copy from bridges2.psc.edu to CS):
# copy over foo.c
scp newhall@cs.swarthmore.edu:/home/newhall/public/foo.c .
# WARNING: this does a recursive copy of all contents under the specified
# directory (my mpi_examples directory in this example):
scp -r newhall@cs.swarthmore.edu:/home/newhall/public/mpi_examples .
You can also create a single tar file of a set of files to easily copy over and then untar it once copied over. Here is my documentation about using tar. You can also look at the tar man page for more information.
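The tar round trip might look like the following sketch. It runs entirely in a scratch directory, and the names lab4, foo.c, and lab4.tar are hypothetical stand-ins for your own files.

```shell
# Work in a scratch directory so nothing in your real files is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small example directory to bundle (stand-in for your own code).
mkdir lab4
echo 'int main(void) { return 0; }' > lab4/foo.c
echo 'all: foo' > lab4/Makefile

# c = create, v = verbose, f = name of the tar file
tar cvf lab4.tar lab4

# t = list the contents without extracting
tar tf lab4.tar

# Extract into a separate directory to mimic untarring on the other machine.
mkdir unpacked
cd unpacked || exit 1
tar xvf ../lab4.tar
```

In real use, the scp of lab4.tar to the other machine goes between the create step and the extract step.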
4.2. Software, Modules, Compiling
To use a specific SW environment, you need to load a module on the system that defines it, before you compile and run.
You can list all available MPI implementations using the module avail command:
module avail mpi
Then you can add an OpenMPI/gcc version. On our systems we are using openmpi 3.1.6 with gcc 10.2.0, so use that module on Bridges-2 too. To load the module:
module load openmpi/3.1.6-gcc10.2.0
Then use mpicc as the compiler (this should be in your Makefile).
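A quick way to confirm that the module load worked is to check that mpicc is on your PATH before running make. This is a small sketch; have_cmd is a hypothetical helper name, not something provided by the module system.

```shell
# Return success if the named command is on your PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd mpicc; then
    echo "mpicc found at: $(command -v mpicc)"
else
    echo "mpicc not found; run: module load openmpi/3.1.6-gcc10.2.0"
fi
```

If the second message appears, load the module in the same shell and try again.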
4.3. Running Jobs
You will run experiments as batch jobs. Start by submitting your slurm script for one short test run of your program, then try a few somewhat longer ones, before submitting a long job. We have only a limited number of compute hours allocated on this system, and we all need to share them. You want to avoid running code with deadlocks that just uses up compute hours doing nothing (so test some large sizes on strilka first). You also want to limit the total runtime of your slurm script so that it does not use up lots of resources. You will use some for this, which is the point, but be careful to give reasonable time and space limits in your slurm script for the few large experiments you run.
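To get a feel for how quickly compute hours add up, you can estimate the cost of a run before submitting it. The sketch below assumes usage is counted as nodes × cores per node × wall-clock hours. Both that charging rule and the 128 cores-per-node figure are assumptions to check against the Bridges-2 User Guide, and core_hours is a hypothetical helper.

```shell
# Estimate core-hours for a job: nodes * cores_per_node * minutes / 60.
# Usage: core_hours NODES CORES_PER_NODE MINUTES
core_hours() {
    awk -v n="$1" -v c="$2" -v m="$3" 'BEGIN { printf "%.1f\n", n * c * m / 60 }'
}

# Example: 2 nodes, an assumed 128 cores per node, 30-minute time limit.
core_hours 2 128 30    # prints 128.0
```

Under these assumptions, a deadlocked job that sits until a 30-minute limit costs as much as a useful one, which is why tight time limits matter.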
Bridges-2 uses the SLURM resource manager for submitting and running batch jobs:
squeue # list all jobs in the job queue (there will be lots)
sbatch job.mpi # to submit a batch job from a slurm script
# this will give you a jobId for your job
squeue -u yourusername # list all jobs in the job queue that you own
scancel jobId # to kill a queued job
man slurm # the man page for slurm
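When sbatch accepts a job it normally prints a line of the form "Submitted batch job" followed by the job ID. If you want to keep that ID for later squeue or scancel calls, something like this sketch could work. parse_job_id is a hypothetical helper, and the sample message with ID 123456 is made up so that the snippet runs without SLURM.

```shell
# Extract the job ID from sbatch's "Submitted batch job <id>" message.
parse_job_id() {
    echo "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# On Bridges-2 you would capture real output, e.g.:
#   msg=$(sbatch job.mpi)
# Here a sample message stands in so the snippet runs anywhere.
msg="Submitted batch job 123456"
jobid=$(parse_job_id "$msg")
echo "job id: $jobid"
# then, for example: scancel "$jobid"
```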
4.3.1. queues/partitions
Bridges-2 is partitioned into different types of nodes.
For MPI, use the RM (regular memory) partition for your few larger, longer test runs.
If you use Bridges-2 for your course project, it has gpu partitions that you may want to use.
4.3.2. slurm job script
Here are some sample batch scripts for using Bridges-2.
The slurm job script specifies information about the job you are submitting. This includes the number of nodes, the number of MPI processes, and an estimate of your job’s runtime.
/share/apps/examples/ # example job scripts (see mpi examples)
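To make those pieces concrete, here is a minimal sketch of what such a job script could contain. It is not my hello.mpi or one of the PSC samples. The file name hello_sketch.mpi, the binary ./helloworld, and the specific node, task, and time values are all assumptions for illustration. The snippet just writes the script to a file and prints it.

```shell
# Write a small example SLURM job script to hello_sketch.mpi (hypothetical name).
cat > hello_sketch.mpi <<'EOF'
#!/bin/bash
# partition: RM (regular memory)
#SBATCH -p RM
# number of nodes
#SBATCH -N 1
# MPI processes per node (assumed value)
#SBATCH --ntasks-per-node=4
# wall-clock time limit, hh:mm:ss (assumed value)
#SBATCH -t 00:05:00
# output file; %j expands to the job ID
#SBATCH -o helloworld%j.out

# Load the same MPI environment that was used to compile.
module load openmpi/3.1.6-gcc10.2.0

# Launch one MPI process per task that SLURM allocated.
mpirun -np $SLURM_NTASKS ./helloworld
EOF

# Show what was written; on Bridges-2 you would submit it with sbatch.
cat hello_sketch.mpi
```

Keeping the time limit short for first tests means a deadlocked run is killed after a few minutes instead of using up allocation hours.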
4.3.3. File Spaces
If you use Bridges-2 for data intensive computing that requires large input or output file storage, use the file storage on Ocean to store input and result files.
The Bridges-2 User’s Guide has a lot more information and examples. Click on 'File Spaces' in the index of the Bridges-2 User Guide for more information about using Ocean.
5. Hello World Example
I have a very simple example MPI program and slurm run script for submitting to the RM queue on Bridges-2. You can try it out by doing the following:
# from bridges2.psc.edu, copy over my hello world example, untar and make it
scp yourusername@cs.swarthmore.edu:/home/newhall/public/XSEDE_MPI.tar .
tar xvf XSEDE_MPI.tar
cd XSEDE_MPI
make
vi hello.mpi # change the path name of helloworld to your path
# (in the mpi_run_rsh command line, change newhall to your user name)
# submit your job
sbatch hello.mpi
# check its status
squeue -u yourusername
# after it has run its output is in a file (vi, cat, less, ... to view)
less helloworldJOBID.out
hello.mpi is an example slurm run script that you can use as a starting point for other MPI applications you run. It submits the job to the RM queue, which is the one you want to use both for testing small jobs and for the longer experiment runs you submit afterwards.
6. Useful Functions and Resources
-
Try out my HelloWorld MPI Example (in Section 5) on Bridges-2.
-
See the Bridges-2 User’s Guide for lots more information and examples. In particular:
-
Bridges-2 Partitions: about the different partitions (queues). For MPI use the RM partition.
-
Programming Environment: compilers and SW available, how to load modules with the SW you want to use. See also the module package commands.
module avail # lists software available on bridges2.psc.edu
module load module_name # load software module on bridges2.psc.edu
-
Running Jobs: look at running batch jobs and sample batch scripts
-
The XSEDE portal: https://portal.xsede.org/
-
You can copy over my MPI example programs in /home/newhall/public/openMPI_examples/ to try out too. Follow the batch script example in hello.mpi (from Section 5) to create batch scripts to submit using sbatch on Bridges-2.
-
slurm users guide (probably way more than you need)
-
All the systems are listed here: XSEDE resources. We currently have allocations on Bridges-2 RM and GPU. Use the RM partition for MPI programs.