For the Lab 4 assignment you will be using some ACCESS resources. To use these, you first need to create an ACCESS account. It can take up to a week to enable this account, so please follow the "Get an ACCESS Account" steps today:

Get an ACCESS Account

To set up your ACCESS account (ACCESS home page: https://access-ci.org/):

  1. go to https://operations.access-ci.org/identity/new-user

    1. Choose "Register with an existing identity" option (and choose Swarthmore from the menu).

    2. You should get the Swarthmore login page. Enter your Swarthmore user name and password.

    3. ACCESS will assign you an ACCESS user name; email Tia your assigned user name.

  2. I will need to add you to the allocation for CS87 so you can use ACCESS systems.

Seeing ACCESS resources

Go to https://allocations.access-ci.org/resources to list all ACCESS systems. Click on a system, and then on its user guide, for information about how to use it and how to log into it.

For Lab 4 experiments after break, we are going to use Purdue Anvil.

Its user guide is here: Anvil User Guide.

Seeing your Allocations

Log into your ACCESS account using CILogon: go to https://access-ci.org/, choose the "Login" button, and choose to log in using CILogon.

Under MyAccess in the upper right, choose My Allocations.

Logging in

Logging into each ACCESS system is different. See the user guides for individual systems for directions.

Logging into Anvil:

Your Anvil user name should be x-accessusername, where accessusername will likely be your Swarthmore user name (unless there is already an ACCESS user with your name).

Here are the login instructions for Anvil: Anvil Login Instructions (from the Anvil User Guide, Getting Started: Accessing the System → Logging In).

Follow the "With SSH" instructions.

Initial login to Anvil

To initially log into Anvil, you first need to upload the public ssh keys of the systems from which you will ssh in. To do this, log into the Open OnDemand interface (click the Open OnDemand Interface link on the login instructions page) at ondemand.anvil.rcac.purdue.edu.
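
If you do not already have an ssh key pair on the machine you will ssh from (for example, your CS account), here is a minimal sketch of generating one and printing the public key to upload; the key type and file name below are just the ssh defaults, and your setup may differ:

    ssh-keygen -t ed25519          # generate a key pair; accept the default file location
    cat ~/.ssh/id_ed25519.pub      # print the public key, which is what you upload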

Logging directly into Anvil on subsequent logins

After uploading ssh keys, you can directly log into Anvil using ssh:

$ ssh x-anvil-username@anvil.rcac.purdue.edu

Using Anvil and submitting jobs

Copying files

You can use scp to copy files between your CS account and anvil.rcac.purdue.edu. For example, from anvil.rcac.purdue.edu I can copy over a file or a whole subdirectory (just swap the source and destination to copy from anvil.rcac.purdue.edu to CS):

# copy over a file (this one contains an MPI helloworld and example
# slurm bash scripts for submitting runs on Purdue's Anvil cluster)
scp newhall@cs.swarthmore.edu:/home/newhall/public/ACCESS_MPI.tar .

# WARNING: this does a recursive copy of all contents under the specified
# directory (my openMPI_examples directory in this example):
scp -r newhall@cs.swarthmore.edu:/home/newhall/public/openMPI_examples .

You can also create a single tar file of a set of files, copy that one file over, and then untar it on the other side. Here is my documentation about using tar; you can also look at the tar man page for more information.
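
For example, a minimal tar round trip might look like this (myproj and x-accessusername are placeholders for your directory name and your Anvil user name):

    tar cvf myproj.tar myproj                                # pack the directory into one tar file
    scp myproj.tar x-accessusername@anvil.rcac.purdue.edu:   # copy the single file to your Anvil home
    # then, logged into Anvil:
    tar xvf myproj.tar                                       # unpack it back into a myproj directory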

Suggested additions to your .bashrc

You may want to add some aliases to your .bashrc file on Anvil for the rm and cp commands to avoid accidentally removing or overwriting a file. To do this, open your .bashrc file in vim and add the following at the bottom:

    alias rm='rm -i'
    alias cp='cp -i'

Then run source so that bash re-reads your .bashrc and picks up the new aliases:

$ source ~/.bashrc

Software, Modules, Compiling

You can list all available MPI implementations using the module avail command:

module avail mpi

Then you can load an OpenMPI/gcc version and check that it is loaded:

module load openmpi/4.0.6

module list

Then use mpicc as the compiler (this should already be set in the Makefile).
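
For reference, compiling by hand (instead of with the provided Makefile) would look roughly like this, where hello.c is just a placeholder source file name:

    module load openmpi/4.0.6       # make mpicc available in your environment
    mpicc -o helloworld hello.c     # mpicc wraps the underlying gcc with MPI include/library flags
    mpicc --version                 # sanity check: prints the underlying compiler version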

Running Jobs

You will run experiments as batch jobs. Start by submitting your slurm script for one short test run of your program, and try a few longer runs before submitting a long job. We only have so many compute hours allocated on this system, and we all need to share them. You want to avoid running code with deadlocks that just burns compute hours doing nothing (so test some large sizes on Strelka first). You also want to limit the total runtime of your slurm script so you do not use up lots of resources. You will use some for this, which is the point, but be careful to give reasonable time and space limits in your slurm script for the few large experiments you run.

Anvil runs the SLURM resource manager to submit batch jobs.

squeue            # list all jobs in the job queue (there will be lots)
sbatch  job.sb    # to submit a batch job from a slurm script (named job.sb)
                  # this will give you a jobId  for your job

squeue -u yourusername # list all jobs in the job queue that you own
scancel jobId     # to kill a queued job

man slurm         # the man page for slurm
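
A few other standard slurm commands can be handy too (these are generic slurm commands, not Anvil-specific):

    squeue -j jobId            # status of one specific job
    scontrol show job jobId    # detailed information about a queued or running job
    sacct -u yourusername      # accounting info (state, elapsed time) for your recent jobs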

I’ll give you some example slurm scripts for Strelka and Anvil.

Anvil queues/partitions

Anvil is partitioned into different types of nodes. For MPI, use the wide (regular memory) partition for your few larger, longer test runs. This is an expensive queue, so before submitting runs to it, first try some smaller runs on the debug queue to make sure your slurm script works and your program runs okay on Anvil.

Here is more information about Anvil partitions: Anvil Partitions
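
To see which partitions exist and what their time limits are, the standard slurm sinfo command works (the names and limits it prints are whatever Anvil is configured with):

    sinfo                    # list all partitions, their availability, time limits, and node counts
    sinfo -p debug,wide      # show just the debug and wide partitions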

If you use Anvil for your course project, it also has GPU partitions that you may want to use.

slurm job script

Sample batch script for running an MPI program on Anvil.

The slurm job script specifies information about the job you are submitting, including the number of nodes, the number of MPI processes, and an estimate of your job's runtime.

My helloworld example below includes example slurm scripts.
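
As a rough sketch of what such a script contains (the partition, counts, time limit, account name, and paths below are illustrative placeholders; use my example scripts as your actual starting point):

    #!/bin/bash
    #SBATCH --job-name=helloworld        # name shown by squeue
    #SBATCH --partition=debug            # partition/queue to submit to
    #SBATCH --nodes=2                    # number of nodes
    #SBATCH --ntasks-per-node=8          # MPI processes per node (16 total here)
    #SBATCH --time=00:05:00              # wall-clock time limit: keep it small for tests
    #SBATCH --output=helloworld-%j.out   # output file name (%j is replaced by the job id)
    #SBATCH -A myallocation              # allocation/account to charge (placeholder name)

    module load openmpi/4.0.6            # load the same MPI module used to compile

    srun ./helloworld                    # launch the MPI program, one process per slurm task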

File Spaces

If you use ACCESS systems for data-intensive computing that requires large input or output file storage, look at the system's available file systems and use the one that best fits your needs. Many have parallel file systems such as Lustre, and file systems specifically for big data. Many systems also make some large data sets available.

Here is more information about file systems on Anvil.
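
A quick, generic way to check which file system a directory lives on and how much free space it has (standard Linux commands; see the Anvil file systems page for its specific quota tools):

    df -h ~          # file system, total size, and free space for your home directory
    df -h .          # the same, for whatever directory you are currently in
    du -sh mydir     # total size of a directory tree (mydir is a placeholder)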

Hello World Example

I have a very simple example MPI program and some example run scripts for submitting to the debug and wide queues on Anvil. You can try it out by doing the following:

# from anvil.rcac.purdue.edu, copy over my hello world example, untar it, and make it
scp your_cs_username@cslab.cs.swarthmore.edu:/home/newhall/public/ACCESS_MPI.tar .
tar xvf ACCESS_MPI.tar
cd ACCESS_MPI

# load openmpi and make
module load openmpi/4.0.6
make

vi hello.debug.sb   # change the path to the helloworld executable to your own path
vi hello.debug2.sb  # change the path to the helloworld executable to your own path
vi hello.wide.sb    # change the path to the helloworld executable to your own path

# submit your job using sbatch
sbatch hello.debug.sb

# check its status
squeue -u yourusername

# after it has run, its output is in a file (use vi, cat, less, ... to view it)
less helloworldJOBID.out

  • hello.debug.sb is an example slurm runscript that you can use as a starting point for other MPI applications you run. It submits the job to the debug partition, which is the one to use to test small jobs before submitting longer experiment runs to the wide partition.

  • hello.debug.even.sb is a very similar slurm runscript, but it uses --ntasks-per-node (instead of --ntasks) to evenly distribute the 16 processes over the 2 nodes (see the short sketch after this list).

  • hello.wide.sb is another example runscript that runs on the wide partition. The wide partition is the one to use after debug, for larger runs across more nodes.
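
Roughly, the difference between the two debug scripts comes down to these sbatch directives (the counts here are illustrative):

    # request 16 tasks total and let slurm decide how to place them on the 2 nodes
    #SBATCH --nodes=2
    #SBATCH --ntasks=16

    # force an even split: exactly 8 tasks on each of the 2 nodes (16 total)
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=8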

Useful Functions and Resources

  • Try out my HelloWorld MPI Example (in Hello World Example) on Anvil.

  • See the Anvil User’s Guide for lots more information and examples.

  • To list and load available modules on a system:

    module avail               # this lists software available on Anvil
    
    module list                # list currently loaded modules
    
    module load  module_name   # load software module on Anvil
  • You can also copy over my MPI example programs in /home/newhall/public/openMPI_examples/ to try out. Follow the slurm script example hello.debug.sb (from Hello World Example) to create slurm scripts to run these with sbatch on Anvil.

    The helloworld program from my openMPI_examples is also available in /home/newhall/public/ACCESS_MPI.tar, which includes a few sample slurm bash scripts for running on Anvil.

  • slurm users guide (probably way more than you need)

  • ACCESS homepage

    All the systems are listed here: ACCESS resources. We currently have allocations on Anvil CPU and GPU. Use the CPU partitions (debug for testing, wide for larger runs) for your MPI programs.

  • Anvil User Guide

  • Anvil Queues