Chimera Quick Start Guide

1. Get set up. If you don't have a Chimera account, or have not set up your environment to use [modules], do so following the instructions on the [Getting Started] page.

2. Log on to Chimera, if you have not already done so, using your ECE/CIS user name and password:

alban:~ siegel$ ssh -l siegel chimera.cis.udel.edu
Password: 
Last login: Thu Feb  3 10:24:02 2011 from 85.125.52.197
-bash-3.2$

Note: from off campus, you may first have to log on to one of the gateway machines, e.g., stimpy.cis.udel.edu.
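
For example, a login from off campus might involve two hops, roughly as follows (the prompts are illustrative, and password prompts are omitted):

alban:~ siegel$ ssh -l siegel stimpy.cis.udel.edu
-bash-3.2$ ssh chimera.cis.udel.edu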

3. Select your compiler and MPI implementation. There are 3 compilers (GCC, PGI, and Intel) and 2 MPI implementations (OpenMPI and MVAPICH2), and any compiler may be used with either MPI implementation, for a total of 3*2=6 ways to compile and execute an MPI program. By default, accounts are configured to use GCC with MVAPICH2. If you are happy with that choice, you can proceed to the next step. Otherwise, use the [modules] package to select among the 6 combinations by issuing a command such as

-bash-3.2$ module load mvapich2/gcc
-bash-3.2$
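
To see which modules are available and which are currently loaded, you can use the standard module subcommands (output not shown here):

-bash-3.2$ module avail
-bash-3.2$ module list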

4. Get the source code. Prepare the code however you like: copy it over with scp, check it out with svn, edit it with emacs, and so on. For this example we use a simple "Hello, world" C/MPI program, hello.c:

#include <stdio.h>
#include <mpi.h>

/* Each MPI process prints a greeting with its rank. */
int main(int argc, char *argv[]) {
  int rank;

  MPI_Init(&argc, &argv);                /* initialize the MPI library */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* get this process's rank */
  printf("Hello from process %d.\n", rank);
  fflush(stdout);
  MPI_Finalize();                        /* shut down MPI */
  return 0;
}

5. Compile your program as follows:

-bash-3.2$ mpicc -o hello hello.c
-bash-3.2$ 

(Note: it is no longer necessary to include -L/usr/local/slurm/lib -lpmi in the command because this is now done automatically. Linking with the PMI library is necessary when using MVAPICH2 with SLURM, which is used to launch jobs on Chimera.) You can add to this command any other options you would normally use with cc.
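
For example, with the GCC toolchain you might enable warnings and optimization; the flags below are ordinary gcc options passed through by mpicc, nothing specific to Chimera:

-bash-3.2$ mpicc -Wall -O2 -o hello hello.c
-bash-3.2$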

6. Run your program. Say you want to execute your program using 6 MPI processes. The simplest way is something like the following:

-bash-3.2$ srun -p 64g -n 6 hello
Hello from process 0.
Hello from process 1.
Hello from process 2.
Hello from process 3.
Hello from process 4.
Hello from process 5.
-bash-3.2$

This will queue your job for execution in the "64g" queue and run your program as soon as the requested resources become available. (The 64g queue serves the 56 Chimera nodes that have 64 gigabytes of memory.) The output goes to stdout (your terminal), and your prompt will not return until the program has terminated. This is fine for very small, short-running programs, but for anything substantial you will probably want to run your program in batch mode, with a command like the following:

-bash-3.2$ sbatch -p 64g -n 200 --wrap="srun -n 200 hello"
Submitted batch job 24
-bash-3.2$

SLURM returns a job ID number, here 24, and the prompt returns immediately. The output from your job will go to a file called slurm-24.out.
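
As an alternative to --wrap, the sbatch options can be placed in a short batch script. The following is only a minimal sketch; the file name hello.sbatch and the explicit -o line are illustrative (by default SLURM already writes the output to slurm-<jobid>.out):

#!/bin/bash
#SBATCH -p 64g            # partition (queue)
#SBATCH -n 200            # number of MPI processes (tasks)
#SBATCH -o slurm-%j.out   # output file; %j expands to the job ID

srun -n 200 hello

Submit the script with

-bash-3.2$ sbatch hello.sbatch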

7. Check on the status of your job. This is done with the squeue command:

-bash-3.2$ squeue
JOBID PARTITION     NAME     USER  ST   TIME  NODES NODELIST(REASON)
   24       64g   sbatch   siegel   R   0:03      5 node[08-12]
-bash-3.2$

This gives the status of all jobs running or queued on Chimera. The column labeled "ST" gives the status of the job; "R" indicates the job is currently running. The job has been allocated 5 nodes: nodes 8 through 12, inclusive. As each node has 48 cores, this is a total of 240 cores. SLURM has mapped each of the 200 MPI processes to one core, and 40 of the cores are going unused. Using "squeue -l" will give more detail. See [SLURM|Slurm Queue Manager] for further details on using SLURM, options to control how processes are mapped to nodes and cores, the various queues available on Chimera, and so on. The SLURM man pages provide even more information; for example, type man srun or man sbatch for detailed documentation on those commands.
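
For example, the standard SLURM commands squeue -u and scancel let you list only your own jobs or cancel a job by its ID (output not shown here):

-bash-3.2$ squeue -u siegel
-bash-3.2$ scancel 24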

You can always look at the output file to check up on things:

-bash-3.2$ cat slurm-24.out
Hello from process 0.
Hello from process 108.
Hello from process 128.
Hello from process 40.
Hello from process 160.
...
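
If the job is still running, you can also watch the output file as it grows with the standard tail command (type Ctrl-C to stop watching; this does not affect the job):

-bash-3.2$ tail -f slurm-24.out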