SLURM Queue Manager

Chimera uses SLURM to manage its queues. Please see the SLURM Quick Start User Guide for basic usage instructions. The man pages for the individual SLURM commands contain more detailed information. This page summarizes some of the most commonly used commands and describes Chimera-specific considerations.

Partitions and Jobs

Every job handled by SLURM is inserted into a partition. Chimera currently has three general-use partitions: all, 64g, and 128g. The 64g and 128g partitions contain nodes with 64GB and 128GB of memory, respectively. To see which partitions exist and get basic information about their status, use sinfo:

-bash-3.2$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all          up   infinite     64  alloc node[00-63]
64g*         up   infinite     56  alloc node[08-63]
128g         up   infinite      8  alloc node[00-07]
test         up   infinite      2   idle node[64-65]
-bash-3.2$

To see a list of all jobs currently queued or running, use squeue:

-bash-3.2$ squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
     26       all   sbatch   siegel   R    1:06:37     64 node[00-63]
-bash-3.2$
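Jobs are typically submitted with sbatch. The batch script below is a minimal sketch: the partition name comes from the sinfo output above, but the job name, node count, time limit, and program name (my_program) are placeholders for your own job.

```shell
#!/bin/bash
#SBATCH --partition=64g      # one of the partitions shown by sinfo
#SBATCH --nodes=4            # number of nodes to allocate
#SBATCH --time=01:00:00      # wall-clock limit (HH:MM:SS)
#SBATCH --job-name=myjob     # name shown in the squeue NAME column

srun ./my_program            # launch the program on the allocated nodes
```

Submit the script with "sbatch myjob.sh"; squeue will then show the job's state and node list as in the example above.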

SLURM epilog

After a job exits a node, an epilog script runs that kills all processes belonging to users who are not authorized to be running on that node. This has two useful effects: it cleans up jobs that declare themselves done without actually killing all of their sub-processes, and it terminates programs started on the node through means other than the SLURM manager.

Partition priorities and preemption

In order to accommodate both large jobs and long-running jobs, Chimera has the following partitions. Each partition's priority varies with the time of day and day of the week. Higher-priority jobs preempt lower-priority jobs, causing the lower-priority jobs to be suspended until the higher-priority job completes. Note that priorities depend on the time of day!

Note: currently, only the first three partitions are implemented!

Partition  Description                                 Normal    Evening   Weekend   Max    Max Run
                                                       Priority  Priority  Priority  nodes  Time
all        All nodes for general use                   10        10        10        64     1 week
64g        Nodes with 64GB of memory for general use   10        10        10        56     1 week
128g       Nodes with 128GB of memory for general use  10        10        10        8      1 week
full       For jobs needing the whole machine          5         5         20        64     64 hours*
half       For jobs needing half the nodes             5         20        18        32     16 hours*
long       For long-running jobs                       5         5         5         8      12 weeks
express    For short turn-around jobs                  15        15        15        2      15 minutes

(*) Jobs running in the "full" or "half" partitions will be killed at 7AM on the next business day
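To target one of these partitions at submission time, pass the partition and time limit explicitly. The commands below are a sketch using standard sbatch options; the script name job.sh is a placeholder.

```shell
# Ask for the "full" partition up to its 64-hour limit (HH:MM:SS)
sbatch --partition=full --nodes=64 --time=64:00:00 job.sh

# Short turn-around work goes to "express" (2-node, 15-minute cap;
# --time=15:00 is minutes:seconds in SLURM, i.e. 15 minutes)
sbatch --partition=express --nodes=2 --time=15:00 job.sh
```

Keeping the requested --time within a partition's maximum run time avoids having the job rejected or held at submission.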

MPI

As indicated in [Quick Start], there are several versions of MPI installed, each of which has been built with different compilers. SLURM can be used to submit MPI jobs directly, but this requires some additional configuration. Please see the MPI Use Guide on the main SLURM site for full instructions.

Note: slurm.conf has MpiDefault set to "none". This is the correct (though counter-intuitive) setting for MVAPICH2 and OpenMPI.

MVAPICH2

Choose your build and execution environment using modules.

MVAPICH2 jobs may be started directly using the srun command, but the program must be linked with the SLURM PMI library:

mpicc -L/usr/local/lib -lpmi ...
srun -n6 a.out

The [modules] environment includes MPI*_PROFILE variables for selecting the SLURM profile, which causes the SLURM PMI library to be linked.

OpenMPI

Choose your build and execution environment with modules.

OpenMPI (1.4.2) uses SLURM to allocate resources. The job is then run using mpirun.

$ salloc -n4 sh    # allocates 4 processors and spawns shell for job
> mpirun a.out
> exit             # exits shell spawned by initial salloc command

You can also do this in one step if there are no other steps that need to be taken within the allocation:

$ salloc -n4 mpirun a.out
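For non-interactive use, the same salloc/mpirun pattern fits in a batch script. This is a sketch, with a.out standing in for your MPI program as in the examples above:

```shell
#!/bin/bash
#SBATCH -n 4                 # request 4 processors, as with salloc -n4
#SBATCH --partition=all      # any of the general-use partitions

mpirun a.out                 # mpirun picks up the allocation from SLURM
```

Because OpenMPI reads the allocation from SLURM's environment, mpirun needs no host list or process count here; it launches one process per allocated slot.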