
Submitting a job

Queuing system

The Cray XC30 job queue is managed by a PBS-based scheduler.

Please also check the parallel servers user manual.

Queue classes


Class    Nodes       Cores    Expected execution duration   Concurrent jobs   Concurrent jobs/user   Priority
TINY     1-4         ~96      30 min                        Unlimited         Unlimited              1
SMALL    4-16        ~384     48 hours                      16                4                      3
MEDIUM   16-32       ~768     12 hours                      6                 3                      4
LARGE    32-128      ~3072    12 hours                      2                 1                      2
XLARGE   128-356     ~8544    12 hours                      1                 1                      7
LONG-S   1-8         ~192     1 week                        4                 1                      3
LONG-M   8-32        ~768     96 hours                      2                 1                      4
LONG-L   ~360 (application dependent)                       -                 -                      -
  • Nodes / Cores = number of available nodes / number of available cores
  • Expected execution duration = the maximum time a job submitted to the queue may remain in the execution state
  • Concurrent jobs = the number of jobs in the same queue that can be executed simultaneously
  • Concurrent jobs / user = the number of jobs from the same user in the same queue that can be executed at the same time
  • Simultaneous submissions / user = the number of jobs one user can have in the same queue at once (including running jobs)

■ One node has 2 CPUs, i.e. 24 cores in total.
■ Jobs are allocated per node. Even a single-threaded job therefore occupies one full node (24 cores).
■ Four nodes are reserved exclusively for the TINY queue, so the XLARGE queue is limited to 356 nodes (8544 cores). The LONG-L class is not subject to this limit (application dependent).

** For more details about job classes, please check the link.
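
For reference, the queue class can be chosen either on the qsub command line or with a #PBS directive in the job script (the script name myjob.csh below is only an illustration):

# choose the queue at submission time
qsub -q SMALL myjob.csh

# or select it inside the script itself
#PBS -q SMALL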

Checking jobs (XC30 commands)

The XC30 uses the standard PBS qstat command, but the number of nodes in use may not be displayed correctly.

The following command can be used to check the currently running jobs.


nqstat
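
In a standard PBS environment, jobs can also be listed per user with qstat (keeping in mind the node-count caveat above); $USER is the usual shell variable:

# list only your own jobs; node counts may be inaccurate
qstat -u $USER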

Job script example


MPI program

#!/bin/csh
#PBS -q <QUEUE>
#PBS -j oe
# mppwidth: number of MPI processes to use <required>
#PBS -l mppwidth=48
# mppnppn: number of processes per node <required: max. 24>
#PBS -l mppnppn=24
# mppdepth: threads per process; should be 1 for MPI programs
#PBS -l mppdepth=1
# arbitrary job name
#PBS -N BATCH-JOB

# move to the working directory the job was submitted from
cd $PBS_O_WORKDIR

aprun -n 48 -N 24 -d 1 ./a.out

XC30 uses the aprun command to execute programs.

aprun [-n num] [-N num] [-d num] <executable>

-n: number of MPI processes (corresponding to mppwidth in the PBS directives)
-N: number of MPI processes per node (corresponding to mppnppn)
-d: number of threads per MPI process (corresponding to mppdepth)
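
As a hypothetical sizing example (the values are illustrative only), a 96-process pure-MPI job packed 24 processes per node occupies 4 nodes:

# 96 MPI processes / 24 per node = 4 nodes
#PBS -l mppwidth=96
#PBS -l mppnppn=24
#PBS -l mppdepth=1

aprun -n 96 -N 24 -d 1 ./a.out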



MPI/OpenMP hybrid

#!/bin/csh
# mppwidth: total number of MPI processes; 2 processes x 2 nodes = 4, i.e. two nodes are used
#PBS -l mppwidth=4
# mppnppn: 2 processes per node; 2 processes x 12 threads = 24, using all 24 cores of a node
#PBS -l mppnppn=2
# mppdepth: 12 threads, using the 12 cores of one CPU
#PBS -l mppdepth=12

# move to the working directory
cd $PBS_O_WORKDIR

# match the OpenMP thread count to mppdepth
setenv OMP_NUM_THREADS 12

aprun -n 4 -N 2 -d 12 ./a.out
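
The same arithmetic applies to other shapes. A hypothetical variant with 8 MPI processes and 6 OpenMP threads each (values are illustrative only):

# 4 processes/node x 6 threads = 24 cores per node; 8 processes / 4 per node = 2 nodes
#PBS -l mppwidth=8
#PBS -l mppnppn=4
#PBS -l mppdepth=6

setenv OMP_NUM_THREADS 6
aprun -n 8 -N 4 -d 6 ./a.out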

 

Running multiple scripts

On the XC30, the number of jobs one user can run simultaneously has been limited since July 2013. To execute more programs, list multiple executables in a single job script.

An example is as follows.

* Depending on how the script is written, the execution results of some programs may be overwritten. Be careful about where the results are stored; one way to keep them separate is sketched after the example below.

 

#!/bin/csh
#PBS -q SMALL
#PBS -j oe
# mppwidth: number of MPI processes <required>
#PBS -l mppwidth=48
# mppnppn: number of processes per node <required: max. 24>
#PBS -l mppnppn=24
# mppdepth: 1 for MPI programs
#PBS -l mppdepth=1
# arbitrary job name
#PBS -N BATCH-JOB

cd /work/xxxxxxxx/job1
aprun -n 48 -N 24 -d 1 ./job1.exe

cd /work/xxxxxxxxx/job2
aprun -n 48 -N 24 -d 1 ./job2.exe

cd /work/xxxxxxxx/job3
aprun -n 48 -N 24 -d 1 ./job3.exe
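
As a minimal sketch of keeping results separate (the log file names are illustrative), each run's output can be redirected to its own file with the csh >& operator, which captures both stdout and stderr:

cd /work/xxxxxxxx/job1
aprun -n 48 -N 24 -d 1 ./job1.exe >& job1.log

cd /work/xxxxxxxxx/job2
aprun -n 48 -N 24 -d 1 ./job2.exe >& job2.log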

Application queue

If you apply in advance and obtain approval, you can occupy all or some of the nodes for a limited period.

Such a period-limited queue is called an application queue.

Requesting an application queue


  1. Apply at least 2 weeks ahead for the LONG-L queue by contacting mpc-admin.
  2. The mpc-admin group informs the applicant of the schedule for using the LONG-L queue and stops job submission according to that schedule.
  3. The applicant submits jobs to the LONG-L queue during the specified schedule, when the resources are available.
  4. When the LONG-L period finishes, the results should be reported.