Submitting a job

Queuing system

The job queue on the Cray XC30 is managed by a scheduler system.

For details, refer to the parallel servers user manual.

Queue classes

Class	No. of Nodes	No. of Cores	Max. Execution Time	No. of Concurrent Jobs	No. of Concurrent Jobs/User	Priority
TINY	1-4	up to 96	30 min	Unlimited	Unlimited	1
SMALL	4-16	up to 384	48 hours	16	4	3
MEDIUM	16-32	up to 768	12 hours	6	3	4
LARGE	32-128	up to 3072	12 hours	2	1	2
XLARGE	128-356	up to 8544	12 hours	1	1	7
LONG-S	1-8	up to 192	1 week	4	1	3
LONG-M	8-32	up to 768	96 hours	2	1	4
LONG-L	up to 360	(Application dependent)	-	-	-

No. of Nodes / No. of Cores = the number of nodes / cores available to a job in the class
Execution time = the maximum time that a job submitted to the queue may remain in the running state
No. of Concurrent Jobs = the number of jobs in the same queue that can run simultaneously
No. of Concurrent Jobs/User = the number of jobs from the same user in the same queue that can run simultaneously
No. of simultaneous submissions/user = the number of jobs that one user can have in the same queue at the same time (including running jobs)

■ One node has 2 CPUs, for a total of 24 cores.
■ Jobs are allocated per node. That is, even a single-threaded job occupies one whole node (24 cores).
■ 4 nodes are allocated exclusively to the TINY queue, so the XLARGE queue is limited to 8544 cores. The LONG-L class has no such limit.
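
The core counts in the table follow directly from the 24 cores per node. A minimal sketch of that arithmetic in POSIX sh; the helper name `cores_for_nodes` is hypothetical and is not an XC30 command:

```shell
#!/bin/sh
# Hypothetical helper (not part of the XC30 software):
# number of cores occupied by a given number of whole nodes.
CORES_PER_NODE=24   # from the note above: 1 node = 2 CPUs = 24 cores

cores_for_nodes() {
    echo $(( $1 * CORES_PER_NODE ))
}

cores_for_nodes 4     # TINY upper limit   -> 96
cores_for_nodes 356   # XLARGE upper limit -> 8544
```

Because allocation is per node, the same multiplication gives the cores a job occupies even when it uses fewer of them.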

** For more details about job classes, please check the link.

Checking jobs (XC30 commands)

XC30 provides the standard PBS qstat command, but it may not display the number of nodes in use correctly.

The following command can be used to check the currently running jobs.

nqstat

Job script example

MPI program

#!/bin/csh
#PBS -q <QUEUE>
#PBS -j oe
# No. of MPI processes to use (required)
#PBS -l mppwidth=48
# No. of processes per node (required; max. 24)
#PBS -l mppnppn=24
# Should be 1 for MPI programs
#PBS -l mppdepth=1
# Arbitrary job name
#PBS -N BATCH-JOB

# Move to the working directory (where the job was submitted)
cd $PBS_O_WORKDIR

aprun -n 48 -N 24 -d 1 ./a.out

XC30 uses the aprun command to execute programs.

aprun [-n num] [-N num] [-d num] <ELF>

-n: no. of MPI processes (must match mppwidth in the PBS directives)
-N: no. of MPI processes per node (must match mppnppn in the PBS directives)
-d: no. of threads per MPI process (must match mppdepth in the PBS directives)
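
Since each aprun option has to agree with its PBS counterpart, it can help to derive the command line from one set of values. The following is only a sketch in POSIX sh: `aprun_line` is a hypothetical helper that prints the command and does not run it.

```shell
#!/bin/sh
# Sketch only: print the aprun line that matches a set of PBS resource values.
# aprun_line is a hypothetical helper; it prints the command, it does not run it.
aprun_line() {
    width=$1   # mppwidth -> aprun -n (total MPI processes)
    nppn=$2    # mppnppn  -> aprun -N (MPI processes per node)
    depth=$3   # mppdepth -> aprun -d (threads per MPI process)
    exe=$4     # executable to launch
    echo "aprun -n $width -N $nppn -d $depth $exe"
}

aprun_line 48 24 1 ./a.out   # prints: aprun -n 48 -N 24 -d 1 ./a.out
```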

MPI/OpenMP hybrid

#!/bin/csh
# No. of MPI processes to run: 2 processes x 2 nodes = 4, so two nodes are used
#PBS -l mppwidth=4
# 2 processes per node: 12 threads x 2 processes = 24, using all 24 cores of the node
#PBS -l mppnppn=2
# 12 threads per process, using the 12 cores of one CPU
#PBS -l mppdepth=12

# Move to the working directory (where the job was submitted)
cd $PBS_O_WORKDIR

aprun -n 4 -N 2 -d 12 ./a.out
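
The arithmetic behind a hybrid layout (processes per node x threads per process must fit in 24 cores, and nodes are allocated whole) can be checked before submitting. A minimal sketch in POSIX sh, assuming the 24-core nodes described earlier; `check_layout` is a hypothetical helper, not an XC30 command:

```shell
#!/bin/sh
# Hypothetical helper: check a layout against the 24 cores of one node
# and report how many nodes the job occupies.
CORES_PER_NODE=24

check_layout() {
    width=$1; nppn=$2; depth=$3   # mppwidth, mppnppn, mppdepth
    cores_used=$(( nppn * depth ))
    if [ "$cores_used" -gt "$CORES_PER_NODE" ]; then
        echo "error: $nppn processes x $depth threads = $cores_used cores exceeds $CORES_PER_NODE per node"
        return 1
    fi
    # Round up: a partly filled node is still allocated whole.
    nodes=$(( (width + nppn - 1) / nppn ))
    echo "$nodes nodes, $cores_used cores used per node"
}

check_layout 4 2 12    # hybrid example -> 2 nodes, 24 cores used per node
check_layout 48 24 1   # MPI example    -> 2 nodes, 24 cores used per node
```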

Running multiple scripts

On XC30, the number of jobs that one user can have in a queue at the same time has been limited since July 2013. To run more programs than this limit allows, list multiple executable files in a single job script.

An example is as follows.

* Depending on how the script is written, the results of one program may be overwritten by another. Take care where each program stores its results.

#!/bin/csh
#PBS -q SMALL
#PBS -j oe
# No. of MPI processes (required)
#PBS -l mppwidth=48
# No. of processes per node (required; max. 24)
#PBS -l mppnppn=24
# 1 for MPI programs
#PBS -l mppdepth=1
# Arbitrary job name
#PBS -N BATCH-JOB

cd /work/xxxxxxxx/job1
aprun -n 48 -N 24 -d 1 ./job1.exe

cd /work/xxxxxxxx/job2
aprun -n 48 -N 24 -d 1 ./job2.exe

cd /work/xxxxxxxx/job3
aprun -n 48 -N 24 -d 1 ./job3.exe
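
One way to keep results from overwriting each other is to run each program in its own directory and send its output to its own log file. The following is a sketch under stated assumptions, not the site's recommended method: it is written in POSIX sh rather than the csh used above, `LAUNCH` and `run_jobs` are hypothetical names, and it assumes each executable is named after its directory, as in the example above.

```shell
#!/bin/sh
# Sketch: run several programs one after another, each in its own directory,
# writing each program's output to its own log file so nothing is overwritten.
# LAUNCH and run_jobs are hypothetical names introduced for this example.
LAUNCH=${LAUNCH:-"aprun -n 48 -N 24 -d 1"}

run_jobs() {
    for dir in "$@"; do
        name=$(basename "$dir")   # e.g. job1 -> runs ./job1.exe, logs to job1.log
        # Subshell: the cd does not carry over to the next iteration.
        ( cd "$dir" && $LAUNCH "./$name.exe" > "$name.log" 2>&1 ) \
            || echo "warning: $name failed or could not start" >&2
    done
}

# In a job script (paths are the placeholders used above):
# run_jobs /work/xxxxxxxx/job1 /work/xxxxxxxx/job2 /work/xxxxxxxx/job3
```

A failure in one program is reported but does not stop the remaining ones; whether that is desirable depends on the jobs.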

Application queue

If you apply in advance and obtain approval, you can occupy all or some of the nodes for a limited period.

Such a period-limited queue is called an application queue.

Requesting an application queue

Apply for the LONG-L queue at least 2 weeks in advance by contacting mpc-admin.
The mpc-admin group informs the applicant of the schedule for using the LONG-L queue and suspends job submission according to that schedule.
The applicant submits jobs to the LONG-L queue at the scheduled time, once the resources are available.
When the LONG-L queue period ends, the results should be reported.