Submitting a job
Queuing system
Jobs on the Cray XC30 are managed by a batch scheduler (PBS).
For details, see the parallel server user manual.
Queue classes
Class | Nodes | Cores | Max. execution time | Concurrent jobs | Concurrent jobs per user | Priority |
---|---|---|---|---|---|---|
TINY | 1-4 | up to 96 | 30 min | Unlimited | Unlimited | 1 |
SMALL | 4-16 | up to 384 | 48 h | 16 | 4 | 3 |
MEDIUM | 16-32 | up to 768 | 12 h | 6 | 3 | 4 |
LARGE | 32-128 | up to 3072 | 12 h | 2 | 1 | 2 |
XLARGE | 128-356 | up to 8544 | 12 h | 1 | 1 | 7 |
LONG-S | 1-8 | up to 192 | 1 week | 4 | 1 | 3 |
LONG-M | 8-32 | up to 768 | 96 h | 2 | 1 | 4 |
LONG-L | up to 360 | (Application dependent) | - | - | - | - |
- Nodes / Cores = number of nodes / number of cores available to a job
- Max. execution time = the maximum time a job submitted to the queue can remain in the running state
- Concurrent jobs = the number of jobs in the same queue that can run simultaneously
- Concurrent jobs per user = the number of jobs from one user in the same queue that can run simultaneously
- Submitted jobs per user = the number of jobs one user can have in the same queue at the same time (including running jobs)
■ Each node has 2 CPUs with a total of 24 cores.
■ Jobs are allocated in whole nodes: even a single-threaded job occupies one full node (24 cores).
■ 4 nodes are reserved exclusively for the TINY queue, so the XLARGE queue is limited to 356 nodes (8544 cores). The LONG-L class has no such limit.
** For more details about the job classes, see the linked page.
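The whole-node allocation rule and the node ranges in the table can be combined to see which classes a job fits. This is a minimal sketch, assuming the node ranges are inclusive (so a 4-node job matches both TINY and SMALL) and that every job receives whole 24-core nodes; `nodes_needed` and `candidate_classes` are hypothetical helpers, not XC30 commands. LONG-L is omitted because it is an application queue.

```python
import math

CORES_PER_NODE = 24  # 2 CPUs per node, 24 cores in total

# (class, min nodes, max nodes) from the queue class table.
# Assumption: ranges are inclusive. LONG-L (application queue) is omitted.
QUEUE_NODE_RANGES = [
    ("TINY", 1, 4),
    ("SMALL", 4, 16),
    ("MEDIUM", 16, 32),
    ("LARGE", 32, 128),
    ("XLARGE", 128, 356),
    ("LONG-S", 1, 8),
    ("LONG-M", 8, 32),
]


def nodes_needed(total_cores: int) -> int:
    """Whole nodes allocated to a job that needs total_cores cores."""
    if total_cores < 1:
        raise ValueError("a job needs at least one core")
    return math.ceil(total_cores / CORES_PER_NODE)


def candidate_classes(nodes: int) -> list[str]:
    """Queue classes whose node range contains the given node count."""
    return [name for name, lo, hi in QUEUE_NODE_RANGES if lo <= nodes <= hi]


# A single-threaded job still occupies one full node.
print(nodes_needed(1), nodes_needed(1) * CORES_PER_NODE)  # 1 24
# 96 cores -> 4 nodes, which lies inside three class ranges.
print(candidate_classes(nodes_needed(96)))  # ['TINY', 'SMALL', 'LONG-S']
```

When several classes match, the execution-time limit in the table is what usually decides between them.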
Checking jobs (XC30 commands)
Although the XC30 provides the standard PBS qstat command, it may not display the number of nodes in use correctly.
Use the following command instead to check the currently running jobs:
nqstat
Job script example
MPI program
#!/bin/csh
#PBS -q <QUEUE>
#PBS -j oe
# Number of MPI processes (required)
#PBS -l mppwidth=48
# Number of MPI processes per node (required; max. 24)
#PBS -l mppnppn=24
# Threads per MPI process; must be 1 for pure MPI programs
#PBS -l mppdepth=1
# Job name (arbitrary)
#PBS -N BATCH-JOB
# Move to the directory the job was submitted from
cd $PBS_O_WORKDIR
aprun -n 48 -N 24 -d 1 ./a.out
XC30 uses the aprun command to execute programs.
aprun [-n num] [-N num] [-d num] <executable>
-n: number of MPI processes (must match mppwidth in the PBS directives)
-N: number of MPI processes per node (must match mppnppn)
-d: number of threads per MPI process (must match mppdepth)
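Since `-n`, `-N` and `-d` must agree with `mppwidth`, `mppnppn` and `mppdepth`, one way to avoid mismatches is to generate both from the same three values. The sketch below assumes a csh script laid out like the examples in this guide; `job_script` is a hypothetical helper, and the `OMP_NUM_THREADS` line it adds for threaded runs is common practice rather than something this guide prescribes.

```python
CORES_PER_NODE = 24  # cores per XC30 node, as stated in the queue notes


def job_script(queue: str, width: int, nppn: int, depth: int,
               exe: str, name: str = "BATCH-JOB") -> str:
    """Build a csh job script whose #PBS settings and aprun flags agree.

    width -> mppwidth / aprun -n, nppn -> mppnppn / aprun -N,
    depth -> mppdepth / aprun -d.
    """
    if width < 1 or depth < 1 or not 1 <= nppn <= CORES_PER_NODE:
        raise ValueError("need width >= 1, depth >= 1 and 1 <= nppn <= 24")
    if nppn * depth > CORES_PER_NODE:
        raise ValueError("processes per node x threads per process exceeds 24")
    lines = [
        "#!/bin/csh",
        f"#PBS -q {queue}",
        "#PBS -j oe",
        f"#PBS -l mppwidth={width}",
        f"#PBS -l mppnppn={nppn}",
        f"#PBS -l mppdepth={depth}",
        f"#PBS -N {name}",
        "cd $PBS_O_WORKDIR",
    ]
    if depth > 1:
        # Assumption: threaded runs set the OpenMP thread count to mppdepth
        lines.append(f"setenv OMP_NUM_THREADS {depth}")
    lines.append(f"aprun -n {width} -N {nppn} -d {depth} {exe}")
    return "\n".join(lines) + "\n"


print(job_script("SMALL", 48, 24, 1, "./a.out"))
```

Rejecting layouts that need more than 24 cores per node catches the most common inconsistency before the job ever reaches the queue.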
MPI/OpenMP hybrid
#!/bin/csh
# Total MPI processes: 2 processes/node x 2 nodes = 4 (uses 2 nodes)
#PBS -l mppwidth=4
# MPI processes per node: 2 processes x 12 threads = 24, all cores of the node
#PBS -l mppnppn=2
# Threads per MPI process: 12 threads on the 12 cores of one CPU
#PBS -l mppdepth=12
# Move to the directory the job was submitted from
cd $PBS_O_WORKDIR
# Match the OpenMP thread count to mppdepth / aprun -d
setenv OMP_NUM_THREADS 12
aprun -n 4 -N 2 -d 12 ./a.out
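The arithmetic in the hybrid example generalizes: with `t` threads per MPI process, `24 // t` processes fit on one node, and the total process count is that number times the node count. A minimal sketch, where `hybrid_layout` is a hypothetical helper; thread counts that do not divide 24 leave some cores idle.

```python
CORES_PER_NODE = 24  # 2 CPUs x 12 cores


def hybrid_layout(nodes: int, threads: int) -> tuple[int, int, int]:
    """Return (mppwidth, mppnppn, mppdepth) for `nodes` nodes and
    `threads` OpenMP threads per MPI process, using as many cores as fit."""
    if nodes < 1 or not 1 <= threads <= CORES_PER_NODE:
        raise ValueError("need nodes >= 1 and 1 <= threads <= 24")
    nppn = CORES_PER_NODE // threads  # processes that fit on one node
    return nodes * nppn, nppn, threads


width, nppn, depth = hybrid_layout(nodes=2, threads=12)
print(f"aprun -n {width} -N {nppn} -d {depth} ./a.out")  # aprun -n 4 -N 2 -d 12 ./a.out
```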
Running multiple programs in one job script
Since July 2013, the XC30 has limited the number of jobs a single user can have in a queue at the same time. To run more programs than that limit allows, list multiple executables in a single job script.
An example is as follows.
* Depending on how the script is written, one program's results may overwrite another's. Make sure each program writes its results to a separate location.
#!/bin/csh
#PBS -q SMALL
#PBS -j oe
# Number of MPI processes (required)
#PBS -l mppwidth=48
# Number of MPI processes per node (required; max. 24)
#PBS -l mppnppn=24
# Threads per MPI process; must be 1 for pure MPI programs
#PBS -l mppdepth=1
# Job name (arbitrary)
#PBS -N BATCH-JOB
# The programs below run one after another, each in its own directory
cd /work/xxxxxxxx/job1
aprun -n 48 -N 24 -d 1 ./job1.exe
cd /work/xxxxxxxxx/job2
aprun -n 48 -N 24 -d 1 ./job2.exe
cd /work/xxxxxxxx/job3
aprun -n 48 -N 24 -d 1 ./job3.exe
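When many programs are bundled into one job, generating the `cd`/`aprun` pairs from a list makes it easy to guarantee that each program runs in its own directory, which avoids the overwriting problem noted above. A minimal sketch, assuming the same 48-process pure MPI layout as the example; `bundle_script` and the `/work/USER/...` paths are hypothetical.

```python
def bundle_script(queue: str, runs: list[tuple[str, str]],
                  width: int = 48, nppn: int = 24) -> str:
    """Build a csh job script that runs each (directory, executable) pair in turn.

    Directories must be distinct so that programs do not overwrite each
    other's results.
    """
    dirs = [d for d, _ in runs]
    if len(set(dirs)) != len(dirs):
        raise ValueError("each program needs its own directory")
    lines = [
        "#!/bin/csh",
        f"#PBS -q {queue}",
        "#PBS -j oe",
        f"#PBS -l mppwidth={width}",
        f"#PBS -l mppnppn={nppn}",
        "#PBS -l mppdepth=1",
        "#PBS -N BATCH-JOB",
    ]
    for directory, exe in runs:
        lines.append(f"cd {directory}")
        lines.append(f"aprun -n {width} -N {nppn} -d 1 ./{exe}")
    return "\n".join(lines) + "\n"


print(bundle_script("SMALL", [
    ("/work/USER/job1", "job1.exe"),  # hypothetical paths
    ("/work/USER/job2", "job2.exe"),
]))
```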
Application queue
If you apply in advance and your application is approved, you can occupy all or some of the nodes for a limited period.
Such a period-limited queue is called an application queue.
Requesting an application queue
- Apply for the LONG-L queue by contacting mpc-admin at least 2 weeks in advance.
- The mpc-admin group informs the applicant of the schedule for using the LONG-L queue and stops job submissions according to that schedule.
- At the scheduled time, once the resources are free, the applicant submits jobs to the LONG-L queue.
- After the LONG-L period ends, the applicant reports the results.