4) Cluster Computing
- 1 Submitting R Scripts to the Cluster
- 2 Array (Task) Jobs
- 2.1 Basic Use
- 2.1.1 createData R script
- 2.1.2 Array Job Script
- 2.1.3 Execution of createData array job
- 2.2 Array Indexing
- 2.2.1 Array indexing script
- 2.2.2 Array indexing execution
- 2.3 Dependent Arrays
- 2.3.1 Example Dependent Array Job
- 2.3.1.1 createData job
- 2.3.1.2 analyseData job
- 2.3.1.3 summaryData job
- 2.3.1.4 Dependent array example
- 2.4 Selecting irregular file names in array jobs
Submitting R Scripts to the Cluster
Detailed instructions on managing cluster jobs can be found here
abs4@ecgberht$ cat simple.R
args <- commandArgs(trailingOnly = TRUE)
number=as.numeric(args[1])
string=args[2]
print(sprintf("R script called with arguments \'%s\' and \'%s\'", number, string))
abs4@ecgberht$ Rscript simple.R 96 "Hello World"
[1] "R script called with arguments '96' and 'Hello World'"
abs4@ecgberht$ more simple.job
#$ -cwd -V
#$ -l h_rt=0:15:00
#$ -o logs
#$ -e logs
#$ -N R_simple
echo `date`: executing R script simple on host ${HOSTNAME}
Rscript --no-save --no-restore simple.R 93 "The end of the world is not today" > simple.Rout
echo `date`: completed R script simple on host ${HOSTNAME}
abs4@ecgberht$ qsub simple.job
Your job 711241 ("R_simple") has been submitted
abs4@ecgberht$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
711241 0.00000 R_simple abs4 qw 11/19/2014 12:46:35 1
abs4@ecgberht$ more simple.Rout
[1] "R script called with arguments '93' and 'The end of the world is not today'"
abs4@ecgberht$
Asking for more resources
By default the grid engine gives each job 1 core and 1GB of memory. This unit of resource is known as a slot. In order to use the parallel features discussed in section 2) Using multiple cores via the parallel package you need to ask for additional slots (cores) and/or RAM.
To request say 16 cores use the directive:
#$ -pe smp 16
To request 4GB of memory:
#$ -l h_vmem=4G
The memory request is per slot, so combining both requests will give you 16 cores and 64GB of memory. Requesting more cores and/or memory than is available on a node will result in the job waiting indefinitely in the queue. See here for more information.
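As a rough illustration of the per-slot arithmetic, the sketch below combines both directives in one job script. It relies on the NSLOTS variable that Grid Engine sets inside a parallel-environment job; the fallback value of 16 and the variable names cores, mem_per_slot_gb and total_gb are assumptions added so that the script also runs outside the scheduler.

```shell
#!/bin/bash
#$ -cwd -V
#$ -l h_rt=0:15:00
#$ -pe smp 16
#$ -l h_vmem=4G

# NSLOTS is set by Grid Engine to the number of slots granted to the job.
# The fallback of 16 is an assumption that only applies outside the scheduler.
cores=${NSLOTS:-16}

# Must be kept in step by hand with the h_vmem request above (4G per slot).
mem_per_slot_gb=4

# The memory request is per slot, so the job total is slots * memory per slot.
total_gb=$(( cores * mem_per_slot_gb ))

echo "cores=${cores} memory_per_slot=${mem_per_slot_gb}G total_memory=${total_gb}G"
```

The value of cores could then be passed to the R script as a command line argument, so that the number of workers started by the parallel package matches the number of slots requested.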
Array (Task) Jobs
Basic Use
You can use array jobs to submit a large number of jobs — likely auto-generated — to do the same processing on a lot of individual input datasets or values. Examples include image processing, or rendering, with one job for each frame in a sequence. Parameter sweeps, with each job running the same calculation with a different parameter set from a pre-defined collection, can also be performed.
This diagram shows how the tasks are managed and executed. The first array job (red) has five tasks (green) executing - there may be more tasks waiting for execution.
Array jobs effectively constitute a parallel loop over the input data which can be manipulated as a unit by the Grid Engine queue management commands qsub and qdel. An array job has a single entry in the qstat output while it is waiting, but once it is running an entry is shown for each running task. This makes management easier than generating and manipulating a large number of individual jobs.
Each instance of the job in the effective loop is called a task, which has an index exported to the task’s environment for use in the job script. The tasks may be parallel.
The following code shows a job description for an array job running 25 copies of the R script "createData.R". The option '-t 1-25' specifies the range of task indices, and hence the number of tasks to create. The script takes two parameters, the number of values to generate and the name of the file to write them to. The variable ${SGE_TASK_ID} is used to create a unique filename: it holds the numeric index of the current task, here a value from 1 to 25.
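# Read the data size and the output file name from the command line
args <- commandArgs(trailingOnly = TRUE)
dSize <- as.numeric(args[1])
dFile <- args[2]
# Write 'size' normally distributed random numbers to 'filename'
create.data.file <- function(size, filename) {
  data <- rnorm(size)
  write(data, file = filename, append = FALSE)
}
create.data.file(dSize, dFile)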
#$ -cwd -V
#$ -l h_rt=1:15:00
#$ -o logs
#$ -e logs
#$ -N R_createData
#$ -t 1-25
echo `date`: executing R create data module on host ${HOSTNAME}
Rscript createData.R 25000000 data/datafile.${SGE_TASK_ID}
echo `date`: completed R create data module on host ${HOSTNAME}
abs4@ecgberht$ qsub createData.job
Your job-array 711238.1-25:1 ("R_createData") has been submitted
abs4@ecgberht$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
711238 0.00000 R_createDa abs4 qw 11/19/2014 12:18:27 1 1-25:1
abs4@ecgberht$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-long@rnode2.york.ac.uk 1 1
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-long@rnode0.york.ac.uk 1 2
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-long@rnode4.york.ac.uk 1 3
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode17.york.ac.uk 1 4
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode16.york.ac.uk 1 5
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode9.york.ac.uk 1 6
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode14.york.ac.uk 1 7
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode15.york.ac.uk 1 8
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode20.york.ac.uk 1 9
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode18.york.ac.uk 1 10
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode19.york.ac.uk 1 11
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode7.york.ac.uk 1 12
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode21.york.ac.uk 1 13
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode11.york.ac.uk 1 14
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-gpu@rnode6.york.ac.uk 1 15
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode8.york.ac.uk 1 16
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-long@rnode1.york.ac.uk 1 17
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-long@rnode3.york.ac.uk 1 18
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode11.york.ac.uk 1 19
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-long@rnode0.york.ac.uk 1 20
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-gpu@rnode6.york.ac.uk 1 21
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode21.york.ac.uk 1 22
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode7.york.ac.uk 1 23
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode19.york.ac.uk 1 24
711238 0.55500 R_createDa abs4 r 11/19/2014 12:18:40 its-day@rnode18.york.ac.uk 1 25
abs4@ecgberht$ qstat
abs4@ecgberht$ ls -l data
total 6225728
-rw------- 1 abs4 csrv 253994863 Nov 19 12:20 datafile.1
-rw------- 1 abs4 csrv 254011505 Nov 19 12:20 datafile.10
-rw------- 1 abs4 csrv 254003260 Nov 19 12:20 datafile.11
-rw------- 1 abs4 csrv 253995252 Nov 19 12:20 datafile.12
-rw------- 1 abs4 csrv 254003817 Nov 19 12:20 datafile.13
-rw------- 1 abs4 csrv 254003548 Nov 19 12:20 datafile.14
-rw------- 1 abs4 csrv 253994947 Nov 19 12:20 datafile.15
-rw------- 1 abs4 csrv 253997381 Nov 19 12:20 datafile.16
-rw------- 1 abs4 csrv 254001350 Nov 19 12:20 datafile.17
-rw------- 1 abs4 csrv 254000042 Nov 19 12:20 datafile.18
-rw------- 1 abs4 csrv 253993076 Nov 19 12:20 datafile.19
-rw------- 1 abs4 csrv 253998375 Nov 19 12:20 datafile.2
-rw------- 1 abs4 csrv 253997415 Nov 19 12:20 datafile.20
-rw------- 1 abs4 csrv 254003162 Nov 19 12:20 datafile.21
-rw------- 1 abs4 csrv 254001136 Nov 19 12:20 datafile.22
-rw------- 1 abs4 csrv 253998621 Nov 19 12:20 datafile.23
-rw------- 1 abs4 csrv 254000813 Nov 19 12:20 datafile.24
-rw------- 1 abs4 csrv 253999557 Nov 19 12:20 datafile.25
-rw------- 1 abs4 csrv 254000352 Nov 19 12:20 datafile.3
-rw------- 1 abs4 csrv 254001415 Nov 19 12:20 datafile.4
-rw------- 1 abs4 csrv 254001743 Nov 19 12:20 datafile.5
-rw------- 1 abs4 csrv 253996849 Nov 19 12:20 datafile.6
-rw------- 1 abs4 csrv 253999499 Nov 19 12:20 datafile.7
-rw------- 1 abs4 csrv 254005345 Nov 19 12:20 datafile.8
-rw------- 1 abs4 csrv 253997331 Nov 19 12:20 datafile.9
abs4@ecgberht$ more logs/R_createData.o711238.17
Wed Nov 19 12:18:41 GMT 2014: executing R create data module on host rnode1
Wed Nov 19 12:20:09 GMT 2014: completed R create data module on host rnode1
abs4@ecgberht$
Array Indexing
The array stride can be a value other than 1. For example, the option '-t 1-16:5' executes four tasks, with IDs 1, 6, 11 and 16.
The variables SGE_TASK_FIRST, SGE_TASK_LAST, and SGE_TASK_STEPSIZE provide the task range and stride to the script.
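A minimal sketch of a job script using a stride of 5 is given here. The job name R_stride and the fallback values after ':-' are assumptions, added only so that the script can also be run outside the scheduler; inside an array job Grid Engine sets all four variables itself.

```shell
#!/bin/bash
#$ -cwd -V
#$ -l h_rt=0:15:00
#$ -o logs
#$ -e logs
#$ -N R_stride
#$ -t 1-16:5

# Range and stride of the array, as provided by Grid Engine.
# The fallbacks mirror the -t request above and are used only outside the scheduler.
first=${SGE_TASK_FIRST:-1}
last=${SGE_TASK_LAST:-16}
step=${SGE_TASK_STEPSIZE:-5}

# Index of this particular task.
task=${SGE_TASK_ID:-$first}

# Enumerate every task index in the array: first, first+step, ... up to last.
all_ids=$(echo $(seq "$first" "$step" "$last"))

echo "$(date): task ${task} of array ${first}-${last}:${step} (task ids: ${all_ids})"
```

Each of the four tasks runs the same script and differs only in the value of SGE_TASK_ID.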
Dependent Arrays
The qsub option -hold_jid_ad holds each task of an array job until the corresponding task of another array job has completed. In the following example task n of Job2 will only run after task n of Job1 has completed. Task n of Job3 will only run after tasks n and n+1 of Job2 have finished.
qsub -t 1-5 Job1
qsub -t 1-3 -hold_jid_ad Job1 Job2
qsub -t 1-5 -hold_jid_ad Job2 Job3
Example Dependent Array Job
In the following example we are going to submit three jobs, "createData", "analyseData", and "summaryData". Each job is dependent on the previous job.
We now submit the three jobs. Note how we use the job name specified with the "-N" option to refer to the jobs that the dependent jobs wait for. If we did not set a name we would have to use the filename of the job script, which is the default job name. The final job is a single task that collates all the data files: '-t 1-25:25' keeps the same 1-25 range but, with a step size of 25, creates only a single instance of the job. SGE requires that dependent jobs have the same number of tasks.
Note how a task of the "R_analyseData" job can start as soon as the corresponding task of the "R_createData" job has finished.
Dependent array example
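The per-task dependency works because both jobs derive their file names from the same task index. The sketch below shows how a job script for the middle job might look. The job name R_analyseData and the 1-25 range match the submission output, and the input path matches the files written by createData.job, but the arguments taken by analyseData.R and the results/result.N output path are assumptions. The JOB_ID test is also an addition, so that the script only prints the command when it is run outside the scheduler.

```shell
#!/bin/bash
#$ -cwd -V
#$ -l h_rt=1:15:00
#$ -o logs
#$ -e logs
#$ -N R_analyseData
#$ -t 1-25

# Index of this task; the fallback of 1 applies only outside the scheduler.
task=${SGE_TASK_ID:-1}

# Task N reads the file written by task N of the R_createData job.
infile="data/datafile.${task}"

# Hypothetical output location, one result file per task.
outfile="results/result.${task}"

if [ -n "${JOB_ID:-}" ]; then
    # Running under Grid Engine: the argument list of analyseData.R is assumed.
    echo "$(date): executing R analyse data module on host ${HOSTNAME}"
    Rscript analyseData.R "${infile}" "${outfile}"
    echo "$(date): completed R analyse data module on host ${HOSTNAME}"
else
    # Outside the scheduler: show what the task would do.
    echo "dry run: Rscript analyseData.R ${infile} ${outfile}"
fi
```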
abs4@ecgberht$ qsub createData.job
Your job-array 711320.1-25:1 ("R_createData") has been submitted
abs4@ecgberht$ qsub -hold_jid_ad R_createData analyseData.job
Your job-array 711321.1-25:1 ("R_analyseData") has been submitted
abs4@ecgberht$ qsub -t 1-25:25 -hold_jid_ad R_analyseData summaryData.job
Your job-array 711322.1-25:25 ("R_summaryData") has been submitted
abs4@ecgberht$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
711320 0.00000 R_createDa abs4 qw 11/21/2014 10:26:54 1 1-25:1
711321 0.00000 R_analyseD abs4 hqw 11/21/2014 10:26:54 1 1-25:1
711322 0.00000 R_summaryD abs4 hqw 11/21/2014 10:26:54 1 1
abs4@ecgberht$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-long@rnode1.york.ac.uk 1 1
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode21.york.ac.uk 1 2
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-long@rnode4.york.ac.uk 1 3
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode9.york.ac.uk 1 4
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode14.york.ac.uk 1 5
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode16.york.ac.uk 1 6
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode17.york.ac.uk 1 7
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode19.york.ac.uk 1 8
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-long@rnode3.york.ac.uk 1 9
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode20.york.ac.uk 1 10
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode8.york.ac.uk 1 11
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-long@rnode2.york.ac.uk 1 12
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-gpu@rnode6.york.ac.uk 1 13
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode15.york.ac.uk 1 14
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode11.york.ac.uk 1 15
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode18.york.ac.uk 1 16
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-long@rnode5.york.ac.uk 1 17
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode7.york.ac.uk 1 18
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode11.york.ac.uk 1 19
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-gpu@rnode6.york.ac.uk 1 20
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode21.york.ac.uk 1 21
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode14.york.ac.uk 1 22
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode9.york.ac.uk 1 23
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode16.york.ac.uk 1 24
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode17.york.ac.uk 1 25
711321 0.00000 R_analyseD abs4 hqw 11/21/2014 10:26:54 1 1-25:1
711322 0.00000 R_summaryD abs4 hqw 11/21/2014 10:26:54 1
abs4@ecgberht$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode9.york.ac.uk 1 4
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-gpu@rnode6.york.ac.uk 1 13
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode11.york.ac.uk 1 15
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-gpu@rnode6.york.ac.uk 1 20
711320 0.50500 R_createDa abs4 r 11/21/2014 10:26:55 its-day@rnode17.york.ac.uk 1 25
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-day@rnode20.york.ac.uk 1 1
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-day@rnode18.york.ac.uk 1 3
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-day@rnode7.york.ac.uk 1 9
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-long@rnode1.york.ac.uk 1 10
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-long@rnode4.york.ac.uk 1 11
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-long@rnode3.york.ac.uk 1 12
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-day@rnode15.york.ac.uk 1 14
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-day@rnode8.york.ac.uk 1 16
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-long@rnode2.york.ac.uk 1 17
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-long@rnode5.york.ac.uk 1 18
711321 0.50500 R_analyseD abs4 r 11/21/2014 10:28:25 its-day@rnode11.york.ac.uk 1 21
711321 0.00000 R_analyseD abs4 qw 11/21/2014 10:26:54 1 2,5,6-8:1,19,22,23,24
711321 0.00000 R_analyseD abs4 hqw 11/21/2014 10:26:54 1 4,13,15,20,25
711322 0.00000 R_summaryD abs4 hqw 11/21/2014 10:26:54 1 1
abs4@ecgberht$
Selecting irregular file names in array jobs
Often your data files cannot be referenced via a unique numerical index, because the file names are dates or unrelated strings. A set of files with no numerical index can still be processed by an array job: first make a file containing a list of the filenames, then have each task use a short "awk" script to return the line whose number matches its numerical task index.
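A minimal sketch of this approach is given here. The three file names, the job name R_fileList and the temporary list file are hypothetical; in practice the list would be written once before submission (for example with ls) and the -t range would match the number of lines in it. The fallback task index of 2 is only used when the script runs outside the scheduler.

```shell
#!/bin/bash
#$ -cwd -V
#$ -l h_rt=0:15:00
#$ -o logs
#$ -e logs
#$ -N R_fileList
#$ -t 1-3

# Build a list of file names, one per line (hypothetical names for illustration).
listfile=$(mktemp)
printf '%s\n' "2014-11-19.dat" "control_run.dat" "sample-B.dat" > "$listfile"

# Index of this task; the fallback of 2 applies only outside the scheduler.
task=${SGE_TASK_ID:-2}

# awk prints the line whose line number (NR) equals the task index.
infile=$(awk -v n="$task" 'NR == n' "$listfile")

echo "task ${task} processes ${infile}"

# Remove the temporary list again.
rm -f "$listfile"
```

The value of infile can then be passed to Rscript in the same way as the numbered file names in the earlier examples.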