This course is a continuation of An Introduction to Research Computing at York. If you are new to the Linux command line, we recommend you work through An Introduction to Research Computing at York before moving on to this course.
Viking Account Creation
Before you can access Viking you will need to create a Viking account. The process can take a couple of days, so please complete the following steps in good time.
- Before logging into Viking please ensure your project supervisor fills in this form to request a project code.
- The user then needs to fill in this form to request an account once they have a project code.
- Accounts should take no longer than 24 hours to be created. You will receive an email on creation of your account.
See /wiki/spaces/RCS/pages/39158623 for more information
What is the Viking cluster and why should I use it?
If you are finding that your code is still taking a long time to finish or you wish to scale your work, the Viking cluster may be what you need.
Viking is a large Linux compute cluster aimed at users who require a platform for development and the execution of small or large compute jobs. Viking is a multidisciplinary facility, supporting a broad spectrum of research needs, free at the point of use to all University of York researchers. Viking is as much a facility for learning and exploring possibilities as it is a facility for running well-established high-performance computing workloads. In this light, we encourage users from all Faculties, backgrounds and levels of ability to consider Viking when thinking about how computing might support their research.
What is a cluster?
A cluster consists of many (hundreds or thousands of) rack-mounted computers called nodes. It is similar to having hundreds of desktop computers sitting in the same room, able to talk to each other. Clusters are often accessed via login nodes, which can send jobs to the other nodes in the cluster. Your commands will not be run immediately, but will be sent to a queue, and run when there is space on the cluster.
The Viking cluster is Linux based and can be accessed in a similar manner to the research servers, but instead of accessing, say, research0.york.ac.uk, you would access viking.york.ac.uk.
Login into Viking
The process of logging into Viking is similar to the research servers, as described in An Introduction to Research Computing at York. Viking runs a Linux operating system, so how you access it will depend on the operating system of your local machine.
We will break down the different options here.
Before You Login
If you have not changed your IT Services password since August 2013 then you must do so before you will be able to log in. All user password changes are managed via the My IT Account web page. Click on the Password Management (IDM) link in the Manage Your Password field to change your password. You may be given the option to 'synchronise' your password; use this option if you do not want to change your password. The password change (or synchronisation) may take a few minutes before it is visible to the servers.
Accessing Viking off campus
To access Viking off campus you can either use the Virtual Private Network (VPN) or the SSH gateway service (registration required). The instructions below should work if you log on through the VPN; the SSH gateway service works slightly differently.
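If you use the SSH gateway service, one convenient option is SSH's ProxyJump flag, which hops through the gateway to Viking in a single command. A minimal sketch, assuming the gateway host is ssh.york.ac.uk (check the SSH gateway service documentation for the actual hostname):
[bash-4.1]$ # Jump through the SSH gateway to Viking (gateway hostname is an assumption)
[bash-4.1]$ ssh -J abc123@ssh.york.ac.uk abc123@viking.york.ac.uk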
Windows
Access from a Windows desktop
Command-line access using PuTTY
PuTTY is available on all IT Services Managed Windows systems. It is pre-installed on Classroom PCs; on Office PCs you can install it from Run Advertised Programs / Software Center. It appears under "Internet Tools" on the start menu.
On unmanaged PCs you can download the installer from the PuTTY Website.
Configuring PuTTY to connect to Viking
Open PuTTY and configure it to connect to Viking:
- Add the name "viking.york.ac.uk" to the 'Host Name' field
- Set the 'Connection Type' to SSH
- Type the name "Viking" in 'Saved Sessions'
- Click 'Save'
- Expand the 'SSH' tab from the 'Category' list
- Choose 'X11' from 'SSH' list
- Check 'Enable X11 Forwarding'
Connecting to Viking
- Start PuTTY
- Select 'Viking' from the 'Saved Sessions'
- Click 'Open'
A terminal window should appear. Log in with your university username and password. The first time you connect you will get a security alert showing the fingerprint of the server, labeled as 'ssh-rsa' or 'ssh-ed25519'.
If the login is successful you will be presented with a Viking command prompt.
Whilst it is possible to configure X11 forwarding through PuTTY, X11 forwarding will only work on the Viking login nodes, which means that you won't be able to run graphical applications e.g. MATLAB on the Viking compute nodes using X11 forwarding. Details about virtual desktop sessions on Viking are provided in the Virtual Desktop section of this document.
macOS
Access from a Mac
Go to the Finder on your Mac, find Applications, open the Utilities folder in Applications and then start the Terminal app from the Utilities folder. (You may wish to add Terminal to your Dock.) Then type the following, using your own university username in place of abc123. You do not need to type the $; it is an example of a prompt, which tells us the terminal is ready for us to type something. You should see something similar when you open Terminal, though it may be a bit longer (it may show your username, for example). Just type everything after the $.
[bash-4.1]$ ssh -X abc123@viking.york.ac.uk
Linux
Access from a UNIX server or desktop
To log in from a terminal emulator, use the following command:
[bash-4.1]$ ssh abc123@viking.york.ac.uk
where abc123 is your IT Services username. You will be prompted for your IT Services password. Please note, X11 forwarding will only work on the Viking login nodes, which means that you won't be able to run graphical applications e.g. MATLAB on the Viking compute nodes using X11 forwarding. Details about virtual desktop sessions on Viking are provided in the Virtual Desktop section of this document.
If you require X forwarding, type:
[bash-4.1]$ ssh -X abc123@viking.york.ac.uk
See /wiki/spaces/RCS/pages/39158979 for more information
Navigating Viking
When you log in, you will be directed to one of several login nodes. These provide Linux command-line access to the system, which is necessary for editing programs and for compiling and running code. Usage of the login nodes is shared amongst all who are logged in. These systems should not be used for running your code, other than for development and very short test runs.
Access to the job/batch submission system is through the login nodes. When a submitted job executes, processors on the compute nodes are exclusively made available for the purposes of running the job.
For this exercise let's look at Viking, see what is available, and set up an area to run our jobs from.
Exercise 1
List the directories within your home area on Viking. What do you see?
-bash-4.1$ ls
bin Chemistry Desktop examples Experiments intel jobs logs tmp
-bash-4.1$ ls -l
total 296
drwxr-xr-x 2 abs4 csrv 4096 Jun 24 09:39 bin
drwxr-xr-x 3 abs4 csrv 4096 Jun 6 09:23 Chemistry
drwxr-sr-x 2 abs4 elecclust 4096 Mar 11 10:53 Desktop
drwxr-xr-x 3 abs4 csrv 4096 Jun 30 12:21 examples
drwxr-xr-x 5 abs4 csrv 4096 May 23 11:34 Experiments
drwxr-xr-x 3 abs4 csrv 4096 Aug 14 12:26 intel
drwxr-sr-x 3 abs4 elecclust 4096 Aug 15 12:49 jobs
drwxr-xr-x 2 abs4 csrv 266240 Aug 15 13:48 logs
drwxr-xr-x 3 abs4 csrv 4096 Aug 14 14:50 tmp
-bash-4.1$
Your home directory is backed up, but there are strict file and size limits in this area.
To see what space you have available, run the myquota command:
[abc123@login1(viking) ~]$ /opt/site/york/bin/myquota
Scratch quota:
Disk quotas for usr abc123 (uid 10506):
Filesystem used quota limit grace files quota limit grace
/mnt/lustre 268.6G 3T 3.1T - 2157882 0 0 -
Home quota:
Disk quotas for user abc123 (uid 10506):
Filesystem blocks quota limit grace files quota limit grace
storage:/export/users
2155420 52428800 78643200 1654 100000 150000
Two quotas are displayed here: your home area and your scratch area.
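The quota output shows totals only. To see how much space individual directories are taking up, you can use the standard du command, for example:
[abc123@login1(viking) ~]$ # Summarise the size of each item in your scratch area
[abc123@login1(viking) ~]$ du -sh ~/scratch/*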
Home Directory
This is the directory you start off in after logging on to one of the login nodes. It is NOT suitable for large datasets or frequently accessed files, and no jobs should be run from here.
Lustre Fast Filesystem
This is the scratch directory in your home directory, and potentially any group data stores that you have access to. It is available to the compute nodes, and is the recommended place for your jobs to read and store data.
Here is a breakdown of the different areas and what they provide.
Home | Scratch
---|---
50GB space limit | 3TB initial quota, more on request
Backed up | Not backed up
Slower performance | Lustre high-performance filesystem
Any data you need to keep long term should be moved to a backed up storage area. Further details on storage options can be found here.
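For example, to copy a directory of results out of scratch and into your backed-up home area, you could run something like the following (the paths are illustrative):
[abc123@login1(viking) ~]$ # Copy results from scratch to the backed-up home directory
[abc123@login1(viking) ~]$ cp -r ~/scratch/hpcintro/results ~/results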
Let us now navigate to the scratch area and create a new directory to run workloads from.
[abc123@login1(viking) ~]$ pwd
/home/userfs/a/abc123
[abc123@login1(viking) ~]$ cd scratch
[abc123@login1(viking) scratch]$ pwd
/mnt/lustre/users/abc123/scratch
[abc123@login1(viking) scratch]$ mkdir hpcintro
[abc123@login1(viking) scratch]$ cd hpcintro
[abc123@login1(viking) hpcintro]$ pwd
/mnt/lustre/users/abc123/scratch/hpcintro
We now have a new directory in scratch from which to run our jobs.
Loading software
We have an entire suite of software available on Viking. We use EasyBuild to install software and a module system to help you navigate what is available.
Here we will outline the main commands to load software on Viking
To view all available software, run:
[abc123@login1(viking) ~]$ module avail
If you want to see the versions of a particular software run:
[abc123@login1(viking) ~]$ module avail Kraken2
The module spider <string> command allows you to search for modules matching a given string.
[abc123@login1(viking) ~]$ module spider Kraken2
To load a module type module load <module name>. You can append a version number to load a specific instance of a package. Note the <tab> key can be used to complete the command.
[abc123@login1(viking) ~]$ module load lang/Bison <tab> <tab>
lang/Bison Bison/3.0.4-GCCcore-8.1.0
lang/Bison/3.0.4
[abc123@login1(viking) ~]$ module load lang/Bison/3.0.4-GCCcore-8.1.0
To list currently loaded modules type module list
[abc123@login1(viking) ~]$ module list
Run module purge to unload all loaded modules.
[abc123@login1(viking) ~]$ module purge
To unload a specific module, use module unload <module name>, for example:
[abc123@login1(viking) ~]$ module unload lang/Bison/3.0.4-GCCcore-8.1.0
If you need to have a program installed on Viking please email itsupport@york.ac.uk with the details.
See /wiki/spaces/RCS/pages/39159178 for more information.
Running your workloads on Viking
Viking uses a queuing system called Slurm to ensure that your jobs are fairly scheduled to run on Viking.
What is Slurm and what is a scheduler?
Slurm is a job scheduling system for small and large clusters. As a cluster workload manager, Slurm has three key functions.
- Lets a user request resources on a compute node to run their workloads
- Provides a framework (commands) to start, cancel, and monitor a job
- Keeps track of all jobs to ensure everyone can efficiently use all computing resources without stepping on each other's toes.
When a user submits a job, Slurm decides when to allow the job to run on a compute node. This is very important for shared machines such as the Viking cluster: it ensures resources are shared fairly between users, so that one person's jobs do not dominate.
Resource allocation
In order to interact with the job/batch system (SLURM), the user must first give some indication of the resources they require. At a minimum these include:
- how long the job needs to run for
- how many processors to run the job on
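In a job script these requests map directly onto #SBATCH options; as a minimal sketch (the full job script format is shown later in this document):
#SBATCH --time=02:00:00   # how long the job needs to run (hrs:min:sec)
#SBATCH --ntasks=4        # how many processors to run the job on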
The default resource allocation for jobs can be found at /wiki/spaces/RCS/pages/39159441.
Armed with this information, the scheduler is able to dispatch the jobs at some point in the future when the resources become available. A fair-share policy is in operation to guide the scheduler towards allocating resources fairly between users.
Slurm commands
Before submitting a job to Viking it is helpful to be familiar with the most common Slurm commands.
Slurm Command Summary
The following are common commands that can be used on Viking. We will be using some of these commands in the examples going forward.
Command | Description
---|---
squeue | reports the state of jobs (it has a variety of filtering, sorting, and formatting options); by default, reports the running jobs in priority order followed by the pending jobs in priority order
srun | used to submit a job for execution in real time
salloc | allocate resources for a job in real time (typically used to allocate resources and spawn a shell, in which the srun command is used to launch parallel tasks)
sbatch | submit a job script for later execution (the script typically contains one or more srun commands to launch parallel tasks)
sattach | attach standard input, output, and error to a currently running job or job step
scancel | cancel a pending or running job
sinfo | reports the state of partitions and nodes managed by Slurm (it has a variety of filtering, sorting, and formatting options)
sacct | report job accounting information about active or completed jobs
Running Slurm commands on Viking
Exercise 2
Log in to Viking and run the following command. What do you see?
[abc123@login1(viking) ~]$ squeue
You should see a list of jobs. Each column describes an aspect of each job:
Column | Description
---|---
JOBID | A number used to uniquely identify your job within Slurm
PARTITION | The partition the job has been submitted to
NAME | The job's name
USER | The username of the job owner
ST | Current job status: R (running), PD (pending - queued and waiting)
TIME | The time the job has been running
NODES | The number of nodes used by the job
NODELIST (REASON) | The nodes used by the job, or the reason the job is not running
Once you start running your own jobs, the commands
[abc123@login1(viking) ~]$ squeue -u abc123
or
[abc123@login1(viking) ~]$ squeue -j JOBID
will provide information on the jobs you have queued or running.
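squeue also accepts a custom output format if the default columns are not what you need; for example, the following prints the job ID, partition, job name, state, run time and node list (the format codes are documented in the squeue man page):
[abc123@login1(viking) ~]$ squeue -u abc123 -o "%.10i %.9P %.20j %.2t %.10M %R"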
Submitting jobs to the partitions on Viking
There are two ways to run your programs on the cluster:
Batch Jobs
These are non-interactive sessions where a number of tasks are batched together into a job script, which is then scheduled and executed by Slurm when resources are available.
- you write a job script containing the commands you want to execute on the cluster
- you request an allocation of resources (nodes, cpus, memory)
- the system grants you one, or more, compute nodes to execute your commands
- your job script is automatically run
- your script terminates and the system releases the resources
Interactive Sessions
These are similar to a normal remote login session, and are ideal for debugging and development, or for running interactive programs. The length of these sessions is limited compared to batch jobs however, so once your development is done, you should pack it up into a batch job and run it detached.
- you request an allocation of resources (cpus, memory)
- the system grants you a whole, or part, node to execute your commands
- you are logged into the node
- you run your commands interactively
- you exit and the system automatically releases the resources
Exercise 3
In this exercise we will begin to run simple jobs on Viking.
The most basic type of job you can run is an interactive job on the command line. Type the following into the terminal
[abc123@login1(viking) ~]$ srun --ntasks=1 --time=00:30:00 --pty /bin/bash
What do you see?
[abc123@login1(viking) ~]$ srun --ntasks=1 --time=00:30:00 --pty /bin/bash
srun: job 6485884 queued and waiting for resources
srun: job 6485884 has been allocated resources
[abc123@node069 [viking] ~]$ pwd
/users/abc123
[abc123@node069 [viking] ~]$ echo "hello"
hello
[abc123@node069 [viking] ~]$ exit
exit
[abc123@login1(viking) ~]$
[abc123@login1(viking) ~]$ srun --ntasks=4 --time=00:30:00 --pty /bin/bash
srun: job 6485885 queued and waiting for resources
srun: job 6485885 has been allocated resources
[abc123@node071 [viking] ~]$
You may find you have to wait before your job is running.
The above example creates a single task (one core) interactive session in the default partition "nodes", with a 30 minute time limit. The second example creates a four task (four cores) session.
To terminate the session, exit the shell by typing "exit" and pressing enter.
When there is resource available you will be placed directly onto a compute node, in the above cases node069 and node071. Here you can run your workload as normal.
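For example, once on the compute node you can load modules and run programs just as you would anywhere else on Viking (the module version here is the one used in the job script example below; run module avail for current versions):
[abc123@node069 [viking] ~]$ module load lang/Python/3.7.0-foss-2018b
[abc123@node069 [viking] ~]$ python --version
Python 3.7.0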
These sorts of jobs can be useful, but you will have to run each one separately with manual intervention. If you have many jobs to run, the best method is to use a Slurm job script.
Job scripts
Here we will show you how to submit and check a simple Slurm job.
To submit a job to the queue you will need to create a submit script.
Example job script
#!/bin/bash
#SBATCH --job-name=simple # Job name
#SBATCH --mail-type=BEGIN,END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=abc123@york.ac.uk # Where to send mail
#SBATCH --ntasks=1 # Run on a single CPU
#SBATCH --mem=1gb # Job memory request
#SBATCH --time=00:01:00 # Time limit hrs:min:sec
#SBATCH --output=basic_job_%j.log # Standard output and error log
#SBATCH --partition=nodes # Job queue
#SBATCH --account=PROJECTCODE # Project account
module purge # purge any loaded modules
module load lang/Python/3.7.0-foss-2018b # Load a module within a job script
echo My working directory is `pwd`
echo Running job on host:
echo -e '\t'`hostname` at `date`
echo
echo
echo Job completed at `date`
To submit a job, run:
[abc123@login1(viking) ~]$ sbatch simple.job
To see your job in the queue, run the squeue command. The job running below has a job ID of 147875.
[abc123@login1(viking) scratch]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
147875 nodes simple.j abc123 R 0:04 1 node170
To delete a job from the queue use the scancel [options] <jobid> command, where jobid is a number referring to the specified job (available from squeue).
[abc123@login1(viking) scratch]$ scancel 147875
A user can delete all their jobs from the batch queues with the -u option:
$ scancel -u <userid>
To look at your job history, run sacct -j <jobid>:
[abc123@login1(viking) scratch]$ sacct -j 147875
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
147875 simple.job nodes dept-proj+ 1 CANCELLED+ 0:0
147875.batch batch dept-proj+ 1 CANCELLED 0:15
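sacct can report many more fields, such as elapsed time and peak memory use, via its --format option; for example (the field names are documented in the sacct man page):
[abc123@login1(viking) scratch]$ sacct -j 147875 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS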
See /wiki/spaces/RCS/pages/39159024 and /wiki/spaces/RCS/pages/39167429 for more information.
Copying your data to and from Viking
You may have data stored elsewhere that you wish to copy to Viking. There are different ways to do this.
Here we will outline the steps to move your data to and from Viking.
There are different ways to copy your data dependent on which operating system you use on your local machine.
Copying Files To/From a Windows Desktop
WinSCP is an open source free SFTP client, SCP client, FTPS client and FTP client for Windows. Its main function is file transfer between a local and a remote computer. WinSCP is available on IT Services supported desktops or can be downloaded from http://winscp.net/eng/index.php.
Run WinSCP from the Start menu or by double-clicking the icon on the desktop.
A login window will appear. Fill in the hostname (viking.york.ac.uk) and your username. You can click the "Save" button to save the session details for future use. Return to the login window and click the "Login" button; some hosts may present you with an information window. You will then be prompted for your password, after which the file manager window will be displayed.
The drag-and-drop interface is similar to the Windows file manager and its use should be intuitive.
Copying Files To/From a Linux/MacOS Desktop
There are a number of ways of copying files and directories using the Linux command line; you can copy your data from any Linux device to Viking using the following commands. Here are a couple of examples.
scp
This is recommended for a small number of files.
Suppose you wish to copy data from your local machine to your scratch area on Viking. Run the following commands on your local machine in the terminal.
[bash-4.1]$ #For an individual file
[bash-4.1]$ scp afile abc123@viking.york.ac.uk:~/scratch
[bash-4.1]$ #For a folder with lots of files
[bash-4.1]$ scp -r adir abc123@viking.york.ac.uk:~/scratch
What if you want to copy files from your scratch area on Viking to your local machine? Run the following commands on your local machine.
[bash-4.1]$ #For an individual file
[bash-4.1]$ scp abc123@viking.york.ac.uk:~/scratch/afile .
[bash-4.1]$ #For a folder with lots of files
[bash-4.1]$ scp -r abc123@viking.york.ac.uk:~/scratch/adir .
There are many options you can use with scp. To view them, run man scp on the device you are using scp on, or have a look at this scp wiki page.
rsync
rsync is another command that will let you copy files and folders to Viking. If you have a large number of files it is usually best to use rsync.
To copy a folder adir from your local machine to your scratch area on Viking, run the following command on your local machine.
[bash-4.1]$ rsync -avz adir abc123@viking.york.ac.uk:~/scratch
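To copy in the other direction, from your scratch area on Viking back to your local machine, run the following on your local machine:
[bash-4.1]$ rsync -avz abc123@viking.york.ac.uk:~/scratch/adir .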
There are many options you can use with rsync. To view them, run man rsync on the device you are using rsync on, or consult the rsync web page.
See VK7) Copying and moving your data to Viking for more information.
Help and support
In the first instance, check our wiki pages or email itsupport@york.ac.uk and one of our team will be in touch.