This course is a continuation of An Introduction to Research Computing at York. If you are new to the Linux command line, we recommend you work through An Introduction to Research Computing at York before moving on to this course.
Viking Account Creation
Before you can access Viking you will need to create a Viking account. The process can take a couple of days, so please complete the following steps in good time.
- Before logging into Viking please ensure your project supervisor fills in this form to request a project code.
- The user then needs to fill in this form to request an account once they have a project code.
- Accounts should take no longer than 24 hours to be created. You will receive an email on creation of your account.
See /wiki/spaces/RCS/pages/39158623 for more information
What is the Viking cluster and why should I use it?
If you are finding that your code is still taking a long time to finish or you wish to scale your work, the Viking cluster may be what you need.
Viking is a large Linux compute cluster aimed at users who require a platform for development and the execution of small or large compute jobs. Viking is a multidisciplinary facility, supporting a broad spectrum of research needs, free at the point of use to all University of York researchers. Viking is as much a facility for learning and exploring possibilities as it is a facility for running well-established high-performance computing workloads. In this light, we encourage users from all Faculties, backgrounds and levels of ability to consider Viking when thinking about how computing might support their research.
What is a cluster?
A cluster consists of many (hundreds or thousands of) rack-mounted computers called nodes. It is similar to having hundreds of desktop computers sitting in the same room, able to talk to each other. Clusters are often accessed via login nodes, which can send jobs to the other nodes in the cluster. Your commands will not be run immediately, but will be sent to a queue, and run when there is space on the cluster.
The Viking cluster is Linux based and can be accessed in a similar manner to the research servers, but instead of accessing, say, research0.york.ac.uk, you would access viking.york.ac.uk.
Login into Viking
The process of logging into Viking is similar to the research servers, as described in An Introduction to Research Computing at York. Viking runs a Linux operating system, so how you access it will depend on the operating system of your local machine.
We will break down the different options here.
Before You Login
If you have not changed your IT Services password since August 2013 then you must do so before you will be able to log in. All user password changes are managed via the My IT Account web page. Click on the Password Management (IDM) link in the Manage Your Password field to change your password. You may be given the option to 'synchronise' your password; use this option if you do not want to change your password. The password change (or synchronisation) may take a few minutes before it is visible to the servers.
Accessing Viking off campus
To access Viking off campus you can either use the Virtual Private Network (VPN) or the SSH gateway service (registration required). The instructions below should work if you log on through the VPN; the SSH gateway service works slightly differently.
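If you use the SSH gateway service, one convenient option is SSH's ProxyJump flag, which hops through the gateway to Viking in a single command. A minimal sketch, assuming the gateway host is ssh.york.ac.uk (check the SSH gateway service documentation for the actual hostname):
[bash-4.1]$ # Jump through the SSH gateway to Viking (gateway hostname is an assumption)
[bash-4.1]$ ssh -J abc123@ssh.york.ac.uk abc123@viking.york.ac.uk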
Windows
Access from a Windows desktop
Command-line access using PuTTY
PuTTY is available on all IT Services Managed Windows systems. It is pre-installed on Classroom PCs; on Office PCs you can install it from Run Advertised Programs / Software Center. It appears under "Internet Tools" on the start menu.
On unmanaged PCs you can download the installer from the PuTTY Website.
Configuring PuTTY to connect to Viking
Open PuTTY and configure it to connect to Viking:
- Add the name "viking.york.ac.uk" to the 'Host Name' field
- Set the 'Connection Type' to SSH
- Type the name "Viking" in 'Saved Sessions'
- Click 'Save'
- Expand the 'SSH' tab from the 'Category' list
- Choose 'X11' from 'SSH' list
- Check 'Enable X11 Forwarding'
Connecting to Viking
- Start PuTTY
- Select 'Viking' from the 'Saved Sessions'
- Click 'Open'
A terminal window should appear. Log in with your university username and password. The first time you connect you will get a security alert showing the fingerprint of the server, labeled as 'ssh-rsa' or 'ssh-ed25519'.
If the login is successful you will be presented with a Viking command prompt.
Whilst it is possible to configure X11 forwarding through PuTTY, X11 forwarding will only work on the Viking login nodes, which means that you won't be able to run graphical applications e.g. MATLAB on the Viking compute nodes using X11 forwarding. Details about virtual desktop sessions on Viking are provided in the Virtual Desktop section of this document.
macOS
Access from a Mac
Go to the Finder on your Mac, find Applications, open the Utilities folder in Applications and then start the Terminal app from the Utilities folder. (You may wish to add Terminal to your Dock.) Then type the following, using your own university username in place of abc123. You do not need to type the $; it is an example of a prompt, which tells us the terminal is ready for us to type something. You should see something similar when you open Terminal, though it may be a bit longer (it may show your username, for example). Just type everything after the $.
[bash-4.1]$ ssh -X abc123@viking.york.ac.uk
Linux
Access from a UNIX server or desktop
To log in from a terminal emulator, use the following command:
[bash-4.1]$ ssh abc123@viking.york.ac.uk
where abc123 is your IT Services username. You will be prompted for your IT Services password. Please note, X11 forwarding will only work on the Viking login nodes, which means that you won't be able to run graphical applications e.g. MATLAB on the Viking compute nodes using X11 forwarding. Details about virtual desktop sessions on Viking are provided in the Virtual Desktop section of this document.
If you require X forwarding, type:
[bash-4.1]$ ssh -X abc123@viking.york.ac.uk
See /wiki/spaces/RCS/pages/39158979 for more information
Navigating Viking
When you log in, you will be directed to one of several login nodes. These provide Linux command-line access to the system, which is necessary for editing programs and for compiling and running code. Usage of the login nodes is shared amongst all who are logged in. These systems should not be used for running your code, other than for development and very short test runs.
Access to the job/batch submission system is through the login nodes. When a submitted job executes, processors on the compute nodes are exclusively made available for the purposes of running the job.
For this exercise let's look at Viking, see what is available, and set up an area to run our jobs from.
Exercise 1
List the directories within your home area on Viking. What do you see?
-bash-4.1$ ls
bin Chemistry Desktop examples Experiments intel jobs logs tmp
-bash-4.1$ ls -l
total 296
drwxr-xr-x 2 abs4 csrv 4096 Jun 24 09:39 bin
drwxr-xr-x 3 abs4 csrv 4096 Jun 6 09:23 Chemistry
drwxr-sr-x 2 abs4 elecclust 4096 Mar 11 10:53 Desktop
drwxr-xr-x 3 abs4 csrv 4096 Jun 30 12:21 examples
drwxr-xr-x 5 abs4 csrv 4096 May 23 11:34 Experiments
drwxr-xr-x 3 abs4 csrv 4096 Aug 14 12:26 intel
drwxr-sr-x 3 abs4 elecclust 4096 Aug 15 12:49 jobs
drwxr-xr-x 2 abs4 csrv 266240 Aug 15 13:48 logs
drwxr-xr-x 3 abs4 csrv 4096 Aug 14 14:50 tmp
-bash-4.1$
Your home directory is backed up, but there are strict file and size limits in this area.
To see what space you have available, run the myquota command:
[abc123@login1(viking) ~]$ /opt/site/york/bin/myquota
Scratch quota:
Disk quotas for usr abc123 (uid 10506):
Filesystem used quota limit grace files quota limit grace
/mnt/lustre 268.6G 3T 3.1T - 2157882 0 0 -
Home quota:
Disk quotas for user abc123 (uid 10506):
Filesystem blocks quota limit grace files quota limit grace
storage:/export/users
2155420 52428800 78643200 1654 100000 150000
Two quotas are displayed here: your home area and your scratch area.
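The quota output shows totals only. To see how much space individual directories are taking up, you can use the standard du command, for example:
[abc123@login1(viking) ~]$ # Summarise the size of each item in your scratch area
[abc123@login1(viking) ~]$ du -sh ~/scratch/*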
Home Directory
This is the directory you start off in after logging on to one of the login nodes. It is NOT suitable for large datasets or frequently accessed files, and no jobs should be run from here.
Lustre Fast Filesystem
This is the scratch directory in your home directory, and potentially any group data stores that you have access to. It is available to the compute nodes, and is the recommended place for your jobs to read and store data.
Here is a breakdown of the different areas and what they provide.
Home | Scratch
---|---
50GB space limit | 3TB initial quota, more on request
Backed up | Not backed up
Slower performance | Lustre high-performance filesystem
Any data you need to keep long term should be moved to a backed up storage area. Further details on storage options can be found here.
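For example, to copy a directory of results out of scratch and into your backed-up home area, you could run something like the following (the paths are illustrative):
[abc123@login1(viking) ~]$ # Copy results from scratch to the backed-up home directory
[abc123@login1(viking) ~]$ cp -r ~/scratch/hpcintro/results ~/results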
Let us now navigate to the scratch area and create a new directory to run workloads from.
[abc123@login1(viking) ~]$ pwd
/home/userfs/a/abc123
[abc123@login1(viking) ~]$ cd scratch
[abc123@login1(viking) scratch]$ pwd
/mnt/lustre/users/abc123/scratch
[abc123@login1(viking) scratch]$ mkdir hpcintro
[abc123@login1(viking) scratch]$ cd hpcintro
[abc123@login1(viking) hpcintro]$ pwd
/mnt/lustre/users/abc123/scratch/hpcintro
We now have a new directory in scratch from which to run our jobs.
Loading software
We have an entire suite of software available on Viking. We use EasyBuild to install software and a module system to help you navigate what is available.
Here we will outline the main commands to load software on Viking
To view all available software, run:
[abc123@login1(viking) ~]$ module avail
If you want to see the versions of a particular software run:
[abc123@login1(viking) ~]$ module avail Kraken2
The module spider <string> command allows you to search for modules matching a given string.
[abc123@login1(viking) ~]$ module spider Kraken2
To load a module type module load <module name>. You can append a version number to load a specific instance of a package. Note the <tab> key can be used to complete the command.
[abc123@login1(viking) ~]$ module load lang/Bison <tab> <tab>
lang/Bison Bison/3.0.4-GCCcore-8.1.0
lang/Bison/3.0.4
[abc123@login1(viking) ~]$ module load lang/Bison/3.0.4-GCCcore-8.1.0
To list currently loaded modules type module list
[abc123@login1(viking) ~]$ module list
Run module purge to unload all loaded modules.
[abc123@login1(viking) ~]$ module purge
To unload a specific module, use module unload <module name>, for example:
[abc123@login1(viking) ~]$ module unload lang/Bison/3.0.4-GCCcore-8.1.0
If you need to have a program installed on Viking please email itsupport@york.ac.uk with the details.
See /wiki/spaces/RCS/pages/39159178 for more information.
Running your workloads on Viking
Viking uses a queuing system called Slurm to ensure that your jobs are fairly scheduled to run on Viking.
What is Slurm and what is a scheduler?
Slurm is a job scheduling system for small and large clusters. As a cluster workload manager, Slurm has three key functions.
- Lets a user request resources on a compute node to run their workloads
- Provides a framework (commands) to start, cancel, and monitor a job
- Keeps track of all jobs to ensure everyone can efficiently use all computing resources without stepping on each other's toes.
When a user submits a job, Slurm decides when to allow the job to run on a compute node. This is very important for shared machines such as the Viking cluster: it ensures resources are shared fairly between users, so that one person's jobs do not dominate.
Resource allocation
In order to interact with the job/batch system (SLURM), the user must first give some indication of the resources they require. At a minimum these include:
- how long the job needs to run for
- how many processors to run the job on
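In a job script these requests map directly onto #SBATCH options; as a minimal sketch (the full job script format is shown later in this document):
#SBATCH --time=02:00:00   # how long the job needs to run (hrs:min:sec)
#SBATCH --ntasks=4        # how many processors to run the job on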
The default resource allocation for jobs can be found at /wiki/spaces/RCS/pages/39159441.
Armed with this information, the scheduler is able to dispatch the jobs at some point in the future when the resources become available. A fair-share policy is in operation to guide the scheduler towards allocating resources fairly between users.
Slurm commands
Before submitting a job to Viking it is helpful to be familiar with the most common Slurm commands.
Slurm Command Summary
The following are common commands that can be used on Viking. We will be using some of these commands in the examples going forward.
Command | Description
---|---
squeue | reports the state of jobs (it has a variety of filtering, sorting, and formatting options); by default, reports the running jobs in priority order followed by the pending jobs in priority order
srun | used to submit a job for execution in real time
salloc | allocate resources for a job in real time (typically used to allocate resources and spawn a shell, in which the srun command is used to launch parallel tasks)
sbatch | submit a job script for later execution (the script typically contains one or more srun commands to launch parallel tasks)
sattach | attach standard input, output, and error to a currently running job or job step
scancel | cancel a pending or running job
sinfo | reports the state of partitions and nodes managed by Slurm (it has a variety of filtering, sorting, and formatting options)
sacct | report job accounting information about active or completed jobs
Running Slurm commands on Viking
Exercise 2
Log in to Viking and run the following command. What do you see?
[abc123@login1(viking) ~]$ squeue
You should see a list of jobs. Each column describes an aspect of each job:
Column | Description
---|---
JOBID | A number used to uniquely identify your job within Slurm
PARTITION | The partition the job has been submitted to
NAME | The job's name
USER | The username of the job owner
ST | Current job status: R (running), PD (pending - queued and waiting)
TIME | The time the job has been running
NODES | The number of nodes used by the job
NODELIST (REASON) | The nodes used by the job, or the reason the job is not running
Once you start running your own jobs, the commands
[abc123@login1(viking) ~]$ squeue -u abc123
or
[abc123@login1(viking) ~]$ squeue -j JOBID
will provide information on the jobs you have queued or running.
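squeue also accepts a custom output format if the default columns are not what you need; for example, the following prints the job ID, partition, job name, state, run time and node list (the format codes are documented in the squeue man page):
[abc123@login1(viking) ~]$ squeue -u abc123 -o "%.10i %.9P %.20j %.2t %.10M %R"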
Submitting jobs to the partitions on Viking
There are two ways to run your programs on the cluster:
Batch Jobs
These are non-interactive sessions where a number of tasks are batched together into a job script, which is then scheduled and executed by Slurm when resources are available.
- you write a job script containing the commands you want to execute on the cluster
- you request an allocation of resources (nodes, cpus, memory)
- the system grants you one, or more, compute nodes to execute your commands
- your job script is automatically run
- your script terminates and the system releases the resources
Interactive Sessions
These are similar to a normal remote login session, and are ideal for debugging and development, or for running interactive programs. The length of these sessions is limited compared to batch jobs however, so once your development is done, you should pack it up into a batch job and run it detached.
- you request an allocation of resources (cpus, memory)
- the system grants you a whole, or part, node to execute your commands
- you are logged into the node
- you run your commands interactively
- you exit and the system automatically releases the resources
Exercise 3
In this exercise we will begin to run simple jobs on Viking.
The most basic type of job you can run is an interactive job on the command line. Type the following into the terminal
[abc123@login1(viking) ~]$ srun --ntasks=1 --time=00:30:00 --pty /bin/bash
What do you see?
[abc123@login1(viking) ~]$ srun --ntasks=1 --time=00:30:00 --pty /bin/bash
srun: job 6485884 queued and waiting for resources
srun: job 6485884 has been allocated resources
[abc123@node069 [viking] ~]$ pwd
/users/abc123
[abc123@node069 [viking] ~]$ echo "hello"
hello
[abc123@node069 [viking] ~]$ exit
exit
[abc123@login1(viking) ~]$
[abc123@login1(viking) ~]$ srun --ntasks=4 --time=00:30:00 --pty /bin/bash
srun: job 6485885 queued and waiting for resources
srun: job 6485885 has been allocated resources
[abc123@node071 [viking] ~]$
You may find you have to wait before your job is running.
The above example creates a single task (one core) interactive session in the default partition "nodes", with a 30 minute time limit. The second example creates a four task (four cores) session.
To terminate the session, exit the shell by typing "exit" and pressing enter.
When there is resource available you will be placed directly onto a compute node, in the above cases node069 and node071. Here you can run your workload as normal.
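For example, once on the compute node you can load modules and run programs just as you would anywhere else on Viking (the module version here is the one used in the job script example below; run module avail for current versions):
[abc123@node069 [viking] ~]$ module load lang/Python/3.7.0-foss-2018b
[abc123@node069 [viking] ~]$ python --version
Python 3.7.0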
These sorts of jobs can be useful, but you will have to run each one separately with manual intervention. If you have many jobs to run, the best method is to use a Slurm job script.
Job scripts
Here we will show you how to submit and check a simple Slurm job.
To submit a job to the queue you will need to create a submit script.
Example job script
#!/bin/bash
#SBATCH --job-name=simple # Job name
#SBATCH --mail-type=BEGIN,END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=abc123@york.ac.uk # Where to send mail
#SBATCH --ntasks=1 # Run on a single CPU
#SBATCH --mem=1gb # Job memory request
#SBATCH --time=00:01:00 # Time limit hrs:min:sec
#SBATCH --output=basic_job_%j.log # Standard output and error log
#SBATCH --partition=nodes # Job queue
#SBATCH --account=PROJECTCODE # Project account
module purge # purge any loaded modules
module load lang/Python/3.7.0-foss-2018b # Load a module within a job script
echo My working directory is `pwd`
echo Running job on host:
echo -e '\t'`hostname` at `date`
echo
echo
echo Job completed at `date`
To submit a job, run:
[abc123@login1(viking) ~]$ sbatch simple.job
To see your job in the queue, run the squeue command. The job running below has a job ID of 147875.
[abc123@login1(viking) scratch]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
147875 nodes simple.j abc123 R 0:04 1 node170
To delete a job from the queue use the scancel [options] <jobid> command, where jobid is a number referring to the specified job (available from squeue).
[abc123@login1(viking) scratch]$ scancel 147875
A user can delete all their jobs from the batch queues with the -u option:
$ scancel -u <userid>
To look at your job history, run sacct -j <jobid>:
[abc123@login1(viking) scratch]$ sacct -j 147875
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
147875 simple.job nodes dept-proj+ 1 CANCELLED+ 0:0
147875.batch batch dept-proj+ 1 CANCELLED 0:15
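sacct can report many more fields, such as elapsed time and peak memory use, via its --format option; for example (the field names are documented in the sacct man page):
[abc123@login1(viking) scratch]$ sacct -j 147875 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS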
See /wiki/spaces/RCS/pages/39159024 and /wiki/spaces/RCS/pages/39167429 for more information.
Copying your data to and from Viking
You may have data stored elsewhere that you wish to copy to Viking. There are different ways to do this.
Here we will outline the steps to move your data to and from Viking.
There are different ways to copy your data dependent on which operating system you use on your local machine.
Copying Files To/From a Windows Desktop
WinSCP is an open source free SFTP client, SCP client, FTPS client and FTP client for Windows. Its main function is file transfer between a local and a remote computer. WinSCP is available on IT Services supported desktops or can be downloaded from http://winscp.net/eng/index.php.
Run WinSCP from the Start menu or by double-clicking the icon on the desktop.
A login window will appear. Fill in the hostname (viking.york.ac.uk) and your username. You can click the "Save" button to save the session details for future use. Return to the login window and click the "Login" button; some hosts may present you with an information window. You will then be prompted for your password, after which the file manager window will be displayed.
The drag-and-drop interface is similar to the Windows file manager and its use should be intuitive.
Copying Files To/From a Linux/MacOS Desktop
There are a number of ways of copying files and directories using the Linux command line; you can copy your data from any Linux device to Viking using the following commands. Here are a couple of examples.
scp
This is recommended for a small number of files.
Suppose you wish to copy data from your local machine to your scratch area on Viking. Run the following commands on your local machine in the terminal.
[bash-4.1]$ #For an individual file
[bash-4.1]$ scp afile abc123@viking.york.ac.uk:~/scratch
[bash-4.1]$ #For a folder with lots of files
[bash-4.1]$ scp -r adir abc123@viking.york.ac.uk:~/scratch
What if you want to copy files from your scratch area on Viking to your local machine? Run the following commands on your local machine.
[bash-4.1]$ #For an individual file
[bash-4.1]$ scp abc123@viking.york.ac.uk:~/scratch/afile .
[bash-4.1]$ #For a folder with lots of files
[bash-4.1]$ scp -r abc123@viking.york.ac.uk:~/scratch/adir .
There are many options you can use with scp. To view them, run man scp on the device you are using scp on, or have a look at this scp wiki page.
rsync
rsync is another command that will let you copy files and folders to Viking. If you have a large number of files it is usually best to use rsync.
To copy a folder adir from your local machine to your scratch area on Viking, run the following command on your local machine.
[bash-4.1]$ rsync -avz adir abc123@viking.york.ac.uk:~/scratch
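To copy in the other direction, from your scratch area on Viking back to your local machine, run the following on your local machine:
[bash-4.1]$ rsync -avz abc123@viking.york.ac.uk:~/scratch/adir .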
There are many options you can use with rsync. To view them, run man rsync on the device you are using rsync on, or consult the rsync web page.
See VK7) Copying and moving your data to Viking for more information.
Help and support
In the first instance, check our wiki pages or email itsupport@york.ac.uk and one of our team will be in touch.