
Helix is the code name for the Genomics England High Performance Computing (HPC) cluster, which runs all production-worthy workflows. Helix was released in 2020, replacing its predecessor, Pegasus. Helix uses IBM's Load Sharing Facility (commonly referred to as Spectrum LSF) as its workload management tool (job scheduler).

For more information on how HPCs work, see here.


Warning: external links on this page can only be accessed from outside the RE.

Memory management is now being enforced on the HPC. This is to safeguard against rogue jobs consuming all the available memory on a node and crashing that node for all users.

When a job is submitted, you will need to specify -R rusage[mem=<memory_in_mb>] and -M <memory_in_mb>. -R informs LSF of how much memory you wish to consume, and -M will terminate your job if you consume more memory than asked for.

Each CPU has an allocation of 16 GB (16000 MB) of memory, so to use larger amounts you will need to ask for additional CPUs with -n. For example, to ask for 32 GB of RAM, the following flags should be used: -R rusage[mem=32000] -M 32000 -n 2

If -R and -M are not specified, default values of -R rusage[mem=1000] and -M 2000 will be added to your job.

For more information, please see the following documents for help on job submissions with -R:

Accessing the HPC

The HPC is accessed via ssh from the terminal. The following is an example of a GeCIP user, John Doe, connecting via ssh. The address will change depending on which group you belong to. The general format is <username>@corp.gel.ac@phpgridzlogn00N.int.corp.gel.ac, where <username> is your Genomics England username and N depends on your group. See the table below for more information.

ssh jdoe@corp.gel.ac@phpgridzlogn001.int.corp.gel.ac

You will then be prompted for your password, and once entered, will be connected to the HPC.


If you do not want to enter a password each time you connect, you can create an ssh key and an ssh config file that will make logging in easier.

Create an ssh key in your .ssh folder, which is located in /home/<username>/.ssh

cd ~/.ssh
ssh-keygen

Follow the prompts to name your ssh key (I suggest cluster as a good name) and leave the passphrase blank.
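If you prefer to skip the prompts, the key file name and an empty passphrase can also be supplied directly on the command line. A minimal sketch is shown below; the ed25519 key type is an assumption, so use whichever key type your organisation requires:

ssh-keygen -t ed25519 -f ~/.ssh/cluster -N ""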


Next, create an ssh config file called config in the .ssh folder:

nano config

The above will open the file. Add the following information and format:

Host cluster
	Hostname phpgridzlogn00N.int.corp.gel.ac
	User <your username>@corp.gel.ac
	IdentityFile ~/.ssh/cluster


Copy your new ssh public key to the HPC

ssh-copy-id -i ~/.ssh/cluster.pub cluster

This will ask for your password, then copy the ssh key to the HPC.


Now, instead of having to type

ssh jdoe@corp.gel.ac@phpgridzlogn001.int.corp.gel.ac

You can connect by typing

ssh cluster

Login node access addresses

Name                              Who                            LDAP group
phpgridzlogn001.int.corp.gel.ac   GeCIPs & Researchers           gecip_lsf_access, research_lsf_access
phpgridzlogn002.int.corp.gel.ac   GeCIPs & Researchers           gecip_lsf_access, research_lsf_access
phpgridzlogn004.int.corp.gel.ac   Commercial (Discovery Forum)   discovery_lsf_access
phpgridzlogn003.int.corp.gel.ac   Internal users                 -

Note that internal users will need the P2 VPN (details on the Confluence page /Platform engineering/environment/VPN/P2 Helix VPN access).




Using software on the HPC

Software on the Genomics England HPC is managed through the module system framework. A full list of the software available through the modules can be found here: Software Available on the HPC.

Module system commands

List software available

module avail

Loading software

module load lang/R/3.5.1-foss-2018b

Always specify the version of the software that you want to load, to avoid errors and unexpected results.
For instance, running module load lang/R will load version 3.6.2 instead of my desired version, 3.5.1.

Unloading software

module unload lang/R/3.5.1-foss-2018b

Switching versions (requires the software to be loaded first)

module switch lang/R/3.6.2-foss-2019b
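
Listing loaded software

To check which modules are currently loaded in your session (useful before switching or unloading), run:

module list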

Common modules

Note that the module organisation has changed on Helix, so check the module names on Helix using module avail.

module load lang/R/3.5.1-foss-2018b

module load lang/Python/3.7.2-GCCcore-8.2.0

module load bio/BCFtools/1.9-foss-2018b

Python Package and Environment Management

The conda package and environment management system is provided in keeping with current best practice for managing Python packages. This is provided alongside the EasyBuild-managed lmod modules environment following best practice for managing HPC software.

There is overlap in functionality which increases diversity and robustness, and supports debugging.

The conda environments are additionally built around Intel Distribution for Python, where possible, and performance benefits are expected. https://software.intel.com/en-us/distribution-for-python/benchmarks

To begin using these environments, make sure that on a fresh login, conda is not already in your path (using the command "which conda"). If you find conda is already in your path, you need to undo this with "conda init --reverse" and log in again.

Once the above has been checked, issue the command

. /resources/conda/miniconda3/etc/profile.d/conda.sh

taking care to include the dot and space at the front to source the file for bash (a similar file for csh is available in the same location). This method of accessing conda is compatible with job scripts. "conda init" should be avoided unless you only use a single Python environment that can be safely loaded in ~/.bashrc.

The command "conda env list" will show the available environments. "idp" refers to the full Intel Distribution for Python, and "idpcore" refers to the core packages. https://software.intel.com/en-us/articles/complete-list-of-packages-for-the-intel-distribution-for-python

py2 and py3 refer to Python 2 and 3, respectively.

Environments with the prefix "test" are used to prepare the production environments. These test environments will additionally be used for testing and may change at any time so are unsuitable for production usage.

Issue "conda activate idppy3" to activate the desired environment, "idppy3" in this case. "conda env list" can be used to check the package versions available in your activated environment. "conda deactivate" will deactivate your environment after finishing your computation.

A suffix "revN" may appear on some environments, where N is potentially a multi-digit number, to denote revisions.

N.B. Contrary to standalone usage of conda, you should not issue the command "conda init" or else it will break integration with the module environment by making changes out of the control of the module command. If you accidentally call "conda init", reverse this with "conda init --reverse".
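
As a minimal sketch of how conda fits into a batch job (the queue, project code and my_analysis.py script below are placeholder assumptions; job submission itself is covered further down this page), a job script might look like the following:

#!/bin/bash
#BSUB -q short
#BSUB -P <project_code>
#BSUB -o conda_job.%J.stdout
#BSUB -e conda_job.%J.stderr
#BSUB -R rusage[mem=1000]
#BSUB -M 2000

# Source conda for this non-interactive shell, then activate the environment
. /resources/conda/miniconda3/etc/profile.d/conda.sh
conda activate idppy3

# Run the analysis, then deactivate the environment
python my_analysis.py
conda deactivate

Save this as, for example, conda_job.sh and submit it with bsub < conda_job.sh.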

Container Support

Helix supports Singularity (https://sylabs.io/docs/) for containerised workflows. If your workflow is written for Docker, that is also fine, as Singularity can pull Docker images and convert them into Singularity images. Note that the use of Singularity/Docker is optional; software and tools can still be loaded using module as before.
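
As an illustrative sketch (the ubuntu image here is only an example; substitute the image your workflow actually needs, and load Singularity via the module system first if it is provided that way on Helix), a Docker image can be pulled, converted and run like this:

singularity pull ubuntu_20.04.sif docker://ubuntu:20.04
singularity exec ubuntu_20.04.sif cat /etc/os-release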





How to submit jobs to LSF

Please use the login nodes only as a portal to the HPC for submitting jobs, and nothing else. Unauthorised tools are not permitted to run on the login nodes; any found running will be terminated without warning.


Remember that memory management is enforced on the HPC: specify -R rusage[mem=<memory_in_mb>] and -M <memory_in_mb> with every submission, as described at the top of this page.


To submit an LSF job, use the command bsub:

bsub -q <queue_name> -P <project_code> -o <output.stdout> -e <output.stderr> -R rusage[mem=<memory_in_mb>] -M <max_memory_in_mb> <myjob>


To submit an LSF job using a script, use the following command:

bsub -q <queue_name> -P <project_code> -o <output.stdout> -e <output.stderr> -R rusage[mem=<memory_in_mb>] -M <max_memory_in_mb> < <myscript.sh>

Note the < sign above indicating that your script is fed into the submission command.

Files output.stdout and output.stderr are file names of your choice where any messages, warnings and errors will be logged.
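
As a concrete illustrative example (the queue and project code here are placeholders; see LSF Project Codes below for valid values), submitting myscript.sh to the medium queue with 8 GB of memory would look like:

bsub -q medium -P <project_code> -o output.stdout -e output.stderr -R rusage[mem=8000] -M 8000 < myscript.sh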


For a list of all LSF queues and project codes see LSF Project Codes. You can also see what queues are assigned to you using:

bugroup -w | grep <your_username> | awk '{print $1}'

You will only be able to submit to queues that you have LDAP access to.
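
bqueues can also be asked which queues accept jobs from your account (the -u option takes your username):

bqueues -u <your_username>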

For more on how to submit jobs, see Advanced Job Submission Guidelines.

Queues Available

Genomics England HPC queues are time-based (short, medium, long). This means you need to submit jobs to the queue that reflects the runtime of your job.

For example, if you have a job that will run for up to 4 hours, the short queue is the one to submit to. Likewise, if you have a job that runs for up to 24 hours, the medium queue is the right choice.

To see all available queues in the grid, run bqueues

QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP 
inter            50  Open:Active       -    -    -    -     0     0     0     0
short            30  Open:Active       -    -    -    -     0     0     0     0
medium           20  Open:Active       -    -    -    -     0     0     0     0
long             10  Open:Active       -    -    -    -     0     0     0     0


Queue name   Who        LDAP group                                                     Description
inter        ALL        N/A                                                            For lightweight interactive or GUI tools; per-user concurrent job limit of 5
short        external   discovery_lsf_access, gecip_lsf_access, research_lsf_access    For jobs with a maximum RUNTIME of 4 hours
medium       external   discovery_lsf_access, gecip_lsf_access, research_lsf_access    For jobs with a maximum RUNTIME of 24 hours
long         external   discovery_lsf_access, gecip_lsf_access, research_lsf_access    For jobs with unlimited RUNTIME (defaults to 7 days if not specified, or set a limit with -W [hours:]minutes)
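
For example, a submission to the long queue with an explicit 48-hour run limit rather than the 7-day default might look like this (queue access, project code and script name are placeholders):

bsub -q long -W 48:00 -P <project_code> -o output.stdout -e output.stderr -R rusage[mem=1000] -M 2000 < myscript.sh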

Interactive Vs Batch Jobs

Interactive jobs are jobs that you interact with:

  • via the command line
  • via a GUI
  • the job stays connected to the submission shell

Interactive jobs have a dedicated queue (named inter) with dedicated resources during core hours for faster dispatch.

Batch jobs are jobs that you do not interact with; the job is disconnected from the submission shell.

Jobs are batch by default.
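
As a sketch (the project code is a placeholder), an interactive shell session can be requested through the inter queue with LSF's -Is option, which keeps the job attached to your terminal; exit the shell when you are done so the slot is released:

bsub -q inter -P <project_code> -Is /bin/bash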

Some Basic LSF Commands 

command    description
bsub       submits a job to the cluster
bqueues    shows info on the cluster queues
bjobs      shows info on the cluster jobs
bhosts     shows info on the cluster hosts
bhist      shows info on finished cluster jobs
bacct      shows statistics and info on finished cluster jobs
bkill      removes a job from the cluster
lshosts    shows static resource info
lsload     shows dynamic resource info


bjobs is a very handy command for viewing job information on both pending and running jobs. With the long option (-l), it shows a detailed view: why a job is pending (for queued jobs), and where it is running, its turnaround time and its resource usage (for running jobs).

Usage:

bjobs -l <JOBID>

The job ID is the number generated when you submit your job. You can always retrieve the IDs of your running jobs with a simple bjobs.




