This page highlights some best practices for working with containers within the Research Environment. Please note that this page is aimed at more advanced command line users and provides them with the setup needed to run their containers. It does not teach how to work with containers. Some of the information here may be useful while learning, but we suggest other online resources for learning more about working with containers.
This is a new feature that we are rolling out to Helix. If you have any feedback or suggestions on the usage of Singularity and containers, please reach out to us via the Genomics England Service Desk.
It is now possible to run containerised software on the HPC Helix. Docker is not available, but Singularity is, and through it various Docker containers can be pulled and run. Here we provide some best practices on how to set up and run Singularity. For security reasons we cannot allow pushing out of the environment.
Loading Singularity on the HPC
To use Singularity on Helix please type the following:
module load singularity/3.2.1
Whenever you create an image with Singularity within the HPC, the files are automatically cached. The cached files are located in /home/<username>/.singularity/. However, you may be creating an image via a compute node in an interactive session. In that case the cached files are written there, which may flood the compute node's memory. You can redirect this location by setting an environment variable. We recommend setting this environment variable in your .bashrc script.
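As a minimal sketch of such a .bashrc entry: the variable name is not shown on this page, so we assume it is SINGULARITY_CACHEDIR, the cache location variable documented for Singularity 3.x. CACHE_ROOT is a hypothetical placeholder for a folder you own; it is not a name defined on Helix.

```shell
# Assumption: SINGULARITY_CACHEDIR is the variable this section refers to.
# It is the cache location variable documented for Singularity 3.x.
# CACHE_ROOT is a hypothetical placeholder: point it at a folder you own
# that has enough space. It falls back to your home directory here.
CACHE_ROOT="${CACHE_ROOT:-$HOME}"
export SINGULARITY_CACHEDIR="${CACHE_ROOT}/singularity_cache"

# Create the folder if it does not exist yet.
mkdir -p "${SINGULARITY_CACHEDIR}"
```

After adding these lines, open a new shell or run source ~/.bashrc; echo $SINGULARITY_CACHEDIR should then print the new location.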
To view your current cache you can use the command singularity cache list, or singularity cache list --all to view all the individual blobs that have been pulled.
To clean up your cache you can use the command:
singularity cache clean
Running bcftools from containers (Example quay.io)
As an example of how to run containers on Helix, we showcase bcftools 1.13 (this version is not yet available on Helix as an installed module). The repository on quay.io has various builds available, which can be run seamlessly on Helix: https://quay.io/repository/biocontainers/bcftools?tab=info.
Below we first load Singularity, then pull the container and build a Singularity image so you do not need to pull the container every time. We then show the basic command, and an example where we mount the /genomes/ folder and run a simple bcftools view command.
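The steps described here can be sketched roughly as follows. This is an illustration under assumptions, not the exact commands used on Helix: TAG, IMAGE and VCF are hypothetical placeholders (choose a real tag from the quay.io page and a real file under /genomes/), and the run helper is our own addition that only prints the commands when Singularity is not on the PATH.

```shell
# Load Singularity if the module system is available (as it is on Helix).
if command -v module >/dev/null 2>&1; then
  module load singularity/3.2.1
fi

# Helper (our addition): run the command when singularity is on PATH,
# otherwise only print it so the steps can be reviewed safely.
run() {
  if command -v singularity >/dev/null 2>&1; then
    "$@"
  else
    echo "[dry run] $*"
  fi
}

# Hypothetical placeholders: replace with a real tag and a real VCF path.
TAG="${TAG:-REPLACE_WITH_TAG}"
IMAGE="${IMAGE:-bcftools.sif}"
VCF="${VCF:-/genomes/path/to/example.vcf.gz}"

# 1. Pull the Docker container and store it as a local Singularity image.
run singularity pull "${IMAGE}" "docker://quay.io/biocontainers/bcftools:${TAG}"

# 2. Basic command: call bcftools inside the image.
run singularity exec "${IMAGE}" bcftools --version

# 3. Mount /genomes into the container and print only the VCF header.
run singularity exec --bind /genomes "${IMAGE}" bcftools view -h "${VCF}"
```

Once the image file exists, only steps 2 and 3 need to be repeated; the pull in step 1 is a one-off.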
Some containers may be sizeable, so we recommend pulling and/or creating images via an interactive session. The bcftools container in this example is ~234 MB, but containers can easily exceed a gigabyte depending on the complexity of the software. Please also note the caching section above.
Mounting drives and environment variables
In the above example we use the --bind argument to mount the /genomes folder into the container. By default, containers do not have the same drives mounted as the host, so these need to be added manually. An added complication of our file system is that we generally make use of relative paths. For instance, the actual path of our /genomes/ folder is /nas/weka.gel.zone/pgen_genomes/. Day to day this causes no problems, but for containers it is something to be aware of: you first need to --bind the full path, and then add another --bind for the relative path. We understand that this can be rather frustrating, so we provide a list of useful file paths and relative paths to ensure a path of least resistance.
The example below uses two of these variables to save the header of a VCF into a .txt file. It assumes that you have also run the initial bcftools example shown above. Please note that you should change the file paths to your own folders, and check whether you need to use the Gecip or Discovery Forum example.
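A rough sketch of this pattern is shown here. The variable names MOUNT_GENOMES and MOUNT_WORK, and the WORK_DIR, IMAGE and VCF values, are hypothetical placeholders chosen for illustration; only the /genomes/ and /nas/weka.gel.zone/pgen_genomes/ paths come from this page. The run helper is our own addition that prints the command when Singularity is not on the PATH.

```shell
# Helper (our addition): run the command when singularity is on PATH,
# otherwise only print it so it can be reviewed.
run() {
  if command -v singularity >/dev/null 2>&1; then
    "$@"
  else
    echo "[dry run] $*"
  fi
}

# Full path first, then the short path that points to it.
MOUNT_GENOMES="--bind /nas/weka.gel.zone/pgen_genomes --bind /genomes"

# Hypothetical placeholder for your own Gecip or Discovery Forum folder.
# If you reach it through a short path, add a --bind for its full path
# first, in the same way as for /genomes above.
WORK_DIR="${WORK_DIR:-$PWD}"
MOUNT_WORK="--bind ${WORK_DIR}"

# Hypothetical placeholders: the image built earlier and a real VCF.
IMAGE="${IMAGE:-bcftools.sif}"
VCF="${VCF:-/genomes/path/to/example.vcf.gz}"

# The MOUNT_* variables are left unquoted on purpose so that each one
# splits into separate arguments (this assumes paths without spaces).
# -h prints only the header; -o writes it to a file inside the bound folder.
run singularity exec ${MOUNT_GENOMES} ${MOUNT_WORK} "${IMAGE}" \
  bcftools view -h -o "${WORK_DIR}/header.txt" "${VCF}"
```

Keeping the bind options in variables means the long full paths only need to be written once, for example in your .bashrc.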
Working with containers within a workflow
There are two ways to go about this: either pull the container directly within a task of the workflow, or create an image beforehand and let the workflow call the image. You can also add some of the
--bind examples from above into the
List of available repositories
There are various container repositories available which have been whitelisted for Helix. Below you will find the current list of available repositories: