Welcome to the Genomics England Research Environment User Guide.
In this documentation, we will provide you with the knowledge and training materials that you will need to navigate and analyse the wealth of data available to you.
We suggest that you go through this documentation step-by-step so that you become familiar with the data and the analysis tools within the Research Environment.
What is the Research Environment?
All analysis of the Genomics England dataset is done within a secure workspace called the Research Environment. Keeping the data inside this workspace protects against security breaches, but it does require the environment to be somewhat restrictive. You will always have access to the most up-to-date research dataset available to you in the Research Environment and will be able to run your analyses on the Genomics England high-performance compute cluster.
Keep these points in mind to help you on your research journey!
- The Research Environment contains the most up-to-date research dataset that you are entitled to.
- The latest genomic and clinical data are released into the Research Environment on a four-monthly cycle.
- The Research Environment is a Linux virtual desktop environment hosted by Inuvika.
- There is no general internet access inside the Research Environment, although some websites are whitelisted.
- You can't copy out of the Research Environment, but you can paste in using the clipboard application.
- If you want to bring data in or take your results out of the Research Environment, you have to use the File Transfer application.
- You will have access to the Genomics England high-performance compute cluster free of charge.
- Command-line tools are available for you to perform analysis and new software can be installed on request.
- You will be able to install your own R and Bioconductor packages.
- You will have a working space of several petabytes in size within your shared working folder.
Research Environment Useful Links
|File Transfer Application|https://airlock-staging.extge.co.uk/ (Staging, but used for all imports to both Staging AND Production); https://airlock.extge.co.uk (Production, exports only)|
|Genomics England Service Desk|http://bit.ly/ge-servicedesk (available only outside the RE)|
Data in the Research Environment
Below is a summary of the latest Main Programme data release for May 2021 (Version 12).
Aggregated variant calls (aggV2)
As part of the Main Programme V10 data release, we make available an aggregate multi-sample VCF (aggV2) comprising 78,195 germline genomes from the 100,000 Genomes Project on GRCh38. We also provide functional annotation files for all variants, variant and sample quality control (QC) metrics, inferred sample relatedness information, principal components, and inferred ancestry information for all samples in aggV2. The aggregate_gvcf_sample_stats table in LabKey contains all sample-level information for participants in aggV2.
Please read the full documentation here: Aggregated Variant Calls (aggV2).
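To illustrate what "multi-sample VCF" means in practice, here is a minimal Python sketch that lists the sample IDs from the header of a VCF. It relies only on the standard VCF layout, in which the `#CHROM` header line has nine fixed columns followed by one column per sample. The function name `read_sample_ids` and the toy data are invented for this example; they are not part of aggV2 or of any Genomics England tool. The full documentation describes the actual file locations and recommended tools.

```python
import io


def read_sample_ids(handle):
    """Return the sample IDs from the #CHROM header line of a VCF text stream."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first nine columns (CHROM to FORMAT) are fixed by the VCF
            # format; every column after them is one sample.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


# Toy two-sample VCF, invented purely for illustration (this is not aggV2 data).
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n"
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n"
)

sample_ids = read_sample_ids(io.StringIO(TOY_VCF))
print(sample_ids)
```

The function takes any text stream, so for a compressed VCF you could pass a handle opened in text mode with Python's `gzip` module instead of the in-memory string. For files of aggV2's size, a dedicated tool is likely to be a better fit than plain Python.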
If you should encounter a problem or need assistance while using the Research Environment, please feel free to submit a ticket to the Genomics England Service Desk and we will get back to you.