Alongside the genomic and clinical data, we also make available outputs from our internal bioinformatics pipeline and analyses at Genomics England. These data are provided in a variety of formats and are stored in different locations within the Research Environment. This page explains the Genomics England data and shows you how to use them to get the most from your research.
Understanding the data structure
The Genomics England data are deposited in the Research Environment and are accessible through either the gel_data_resources folder or the LabKey application.
The gel_data_resources folder
The gel_data_resources folder is accessible via the Desktop environment by navigating to Home and clicking on the folder called gel_data_resources. It is also accessible from the HPC environment under root (/gel_data_resources). This folder is read-only. It contains outputs from our internal bioinformatics pipeline and analyses at Genomics England, mainly larger files (such as VCFs and JSONs) that cannot be displayed easily in LabKey. In most cases, however, the data deposited in gel_data_resources are linked back to the participant-level data in LabKey, where the full paths to the files are documented.
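Because the folder is mounted at /gel_data_resources on the HPC, its layout can also be explored from a script. The following is a minimal sketch in Python, not an official Genomics England tool: the function name list_tree and the max_depth parameter are illustrative assumptions, and only the root path comes from this page. It lists subfolders down to a chosen depth and reports whether the root is writable, which is expected to be False for a read-only folder.

```python
import os
from pathlib import Path


def list_tree(root, max_depth=2):
    """Return subfolder paths relative to root, down to max_depth levels.

    list_tree is a hypothetical helper written for illustration only.
    Folders are visited depth-first in sorted order; files are ignored.
    """
    root = Path(root)
    found = []

    def walk(directory, depth):
        if depth > max_depth:
            return
        try:
            children = sorted(p for p in directory.iterdir() if p.is_dir())
        except PermissionError:
            # Skip folders the current user is not allowed to list.
            return
        for child in children:
            found.append(child.relative_to(root).as_posix())
            walk(child, depth + 1)

    walk(root, 1)
    return found


if __name__ == "__main__":
    root = "/gel_data_resources"  # path as given on this page (HPC environment)
    if os.path.isdir(root):
        for relative_path in list_tree(root, max_depth=2):
            print(relative_path)
        # A read-only folder should report False here.
        print("writable:", os.access(root, os.W_OK))
    else:
        print(root, "is not available on this machine")
```

Keeping max_depth small is advisable, since the folder holds large numbers of files and a full recursive walk could be slow.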
The gel_data_resources folder follows a set folder structure. At the top level, data are first divided into Main Programme and Pilot. Within the Main Programme, the data are then categorised as follows.
For example, the aggregated VCF in GRCh38 using Platypus calls for the October 2018 Main Programme data release is found here:
Data from our internal bioinformatics pipeline and analyses at Genomics England that can be easily displayed in a tabular format are made available through the LabKey application, which can be accessed via the Desktop application.
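Since LabKey tables document the full paths to files held in gel_data_resources, it can be useful to confirm that those paths resolve before starting an analysis. The sketch below assumes a table has been exported from LabKey as a CSV file; the function name check_paths and the column name "path" are hypothetical and should be replaced with whatever the exported table actually uses.

```python
import csv
from pathlib import Path


def check_paths(csv_file, path_column="path"):
    """Split the file paths listed in one CSV column into (existing, missing).

    check_paths and the default column name "path" are illustrative
    assumptions, not names taken from LabKey or Genomics England.
    Rows with an empty value in the chosen column are skipped.
    """
    existing, missing = [], []
    with open(csv_file, newline="") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(path_column) or "").strip()
            if not value:
                continue
            if Path(value).exists():
                existing.append(value)
            else:
                missing.append(value)
    return existing, missing
```

For example, calling check_paths("export.csv", path_column="path") on a hypothetical export returns two lists, so any paths that do not resolve can be reviewed before a job is submitted.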