
This aggregate dataset contains data from a subset of participants who have since been withdrawn from research. Use of their data in any new analysis is not permitted. It is therefore extremely important to remove these samples from your analyses and ensure that you are only using samples included in the latest data release.

The list of samples for consented participants can be found in the 'aggregate_gvcf_sample_stats' table in LabKey for the latest data release.

For the main programme version 14 data release, the list of consented samples is given in the file main_programme_v14_samples.txt, located in the folder /gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/docs/

To filter the aggregate to these samples, all bcftools commands should include the flag -S /gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/docs/main_programme_v14_samples.txt
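
As an illustration of the flag in use, the following minimal sketch builds a bcftools view command restricted to the consented sample list. The helper name build_view_command, the input and output file names, and the region are hypothetical and only serve as an example; the samples file path is the one given above. It is a sketch under those assumptions, not a prescribed workflow.

```python
import shlex

# Path to the consented-sample list, as given in the text above.
SAMPLES_FILE = (
    "/gel_data_resources/main_programme/aggregation/"
    "aggregate_gVCF_strelka/aggV2/docs/main_programme_v14_samples.txt"
)


def build_view_command(input_vcf, output_vcf, region=None):
    """Return a `bcftools view` argument list restricted to consented samples.

    This helper is hypothetical; it only assembles the command and does
    not run bcftools itself.
    """
    # -S (--samples-file) keeps only the samples listed in the file.
    cmd = ["bcftools", "view", "-S", SAMPLES_FILE]
    if region is not None:
        # -r limits the output to a region; it requires an indexed input file.
        cmd += ["-r", region]
    # -O z writes bgzip-compressed VCF, -o names the output file.
    cmd += ["-O", "z", "-o", output_vcf, input_vcf]
    return cmd


# Hypothetical file names and region, for illustration only.
example = build_view_command(
    "aggV2_chunk.vcf.gz", "consented_subset.vcf.gz", region="chr2:1-100000"
)
print(shlex.join(example))
```

The printed command can be pasted into a shell, or the list can be passed directly to subprocess.run(example, check=True) in an environment where bcftools is available.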

Submit a ticket to the Genomics England Service Desk if you are unsure how to filter the dataset for any other use.

Due to a probable bug in BCFtools, site QC statistics for chromosome X are incorrect. We advise against using FILTER and INFO field data until this is corrected. All genotype data and the related VEP functional data are unaffected.

Overview 

The sections below document elements of the aggV2 dataset generation and presentation in more detail: 

The sections below discuss datasets that accompany the aggV2 release (PCs, relatedness, ancestry inference, allele frequencies):

Help & Support

Help with aggV2

Please contact the Genomics England Service Desk about any issues related to the aggV2 aggregation or companion datasets, and include "aggV2" in the title or description of your enquiry.