This aggregate dataset contains data from a subset of participants who have since been withdrawn from research. Use of their data in any new analyses is not permitted. It is therefore extremely important to remove these samples from your analyses and ensure that you are only using samples included in the latest data release.
The list of samples for consented participants can be found in the 'aggregate_gvcf_sample_stats' table in LabKey for the latest data release.
For the main programme version 14 data release, the list of consented samples is provided in the file main_programme_v14_samples.txt, located in the folder /gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/docs/
To filter the aggregate to these samples, all bcftools commands should include the flag -S /gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/docs/main_programme_v14_samples.txt
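As an illustration, the sample filter described above could be wrapped in a small shell helper. This is a minimal sketch, not an official Genomics England script: the function name view_consented and its three arguments (input file, region, output file) are hypothetical, and only the sample list path is taken from this page. It assumes the input file is indexed, which bcftools needs for region queries with -r.

```shell
# Path to the v14 consented-sample list, as given on this page.
SAMPLES=/gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/docs/main_programme_v14_samples.txt

# Hypothetical helper: extract a region from an aggV2 file,
# keeping only the consented samples listed in $SAMPLES.
#   $1 = input VCF/BCF (must be indexed for -r to work)
#   $2 = region, e.g. chr2:1000-2000
#   $3 = output file (written as bgzip-compressed VCF)
view_consented() {
    bcftools view -S "$SAMPLES" -r "$2" -Oz -o "$3" "$1"
}
```

A call would then look like view_consented <input file> chr2:1000-2000 subset.vcf.gz, with the input path replaced by the aggV2 file you are working on. Keeping the path in a single variable makes it easier to switch to a newer sample list when a later data release is published.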
Submit a ticket to the Genomics England Service Desk if you are unsure how to filter the dataset for any other use.
Due to a probable bug in bcftools, the site QC statistics for chromosome X are incorrect. We advise against using the FILTER and INFO field data for chromosome X until this has been corrected. All genotype data and the related VEP functional data are unaffected.
Overview
The sections below document elements of the aggV2 dataset generation and presentation in more detail:
- Sample QC
- gVCF Aggregation
- Variant Normalisation
- Variant Representation
- Site QC, FILTER and INFO Fields
- Functional Annotation
The sections below discuss datasets that accompany the aggV2 release (PCs, relatedness, ancestry inference, allele frequencies):
Help & Support
Help with aggV2