Since Genomics England made the Tiering data available, we have received various questions about it. We have outlined a few of these below, along with their answers. We hope that this page proves useful; if you have more questions, please raise a ticket at the Genomics England Service Desk.


(question) How do we decide which variants fall under denovo_segregation within the Tiering data?

(green star) Variants are only marked as de novo when both parental genomes were available in the family-wide variant calling, the variant was absent from both parental genomes, and it was observed only in the offspring.

...

One point of note: if a relative has withdrawn their consent, their data may not be fully available. Please reach out to the Genomics England Service Desk to verify whether a participant has been withdrawn.
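
The de novo rule described in this answer can be sketched as a small check. This is only an illustration, not the pipeline's actual implementation: the function name and the encoding of genotypes as alternate-allele counts (with `None` for an unavailable genome) are assumptions made for the example.

```python
from typing import Optional


def is_de_novo(child_alt: Optional[int],
               mother_alt: Optional[int],
               father_alt: Optional[int]) -> bool:
    """Return True if a variant looks de novo under the rule described above.

    Each argument is a hypothetical alternate-allele count (0, 1 or 2),
    or None when that genome was not available in the family calling.
    """
    # Both parental genomes must be available; otherwise no de novo call.
    if child_alt is None or mother_alt is None or father_alt is None:
        return False
    # Present in the offspring and absent from both parents.
    return child_alt > 0 and mother_alt == 0 and father_alt == 0
```

Under this sketch, a variant with a missing parental genome (for example after a withdrawal of consent) is never reported as de novo.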


(question) During the Genomics England Tiering process, variants are filtered for rarity before being admitted to the tiering system. What frequency thresholds are used to determine "Rare Variant" status?

(green star) During the Tiering process, variants (which have previously been called, normalised and annotated by the Rare Disease Interpretation Pipeline) pass through multiple filters (allele frequency, consequence type, segregation, quality, etc.) in order to classify those that are potentially relevant or causal for a specific case and disease. The allele frequency mentioned here refers to the frequency in control populations. These allele frequencies and populations can be found in the Rare Disease Results guide under section 5.3.4 Tiering Algorithm Criteria, criterion 2 (page 14).


(question) I observed some duplicate variants within the same participant. What does this mean?

(green star) Generally speaking, there should be just one row for a given variant. However, some rows may appear twice because the tables are converted from JSON (in our database) to flat files (for LabKey and general usage). In addition, a variant can fall into different impact categories or modes of inheritance, for instance, which causes a partial "duplication" of its rows.
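
To tell exact duplicates from partial ones, it can help to group rows by variant and list the columns whose values differ. The following is a minimal sketch using plain dictionaries; the column names (`participant_id`, `chromosome`, `position`, `reference`, `alternate`, `mode_of_inheritance`) are assumptions for illustration and may not match the actual table.

```python
from collections import defaultdict

# Hypothetical columns that together identify one variant in one participant.
VARIANT_KEY = ("participant_id", "chromosome", "position", "reference", "alternate")


def differing_columns(rows):
    """Map each repeated variant to the sorted list of columns that differ.

    An empty list means the rows are exact duplicates; a non-empty list
    names the columns (e.g. mode of inheritance) that explain the repeat.
    Assumes every row has the same set of columns.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in VARIANT_KEY)].append(row)

    report = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue  # variant appears once, nothing to explain
        report[key] = sorted(
            col for col in members[0]
            if len({m[col] for m in members}) > 1
        )
    return report
```

Rows that come back with an empty list can usually be collapsed safely, while rows with differing columns carry distinct information and should be kept.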


(question) If a variant is categorised as CompoundHeterozygous segregation (and heterozygous), how do we know which variant is the "second" one of the compound heterozygous pair?

(green star) Compound heterozygotes are determined within the protein-coding region of a gene (and potentially around splice sites). When this occurs, you should see at least one other variant within that gene also listed as CompoundHeterozygous. It is possible that you will see more than two compound heterozygotes in a gene, which can then be considered as one group. Unfortunately, it is unknown how each of the alleles affects the others and the disease type, so we cannot provide more detail beyond the groups of compound heterozygotes.
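
Finding the partner variants therefore comes down to collecting, per participant and gene, every variant flagged as CompoundHeterozygous. A minimal sketch follows; the column names (`segregation_pattern`, `gene`, and the coordinate columns) are assumptions for illustration, and only the value `CompoundHeterozygous` is taken from the text above.

```python
from collections import defaultdict


def compound_het_groups(rows):
    """Group CompoundHeterozygous variants by (participant, gene).

    Returns a dict mapping (participant_id, gene) to the list of distinct
    variants in that group, in order of first appearance.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["segregation_pattern"] != "CompoundHeterozygous":
            continue
        variant = (f'{row["chromosome"]}:{row["position"]}:'
                   f'{row["reference"]}>{row["alternate"]}')
        members = groups[(row["participant_id"], row["gene"])]
        # Rows can repeat (e.g. once per tier), so keep each variant once.
        if variant not in members:
            members.append(variant)
    return dict(groups)
```

A group with more than two members is reported as one group, mirroring the answer above; the sketch makes no attempt to pair individual alleles.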


(question) Why are genes or variants listed in multiple tiers? And how do I know which one is correct?

(green star) Similar to the question above, variants can occur as multiple rows for a single participant. One of the reasons for this can be that the gene is part of multiple tiers, and each variant-tier combination is displayed on a separate row. Therefore, variants can be listed with multiple tiers, and effectively all of them are correct. For the tiering process, each phenotype has various gene panels applied to it. Depending on the gene panels that are applied, some genes may be present and some may not.

...

This does raise the question of which Tier relates to which panel, as in some cases a single phenotype can have up to nine panels applied to it. Unfortunately, we do not provide that information in the tiering_data table; this may change in the future, though it is not formally planned. Because of this gap, we either use both tiers or select the most impactful tier, depending on the analysis that we perform, as ultimately the specific gene panels do relate to the phenotype of the individual.
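
Selecting the most impactful tier per variant can be sketched as follows. This is an illustration only: the column names and the tier value format (`TIER1`, `TIER2`, `TIER3`, with a lower number being more impactful) are assumptions made for the example.

```python
def most_impactful_tier(rows):
    """Return the lowest tier number seen for each (participant, variant).

    Assumes a hypothetical 'tier' column holding values such as 'TIER1',
    where a lower number means a more impactful tier.
    """
    best = {}
    for row in rows:
        key = (row["participant_id"], row["variant"])
        rank = int(row["tier"].upper().replace("TIER", ""))
        if key not in best or rank < best[key]:
            best[key] = rank
    return best
```

Keeping every row instead of collapsing them corresponds to the "use both tiers" option described above; which of the two is appropriate depends on the analysis.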


(question) What parameters have been applied for the family-based variant calling that is used by the tiering process?

(green star) Our tiering process uses VCFs that have been generated by the Platypus pipeline performing family-based variant calling. While we do not make these run options publicly available, you are able to retrieve them within the Research Environment, or by raising a Genomics England Service Desk request. While the run options for the different cohorts and families are essentially the same (beyond different reference genomes), the approach below is aimed at retrieving the options for a single family.

...