External links on this page can only be accessed from outside the RE.

For further information on the data available in the latest data release, including a data dictionary, please visit the most recent Main Programme data release note.

Clinical Data Overview

Available clinical data in the Genomics England research environment are divided into primary and secondary, depending on whether they were acquired specifically for the 100,000 Genomes Project or not. Primary clinical data are sourced from the Genomic Medicine Centres (GMCs) for all participants upon enrolment in the programme, according to set data models that specify the variables and matching data types. Secondary clinical data come from third parties such as Public Health England (PHE) or NHS Digital (NHSD), and complement the primary clinical data with additional information such as hospital visits and cancer treatments.

Not all variables are compulsory, and some are personally identifiable data and are therefore not present in the de-identified data within the Research Environment.

Clinical and phenotype data can be accessed via LabKey, which is available in the Inuvika Research Environment. For information on how to query LabKey programmatically, see our example code.

In either case, all datasets can be accessed via:

home > main_programme > main_programme_vX

where X should be the newest version of the data release, or the version you are interested in. Once you are inside the version you are interested in, you will see a list of 'Data Views' (datasets) divided into 5 categories: 'Quick View', 'Common', 'Rare Disease' and 'Cancer', which comprise our primary clinical data collected for Genomics England, and 'Medical History', which contains datasets with secondary data received from third parties. The same datasets are also listed under 'Lists', which has some extra functionality.

Quick view (primary clinical data)

Contains summary tables for cancer and rare disease samples.

  • cancer_analysis contains all tumour samples that have been sequenced, had variants called and successfully passed through the Genomics England interpretation pipeline. For each tumour sample, it gives the matched germline information, as well as information about the tumour, sequencing quality control metrics, tumour mutational burden (somatic_coding_variants_per_mb), signatures and the paths to the bam and vcf files. Note that one participant may have more than one tumour sample. For more information on how this dataset is generated see here
  • rare_disease_analysis contains the latest sample for each participant that has been sequenced, had variants called and successfully passed through our interpretation pipeline. Samples are uniquely identified by their platekey number. Note that one participant may have more than one sample, for example one for genome build GRCh37 and one for GRCh38.
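Because one participant can appear on several rows of these summary tables, it is often useful to group samples by participant before analysis. The following is a minimal sketch in plain Python. The rows are made up for illustration, and the column names participant_id, platekey and genome_build are assumptions about how the table is laid out, so check them against the data dictionary for your release.

```python
from collections import defaultdict

# Hypothetical rows; real rows would come from rare_disease_analysis in LabKey.
# The platekey values below are invented placeholders.
rows = [
    {"participant_id": 1001, "platekey": "PLATEKEY_A", "genome_build": "GRCh37"},
    {"participant_id": 1001, "platekey": "PLATEKEY_B", "genome_build": "GRCh38"},
    {"participant_id": 1002, "platekey": "PLATEKEY_C", "genome_build": "GRCh38"},
]


def platekeys_by_participant(rows):
    """Group (platekey, genome_build) pairs under each participant."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["participant_id"]].append(
            (row["platekey"], row["genome_build"])
        )
    return dict(grouped)


grouped = platekeys_by_participant(rows)

# Participants that appear with more than one sample.
multi_sample = sorted(pid for pid, samples in grouped.items() if len(samples) > 1)
print(multi_sample)
```

The same grouping applies to cancer_analysis, where a participant may have several tumour samples.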

For more information on how the sequencing quality control metrics are calculated, see our technical documentation, 'Rare Disease Results Guide' and 'Cancer Analysis Technical Information Document', on 10. Further reading and documentation.

Common view (primary clinical data)

Contains tables with participants from both cancer and rare disease programmes, giving information about biological sample handling, genome sequencing, participant data and domain assignment.

Tables about biological sample handling:

  • clinic_sample describes the taking and handling of participant samples at the Genomic Medicine Centres, i.e. in the clinic, as well as the type of samples obtained. Because of the complexities of handling and managing tumour tissue samples in a clinical setting, there are many fields that are cancer-specific.

  • clinic_sample_quality_check_result describes the quality control of obtaining and handling participant samples at the Genomic Medicine Centres, i.e. in the clinic.

  • laboratory_sample describes the handling of samples at the biorepository and in preparation for sequencing, as well as the type of biological sample.
  • laboratory_sample_omics_availability contains information on other biological samples that are available in our biobank for our participants as of the latest data release. Please note that these samples have not been sequenced nor analysed by Genomics England.

Tables about sequencing data:

  • plated_sample contains, for each sequenced sample, the plate key and plate id, along with the Illumina QC date, status and a few other QC details.

Tables about participant, including domain assignment:

  • death_details contains participant deaths submitted by GMCs, likely less complete than the data collected by ONS and NHSD.
  • participant contains information on all participants that have been recruited up to that data release. Data are non-identifiable and include demographics (such as relatives or ethnicity); points of contact with the Project (e.g. handling Genomic Medicine Centre or Trust); and a record of the status of their clinical review. Note that some participants may not have been sequenced yet.

Bioinformatics view (genomic and Genomics England data)

Contains tables with information that is not primary or secondary clinical data; i.e. datasets about genomics and Genomics England interpretation pipeline data for participants from both cancer and rare disease programmes.

Tables about sequencing data and their variants:

  • aggregate_gvcf_sample_stats contains the samples that have been used to create the aggregate vcf files (/gel_data_resources/main_programme/aggregated_illumina_gvcf/GRCH38/20190228/) and their QC metrics. These files contain the aggregated variant calls from Illumina.
  • denovo_cohort_information contains cohort information for all participants included in the de novo variant (DNV) research dataset. Attributes within this table include: participant ID, sex, affection status, family ID, pedigree ID, and the path to each family's multi-sample VCF with flagged DNVs. 
  • denovo_flagged_variants contains all variants that pass base_filter for all trios within the DNV dataset. The table does not include variants that fail the base_filter due to size restrictions, but these can be found in the annotated multi-sample VCFs. This table includes all flags from the DNV annotation pipeline for each variant. 
  • genome_file_paths_and_types contains the folder location for the bam and vcf files for each participant.
  • sequencing_report contains, for each participant in the 100,000 Genomes Project, data describing the sequencing of their genome(s) and associated output, as well as the sample type that the sequence is from, e.g. rare disease germline, cancer somatic, etc.

Tables with data output from the Genomics England interpretation pipeline, including domain assignment:

  • domain_assignment contains, for each participant in the 100,000 Genomes Project, data describing the disease type to which they were recruited; the disease panel applied to their genome; the GeCIP domain to which their genome has been assigned for the purposes of administering the GeCIP publication moratorium; as well as the end date of the GeCIP moratorium associated with their genome(s).
  • exomiser contains, for each participant of the Rare Disease programme, the results of the Exomiser variant prioritisation framework. More information on how the data are generated here
  • gmc_exit_questionnaire contains, for each family with a closed case, information extracted from the GMC exit questionnaire. This is data reported back from the Genomic Medicine Centres on the variants reported to them by Genomics England: to what extent a family’s presenting case can be explained by the combined variants reported to them (including any segregation testing performed); confidence in the identification and pathogenicity of each variant; and the clinical validity of each variant or variant pair in general and clinical utility in a specific case (only the most recent update will be shown and only one questionnaire per report).
  • panels_applied contains, for each participant of the 100,000 Genomes Project, the name and version of the panel(s) applied to their genome.
  • tiered_variants_frequency contains, for each pathogenic variant found in the Genomics England rare disease database, information on the variant consequence, as well as annotation results from gnomAD and 1000 Genomes.
  • tiered_data describes, for each rare disease participant of the 100,000 Genomes Project who has been through the Genomics England interpretation pipeline and each tiered variant found for each of these participants, the consequences of the variant and a few other genetic details. More information on tiering here.

Tables that consolidate different datasets with clinical information:

  • cancer_staging_consolidated combines staging information from our primary clinical data (cancer_participant_tumour) and secondary clinical data from PHE/NCRAS (sact and av_tumour) to give a stage for each sample we have sequenced and fully interpreted in our database (cancer_analysis). The staging information may take the form of a combined TNM stage, its individual components, or other standards such as AJCC or Dukes. The genomic data are matched to the clinical data using a correspondence dictionary between disease type (genomic data) and ICD code (clinical data), created and validated internally. In addition, the clinical stage must have been recorded no more than one year from the date the sample was collected. Note that the column names have been preserved as found in the original datasets, except for tumour_pseudo_id, which is found in both sact and av_tumour and has therefore been given a prefix with the dataset name. More information on the staging consolidation here.
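The one-year rule used when matching staging records to samples can be illustrated with a short sketch. This is not the internal consolidation code: the function names, the stage_date field, the reading of "one year" as 365 days and the choice of the nearest record when several qualify are all assumptions made for illustration.

```python
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)  # assumption: "one year" taken as 365 days


def stage_is_usable(sample_date: date, stage_date: date) -> bool:
    """True when the staging record lies within one year of sample collection."""
    return abs(stage_date - sample_date) <= ONE_YEAR


def closest_stage(sample_date, staging_records):
    """Return the usable staging record nearest in time to the sample, or None.

    Picking the nearest record is an illustrative choice, not a documented rule.
    """
    usable = [
        record for record in staging_records
        if stage_is_usable(sample_date, record["stage_date"])
    ]
    if not usable:
        return None
    return min(usable, key=lambda record: abs(record["stage_date"] - sample_date))


# Hypothetical staging records for one tumour sample.
sample_date = date(2016, 5, 1)
records = [
    {"stage": "2", "stage_date": date(2014, 1, 1)},  # more than a year away
    {"stage": "3", "stage_date": date(2016, 3, 1)},  # two months before sampling
]
best = closest_stage(sample_date, records)
print(best["stage"] if best else "no stage within one year")
```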

Cancer (primary clinical data)

Cancer data are presented at the participant level or sample level. All tumour samples have a matched germline sample. One participant might have more than one tumour sample; these could be samples taken at different time points, two different tumours or, rarely, biological replicates. The latter are often part of TracerX, which is not available to commercial users.

Tables with data relating to cancer participants:

  • cancer_care_plan contains, for a proportion of cancer participants in the 100,000 Genomes Project, information from their NHS cancer care plan on their treatment and care intent, in particular outcomes of MDT meetings and coded connected data (e.g. diagnoses from scans).
  • cancer_invest_imaging contains, for a proportion of cancer participants in the 100,000 Genomes Project, coded data on imaging investigations characterising the scan, its modality, anatomical site and outcome; as well as the outcome of the imaging report in free text form.
  • cancer_participant_disease includes, for each cancer participant in the 100,000 Genomes Project, data about their cancer disease type and subtype.
  • cancer_participant_tumour contains, for each cancer participant’s tumour in the 100,000 Genomes Project, data that characterise the tumour, e.g. staging and grading; morphology and location; recurrence at time of enrolment; and the basis of diagnosis.
  • cancer_participant_tumour_metastatic_site contains, if applicable, for cancer participants in the 100,000 Genomes Project, the site of their metastatic disease in the body at diagnosis.
  • cancer_risk_factor_cancer_specific contains, for a proportion of cancer participants in the 100,000 Genomes Project, data on specific risk factors related to particular cancer types. This table was compiled with input from GeCIP members.
  • cancer_risk_factor_general contains, for a proportion of cancer participants in the 100,000 Genomes Project, data on general cancer risk factors, namely smoking status, height, weight and alcohol consumption. This table was compiled with input from GeCIP members.
  • cancer_surgery contains, for a proportion of cancer participants in the 100,000 Genomes Project, details of the surgical procedures performed, as well as the specific location of the intervention.

Tables with data derived from or relating to tumour samples:

  • cancer_invest_circulating_tumour_marker contains, for a proportion of tumours from cancer participants in the 100,000 Genomes Project, biomarker measurements specific to particular cancer types (ovarian or prostate).
  • cancer_invest_sample_pathology contains, for a proportion of cancer participants in the 100,000 Genomes Project, full pathology reports and other related data on and from their tumour samples around diagnosis and characterisation of the cancer. Please note that much of this information is also found in the clinic_sample and cancer_participant_tumour tables.
  • cancer_specific_pathology contains, for a proportion of tumours from cancer participants in the 100,000 Genomes Project, pathology data specific to that participant’s cancer type. This may provide additional data to the cancer_invest_sample_pathology and cancer_participant_tumour tables.
  • cancer_systemic_anti_cancer_therapy contains, for a proportion of tumours from cancer participants in the 100,000 Genomes Project, details of the regimen and intent of the participants’ chemotherapy.

Rare Diseases (primary clinical data)

Rare Disease data are presented at the level of rare disease families (families of probands), rare disease pedigrees and participants. Participants are individuals who have consented to be a part of the project with the expectation that a sample of their DNA will be obtained and their genome sequenced. Participants can be probands or relatives. Probands are the affected individuals who initiated that family's participation in the programme, and for whom most of the analyses are done. Relatives are other participants who may or may not be affected. Pedigree members are extended members of the proband’s family, which will include some participants (relatives) as well as a number of other individuals who will have no contact with the project and have not consented, but for whom a small amount of data is recorded to allow a full picture of the proband’s extended family to be gathered.

All rare disease table names are prefixed with “rare_diseases_”.

Tables with data at the level of rare disease families:

  • rare_diseases_family describes the families of rare disease probands participating in the 100,000 Genomes Project, making family members participants of the Project. It includes the family group type, the status of the family’s pre-interpretation clinical review and the settings that were chosen for the interpretation pipeline at the clinical review.
  • rare_diseases_pedigree describes the Rare Disease participants, linking pedigrees to probands and their family members.
  • rare_diseases_pedigree_member describes the Rare Disease pedigree members, similar to the data about each individual participant in the common data view. It includes some additional data, such as the age of onset of predominant clinical features, data on links to other family members, as well as data collected only for phenotypes.

Tables with data at the level of rare disease participants:

  • rare_diseases_early_childhood_observation contains, for rare disease participants in the 100,000 Genomes Project, measurements and milestones provided by the GMCs, related to childhood development.
  • rare_diseases_gen_measurement contains, for rare disease participants in the 100,000 Genomes Project, general measurements relevant to the disease, alongside the date on which the measurements were taken.
  • rare_diseases_imaging contains, for rare disease participants in the 100,000 Genomes Project, various data and measurements from past scans, alongside the date of the scans.
  • rare_diseases_invest_blood_laboratory_test_report contains, for a proportion of rare disease participants in the 100,000 Genomes Project, the results of any blood tests carried out. Over 400 blood values are recorded alongside type and technique of testing and the status of the participating patient in the care pathway.
  • rare_diseases_invest_genetic contains, for a proportion of rare disease participants in the 100,000 Genomes Project, information on any genetic tests carried out. Data characterising the genetic investigation are recorded alongside records of the sample tissue source and the type of testing laboratory.
  • rare_diseases_invest_genetic_test_result contains, for a proportion of rare disease participants in the 100,000 Genomes Project, the results of any genetic tests carried out. Following on from the rare_diseases_invest_genetic table, a summary of the results is presented and contextualised by testing method and scope.
  • rare_diseases_participant_disease describes the proband's rare diseases. This is as for rare_diseases_pedigree_member, with the addition of a date of diagnosis.
  • rare_diseases_participant_phenotype describes, for each rare disease participant in the 100,000 Genomes Project, their phenotype(s). Each phenotypic abnormality is defined by an HPO term and whether that term is present or not, as well as the age of onset, the severity of manifestation, the spatial pattern in the body and whether it is progressive or not.

Secondary Data - NHSD (clinical data)

Data from the third party NHS Digital describing rare diseases and cancer participants' medical history and death, when applicable.

Tables with hospital visits data:

Hospital Episode Statistics (HES) contains details of all admissions, outpatient appointments, critical care and A&E attendances at NHS hospitals in England. Each data entry is collected during a patient's time in hospital and is submitted to allow hospitals to be paid for the care they deliver. HES data are designed to enable secondary use of these administrative data, that is, use for non-clinical purposes. It is a records-based system that covers all NHS trusts in England, including acute hospitals, primary care trusts and mental health trusts. HES information is stored as a large collection of separate records, and Genomics England receives regular partial exports of the HES data held for each of the participants within the 100,000 Genomes Project, which are linked with their Participant ID. HES data are presented in LabKey as separate datasets:

  • ae (accident and emergency) contains historic records of A&E attendances of Genomics England main programme participants.
  • apc (admitted patient care) contains historic records of admissions into secondary care of Genomics England main programme participants.
  • cc (critical care) contains historic records of admissions into critical care of Genomics England main programme participants.
  • op (outpatient) contains historic records of outpatient attendances of Genomics England main programme participants.

The HES data are presented in LabKey with each row representing a separate period of care for that participant. Therefore, each participant may have one or more rows of data. By scrolling through the data, it may look like there is a large proportion of missing data; however, this is not the case and it is merely a reflection of the way the dataset is structured. For variables in which there might be multiple datapoints for a single episode, for example a single visit to A&E may result in multiple diagnoses, the separate data-points are split across multiple columns. In the A&E diagnosis example, the first twelve diagnoses in a single A&E episode are recorded across twelve columns named DIAG_01 through to DIAG_12 (as A&E diagnosis codes), despite the vast majority of visits resulting in only one or two diagnoses.
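The wide layout described above can be reshaped into one row per diagnosis, which makes the sparsity easier to handle. The following is a minimal sketch in plain Python. The DIAG_01 to DIAG_12 column names come from the description above; the participant_id key, the exact letter case of the columns in LabKey and the diagnosis codes in the example rows are assumptions.

```python
# DIAG_01 .. DIAG_12, as described for the A&E dataset.
DIAG_COLUMNS = [f"DIAG_{i:02d}" for i in range(1, 13)]


def diagnoses_long(rows):
    """Yield one (participant_id, position, code) tuple per recorded diagnosis."""
    for row in rows:
        for position, column in enumerate(DIAG_COLUMNS, start=1):
            code = row.get(column)
            if code:  # skip missing columns, None and empty strings
                yield (row["participant_id"], position, code)


# Hypothetical A&E rows with invented codes: most DIAG columns are empty.
rows = [
    {"participant_id": 1001, "DIAG_01": "05", "DIAG_02": "20"},
    {"participant_id": 1002, "DIAG_01": "38", "DIAG_02": None},
]

long_rows = list(diagnoses_long(rows))
for entry in long_rows:
    print(entry)
```

In the long form, each episode contributes only as many rows as it has diagnoses, so the apparent missingness of the wide table disappears.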

Other medical history tables:

  • cen (cohort event notification) reports, for Genomics England main programme participants, whether the participant is deceased, has a cancelled cypher or has had another event related to programme participation.
  • did contains historic diagnostic imaging records of Genomics England main programme participants.
  • did_bridge is a link file connecting participants to DID records.
  • mh_bridge is a link file connecting participants to MHMD records.
  • mhldds_episode contains historic records of mental health (MH) related admissions of Genomics England main programme participants. Episode and event tables link to the records table via mhm_mhmds_spell_id.

  • mhldds_event contains historic records of MH related admissions of GeL main programme participants. Episode and event tables link to the records table via mhm_mhmds_spell_id.
  • mhldds_record contains historic records of MH related admissions of GeL main programme participants. One record per spell per patient in a provider. 
  • mhmd_v4_episode contains historic records of MH related admissions of Genomics England main programme participants.
  • mhmd_v4_event contains historic records of MH related admissions of Genomics England main programme participants.
  • mhmd_v4_record contains historic records of MH related admissions of Genomics England main programme participants.
  • ons lists the Office for National Statistics’ cause of death records for the Genomics England main programme participants.
  • proms (Patient Reported Outcome Measures) reports cases of diabetes, depression and other conditions for the Genomics England main programme participants.

Note that mental health (MH) data are split into the mental health and learning disabilities dataset (mhldds_) and the mental health minimum dataset (mhmd_). The former replaced the latter in April 2014 to include learning disabilities. Thus, events up to March 2014 are found in mhmd_ and events from April 2014 onwards in mhldds_.
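When querying mental health records by date, the split described above determines which family of tables to look in. The sketch below expresses that rule as a small helper. The function name is hypothetical, and the exact switch-over day (1 April 2014) is an assumption, since only the month is stated here.

```python
from datetime import date

# Assumption: the switch-over is taken as the first day of April 2014.
MHLDDS_START = date(2014, 4, 1)


def mh_table_prefix(event_date: date) -> str:
    """Return the table-name prefix expected to hold an MH event on this date."""
    return "mhldds_" if event_date >= MHLDDS_START else "mhmd_"


print(mh_table_prefix(date(2013, 11, 5)))
print(mh_table_prefix(date(2015, 2, 20)))
```

A date range that spans April 2014 needs both sets of tables.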

Secondary Data - PHE-NCRAS (cancer clinical data)

Data from the third party Public Health England (PHE), including data from the National Cancer Registration and Analysis Service (NCRAS), describing cancer patients' medical history. The NCRAS is run by PHE, and is responsible for cancer registration in England to support cancer epidemiology, public health, service monitoring and research.

Cancer Registration (AV) is the systematic collection of data about cancer and tumour diseases. In England, this data collection is managed by NCRAS. Every year, NCRAS collects information on over 300,000 cases of cancer, including patient details (such as name, address, age, sex, and date of birth), as well as detailed data about the type of cancer, how advanced it is and the treatment the patient receives. At Genomics England, identifiable information is stripped from the data, which are then associated with the patient's participant_id so that they can be linked to other clinical data and to the genomic data.

AV tables gather data for patients diagnosed with cancer from 1 January 1995 to 31 December 2017. This dataset brings together data from more than 500 local and regional datasets to build a picture of an individual’s treatment from diagnosis. Please note that tumour_ids in AV tables are assigned to participants by NCRAS and do not link to the tumour_ids assigned by GeL for sequencing and clinical data. Whilst the two may refer to the same cancer (particularly in the case of a single tumour), caution should be applied prior to any analysis.

  • av_imd (income deprivation domain) measures the proportion of the population experiencing deprivation relating to low income. The definition of low income used includes both those people that are out-of-work and those that are in work but who have low earnings. 
  • av_patient contains, for each cancer participant, demographics from the Cancer Registration and information about death, when applicable by the last day of data collection for the AV tables.
  • av_rtd (routes to diagnosis) contains, for each cancer patient diagnosed between 2006 and 2016, one of eight possible routes to diagnosis. These routes have been determined using a model that combines AV data with HES data, Cancer Waiting Times (CWT) data and data from the cancer screening programmes. The methodology is described in detail in a 2012 British Journal of Cancer article.
  • av_treatment contains treatment received for each participant. Notice that often one participant receives more than one treatment, which includes surgery, chemo, immuno and radiotherapy.
  • av_tumour contains, for each participant, medical information about the tumour, including date of diagnosis, site, morphological and behaviour ICD10 codes as well as histology and grade.

The National Lung Cancer Audit (LUCADA) looks at the care delivered during referral, diagnosis, treatment and outcomes for people diagnosed with lung cancer and mesothelioma. The data items in the LUCADA dataset have been compiled to meet the requirements of audit, and are not to be confused with the data items identified as Lung Cancer in the National Cancer dataset. The audit focuses on measuring the care given to lung cancer patients from diagnosis to the primary treatment package, assessing against standards and bringing about necessary improvements. The project supports the Calman Hine recommendations, the National Cancer Plan and other national guidance (e.g. NICE guidance) as it emerges.

The audit follows patients diagnosed between: 01/01/2005 - 31/12/2013 (the vital status of each patient can be followed up with linkage to Cancer Registration data).

  • lucada_2013 contains, for 56 participants, data on the national lung cancer audit 2013.
  • lucada_2014 contains, for 18 participants, data on the national lung cancer audit 2014.

Other datasets from PHE available for the Genomics England participants are:

  • cwt (cancer waiting times) is the National Cancer Waiting Times Monitoring Data Set, which supports the continued management and monitoring of waiting times.
  • ncras_did (diagnostic imaging dataset) is a central collection of detailed information about diagnostic imaging tests carried out on NHS patients, extracted from local radiology information systems. The DID captures information about referral source, details of the test (type of test and body site), demographic information such as GP registered practice, patient postcode, ethnicity, gender and date of birth, plus data items about different events (date of imaging request, date of imaging, date of reporting), which allows calculation of time intervals. Data are available for patients diagnosed between 1 January 2013 and 31 December 2015.
  • rtds (radiotherapy dataset) is an existing standard (SCCI0111) that has required all NHS Acute Trust providers of radiotherapy services in England to collect and submit standardised data monthly against a nationally defined data set since 2009. The purpose of the standard is to collect consistent and comparable data across all NHS Acute Trust providers of radiotherapy services in England in order to provide intelligence for service planning, commissioning, clinical practice, research and the operational provision of radiotherapy services across England. Data are available from 01/04/2009.
  • sact (systemic anti-cancer therapy) contains clinical management data on patients receiving cancer chemotherapy, and newer agents that have anti-cancer effects, in or funded by the NHS in England. It covers chemotherapy treatment for all solid tumours and haematological malignancies, including those in clinical trials. It relates to all cancer patients, both adult and paediatric, in acute inpatient, daycase and outpatient settings and delivery in the community. Data are available for regimens between 11/09/16 and 15/12/17, with cycles ending by 15/02/18.

For more information on the table contents, see the documentation for the newest data release.

Some relevant information found in the datasets

Human Phenotype Ontology (HPO)

The phenotyping of participants within the rare disease arm uses the Human Phenotype Ontology (HPO). The ontology comprises over 10,000 terms for describing phenotypic abnormalities and also contains over 50,000 annotations of HPO terms to hereditary diseases. The terms collectively form a relational network (connected by is-a connections), such that each term is a more specific, or more limited, instance of its parent term, e.g. abnormality of the foot is-a abnormality of the lower limbs. The presence or absence of particular HPO terms in each of the rare disease participants is given in the HPO dataset, along with modifiers that give further specifics on how that phenotype is manifested in that individual. These modifiers are the laterality, age of onset, progression, severity and spatial pattern. Clicking on the field name in LabKey and selecting 'filter' will give you an idea of the values present for each of these modifiers.
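The is-a network means that a participant annotated with a specific term is implicitly annotated with all of its more general ancestors. The sketch below walks a toy is-a graph to collect those ancestors. The graph is a simplified illustration built around the foot example above; it is not real HPO content, and real analyses should load the ontology itself and use HPO identifiers rather than labels.

```python
# Toy is-a graph (child term -> list of parent terms). Illustrative only.
IS_A = {
    "Abnormality of the foot": ["Abnormality of the lower limb"],
    "Abnormality of the lower limb": ["Abnormality of limbs"],
    "Abnormality of limbs": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is-a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


for name in sorted(ancestors("Abnormality of the foot")):
    print(name)
```

Expanding terms to their ancestors in this way lets a query for a general phenotype also pick up participants recorded with a more specific one.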

Selection of HPO terms per participant 

Genomics England has developed what we call 'disease data models', a list of HPO terms that define each disease, to aid the process of phenotyping at the GMC end. The list of rare disease data models can be found here, in the Rare Disease Data Models pdf. In addition, for each participant the GMC can enter additional HPO terms that are not necessarily listed in the clinical data model, if they think this is useful to drive the selection of additional panels. Together, these typically comprise the list of HPO terms that have been specifically assessed for each participant.


International Classification of Diseases (ICD-10)

A classification of diseases can be defined as a system of categories to which morbid entities are assigned according to established criteria. The purpose of the ICD is to permit systematic recording, analysis, interpretation and comparison of mortality and morbidity data collected in different countries or areas and at different times. The ICD is used to translate diagnoses of diseases and other health problems from words into an alphanumeric code, which permits easy storage, retrieval and analysis of the data.

In practice, the ICD has become the international standard diagnostic classification for all general epidemiological and many health-management purposes. These include analysis of the general health situation of population groups and monitoring of the incidence and prevalence of diseases and other health problems in relation to other variables, such as the characteristics and circumstances of the individuals affected. The ICD is neither intended nor suitable for indexing of distinct clinical entities. There are also some constraints on the use of the ICD for studies of financial aspects, such as billing or resource allocation.

The ICD can be used to classify diseases and other health problems recorded on many types of health and vital records. Its original use was to classify causes of mortality as recorded at the registration of death. Later, its scope was extended to include diagnoses in morbidity. It is important to note that, although the ICD is primarily designed for the classification of diseases and injuries with a formal diagnosis, not every problem or reason for coming into contact with health services can be categorised in this way. Consequently, the ICD provides for a wide variety of signs, symptoms, abnormal findings, complaints and social circumstances that may stand in place of a diagnosis on health-related records (see Volume 1, Chapters XVIII and XXI). It can therefore be used to classify data recorded under headings such as ‘diagnosis’, ‘reason for admission’, ‘conditions treated’ and ‘reason for consultation’, which appear on a wide variety of health records from which statistics and other health-situation information are derived.

ICD-10 codes must be used in the manner set forth in Volume 2: Instruction Manual of the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision. The user is responsible for ensuring that the codes are properly used in this manner.

For more information on ICD-10, please see the 'International statistical classification of diseases and related health problems (ICD-10)' document on 10. Further reading and documentation.

ICD codes and code descriptions are deposited in the Research Environment under the folder: /gel_data_resources/licenced_resources/ICD10


International Classification of Diseases for Oncology (ICD-O)

The International Classification of Diseases for Oncology (ICD-O) is internationally recognised as the definitive classification of neoplasms. It is used by cancer registries throughout the world to record incidence of malignancy and survival rates, and the data produced are used to inform cancer control, research activity, treatment planning and health economics. The classification of neoplasms used in ICD-O links closely to the definitions of neoplasms used in the WHO/IARC Classification of Tumours series, which are compiled by consensus groups of international experts and, as such, the classification is underpinned by the highest level of scientific evidence and opinion.

ICD-O consists of two axes (or coding systems), which together describe the tumour:

  • the topographical code, which describes the anatomical site of origin (or organ system) of the tumour
  • the morphological code, which describes the cell type (or histology) of the tumour, together with the behaviour (malignant or benign).
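To illustrate the morphology axis, the sketch below splits a morphology code into its histology and behaviour parts. It assumes the code is written in the usual ICD-O-3 form of four histology digits, a slash and one behaviour digit (for example 8500/3); how the codes are actually formatted in each table is not stated on this page and should be checked first. The function name is hypothetical.

```python
import re

# Behaviour digits as defined in ICD-O-3.
BEHAVIOUR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic",
}

# Assumed layout: four histology digits, a slash, one behaviour digit.
MORPHOLOGY = re.compile(r"^(\d{4})/(\d)$")


def parse_morphology(code: str) -> dict:
    """Split an ICD-O morphology code into histology and behaviour."""
    match = MORPHOLOGY.match(code.strip())
    if match is None:
        raise ValueError(f"not in NNNN/N form: {code!r}")
    histology, behaviour = match.groups()
    return {
        "histology": histology,
        "behaviour": behaviour,
        "behaviour_label": BEHAVIOUR.get(behaviour, "unknown"),
    }


parsed = parse_morphology("8500/3")
print(parsed["histology"], parsed["behaviour_label"])
```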


SNOMED CT

SNOMED was started in 1965 as a Systematized Nomenclature of Pathology (SNOP) and was further developed into a logic-based health care terminology. SNOMED CT was created in 1999 by the merger, expansion and restructuring of two large-scale terminologies: SNOMED Reference Terminology (SNOMED RT) and the Clinical Terms Version 3 (CTV3) (formerly known as the Read codes), developed by the NHS.


TNM Classification of Malignant Tumours (TNM)

The TNM Classification of Malignant Tumours (TNM) is a cancer staging notation system that describes the stage of a cancer that originates from a solid tumour with alphanumeric codes.

  • T describes the size of the original (primary) tumour and whether it has invaded nearby tissues
  • N describes nearby (regional) lymph nodes that are involved
  • M describes distant metastasis.

The code for a particular cancer is made up of these three parts along with other parameters and modifiers.
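As an illustration of the three-part structure, the sketch below splits a simple combined TNM string into its T, N and M components. It is deliberately narrow: it assumes the components are concatenated as in T2N1M0 and does not handle prefixes, subcategories such as T2b, or other modifiers, all of which occur in real staging data. The function name and the accepted value ranges are assumptions for illustration.

```python
import re

# Simplified pattern: T0-T4, TX or TIS; N0-N3 or NX; M0, M1 or MX.
TNM_PATTERN = re.compile(r"^T(?P<T>[0-4X]|IS)N(?P<N>[0-3X])M(?P<M>[01X])$")


def parse_tnm(code: str):
    """Return the T, N and M parts of a simple TNM string, or None if it does not fit."""
    match = TNM_PATTERN.match(code.strip().upper())
    if match is None:
        return None
    return match.groupdict()


print(parse_tnm("T2N1M0"))
print(parse_tnm("T2bN1M0"))  # subcategory not covered by this sketch
```

Returning None for strings that do not fit makes it easy to count how many records need more careful handling.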
