The cancer_staging_consolidated LabKey table compiles in one place all the staging information available in the Genomics England research environment. It contains the subset of participants from the cancer programme who have successfully passed through the Genomics England interpretation pipeline (and are available in the cancer_analysis LabKey table) for whom at least one piece of staging information is found.

Please note that cancer_staging_consolidated contains no new staging information, i.e. no information that is not already in other LabKey tables.


Staging information is located in the Research Environment in the following three datasets:

  • cancer_participant_tumour (Genomics England primary clinical data)
  • av_tumour (secondary clinical data from PHE-NCRAS)
  • sact (secondary clinical data from PHE)

These datasets differ in how complete their staging information is. In addition, all tables in LabKey are linked via participant_id, which is not sufficient for cancer staging data, since one participant can have multiple tumours and stage can change over time. To make staging information easily accessible to users, we have put together, in a single table, the staging information found in the datasets above for each tumour sample in cancer_analysis.

The tumour_id made it possible to link samples with our primary clinical data; however, not all samples had a tumour_id available. In these cases, as well as for the secondary clinical data, samples have been linked using a dictionary that maps ICD-10 codes found in the clinical data to the disease_type in cancer_analysis. The dictionary was created internally and validated by one of our pathologists.
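The dictionary-based linkage can be pictured as a lookup from ICD-10 code to disease_type. The sketch below is illustrative only: the mapping entries and the function name are hypothetical placeholders rather than the validated internal dictionary, and matching on the three-character ICD-10 category is an assumption.

```python
# Hypothetical excerpt of an ICD-10 -> disease_type dictionary.
# Placeholder entries for illustration, NOT the validated internal dictionary.
ICD10_TO_DISEASE_TYPE = {
    "C50": "BREAST",
    "C18": "COLORECTAL",
    "C34": "LUNG",
}


def icd10_matches_disease_type(icd10_code, disease_type):
    """Return True if the ICD-10 code maps to the sample's disease_type.

    Assumption: matching uses the three-character ICD-10 category,
    e.g. "C50" for "C50.9".
    """
    if not icd10_code:
        return False
    category = icd10_code.strip().upper()[:3]
    return ICD10_TO_DISEASE_TYPE.get(category) == disease_type


print(icd10_matches_disease_type("C50.9", "BREAST"))
print(icd10_matches_disease_type("C18.7", "LUNG"))
```

A clinical record would only be considered for a sample when this kind of check succeeds for the same participant_id.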

Finally, staging information is included in the cancer_staging_consolidated table only when it was collected no more than one year (12 months) from the date the tumour sample was collected. Users who would like a narrower window can filter on the "interval_min" column of cancer_staging_consolidated (note that interval_min is counted in days). If multiple pieces of staging information are available for a tumour sample within the one-year window, only one entry per source dataset (cancer_participant_tumour, av_tumour, sact) is included: the staging information obtained closest to the date the tumour sample was collected. For sact data, we link samples using participant_id and disease_type and use the start date of the regimen; if there is a match (via disease_type), we ensure that the start date of the regimen and the date the tumour sample was taken are no more than one year apart.
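The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the record layout and function name are hypothetical, "one year" is taken as 365 days, and the window is assumed to apply both before and after the sample date.

```python
from datetime import date

MAX_INTERVAL_DAYS = 365  # assumption: "one year" taken as 365 days


def closest_staging_per_source(sample_date, records):
    """Keep, for each source dataset, the record dated closest to sample_date.

    records: iterable of (source, staging_date, stage) tuples (hypothetical layout).
    Returns {source: (staging_date, stage, interval_days)}.
    """
    best = {}
    for source, staging_date, stage in records:
        # Assumption: the window is symmetric around the sample date.
        interval = abs((staging_date - sample_date).days)
        if interval > MAX_INTERVAL_DAYS:
            continue  # outside the one-year window
        if source not in best or interval < best[source][2]:
            best[source] = (staging_date, stage, interval)
    return best


# Made-up example data for a single tumour sample.
sample_date = date(2018, 3, 1)
records = [
    ("cancer_participant_tumour", date(2018, 2, 1), "2"),
    ("av_tumour", date(2017, 12, 1), "2A"),
    ("av_tumour", date(2018, 3, 11), "2B"),
    ("sact", date(2016, 1, 1), "3"),  # more than a year away, so dropped
]
selected = closest_staging_per_source(sample_date, records)
for source, (staging_date, stage, interval) in sorted(selected.items()):
    print(source, staging_date, stage, interval)
```

In this made-up example the av_tumour record dated 10 days after the sample is kept in preference to the one dated 90 days before it, and the sact record is excluded entirely.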



This information can be found in LabKey in a table called cancer_staging_consolidated under the Bioinformatics tab. The cancer_staging_consolidated table connects to other tables in LabKey via participant_id. In addition, tumour identifiers from the different sources, i.e. tumour_id (Genomics England), av_tumour_pseudo_id (PHE-NCRAS) and sact_tumour_pseudo_id (PHE), are given to identify the specific tumour.

Table Schema

The cancer_staging_consolidated table contains the following entries:

from cancer_analysis:

  • participant_id
  • tumour_sample_platekey
  • tumour_id
  • disease_type
  • tumour_type
  • tumour_clinical_sample_time

from cancer_participant_tumour:

  • diagnosis_date
  • diagnosis_icd_code
  • integrated_tnm_stage_grouping
  • component_tnm_t
  • component_tnm_n
  • component_tnm_m
  • ajcc_stage
  • final_figo_stage
  • modified_dukes_stage

from av_tumour:

  • av_tumour_pseudo_id
  • diagnosisdatebest
  • site_icd10_o2
  • stage_best
  • t_best
  • n_best
  • m_best
  • stage_best_system
  • stage_path
  • t_path
  • n_path
  • m_path
  • stage_img
  • t_img
  • n_img
  • m_img
  • dukes
  • figo
  • gleason_primary
  • gleason_combined
  • grade
  • behaviour_coded_desc
  • histology_coded_desc
  • er_status
  • pr_status
  • her2_status
  • npi

from sact:

  • sact_tumour_pseudo_id
  • primary_diagnosis
  • start_date_of_regimen
  • stage_at_start


additional column:

  • interval_min (counted in days)
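Because interval_min is counted in days, a narrower window than the default one year can be applied with a simple filter once the table has been exported. The sketch below assumes rows held as Python dictionaries; the function name and the example rows are hypothetical, and taking the absolute value is a precaution in case the column is signed.

```python
def within_window(rows, max_days):
    """Return the rows whose interval_min is at most max_days.

    Rows with a missing interval_min are excluded (an assumption of this sketch).
    """
    kept = []
    for row in rows:
        value = row.get("interval_min")
        if value is None or value == "":
            continue
        # abs() is a precaution in case interval_min is stored as a signed value.
        if abs(float(value)) <= max_days:
            kept.append(row)
    return kept


# Made-up rows; real rows would come from cancer_staging_consolidated.
rows = [
    {"tumour_sample_platekey": "sample_A", "interval_min": 12},
    {"tumour_sample_platekey": "sample_B", "interval_min": 200},
    {"tumour_sample_platekey": "sample_C", "interval_min": None},
]
recent = within_window(rows, 90)
print([row["tumour_sample_platekey"] for row in recent])
```

Here only sample_A falls within a 90-day window; sample_B is within the default one-year window but outside the narrower one.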

Main Programme Statistics

The statistics for the cancer_staging_consolidated table for Main Programme V8 data release can be found here: Main Programme V8 Statistics (28-11-2019)

The statistics for the cancer_staging_consolidated table for Main Programme V9 data release can be found here: Main Programme V9 Statistics (02-04-2020)

The statistics for the cancer_staging_consolidated table for Main Programme V10 data release can be found here: Main Programme V10 Statistics (03-09-2020)

The statistics for the cancer_staging_consolidated table for Main Programme V11 data release can be found here: Main Programme V11 Statistics (17-12-2020)

The statistics for the cancer_staging_consolidated table for Main Programme V12 data release can be found here: Main Programme V12 Statistics (06-05-2021)



This table was first included in data release 8. If you have suggestions about this table or would like to request edits that would be useful for your analyses, please let us know via the Genomics England Service Desk.
