External links on this page can only be accessed from outside the RE
This tutorial uses R LabKey API to query the clinical and phenotype data. Please see Rlabkey Package guide to find out additional information on the R LabKey API.
- You will firstly need to install and load the Rlabkey package into your R session by following this guide. We used Rlabkey version 2.2.1 in these examples.
- You will need to create and enter your credentials into your .netrc profile.
Types of queries
Fetching an entire table
Fetching an entire LabKey table is easy. You will only need to change the name of the main programme release version and the name of the the LabKey table you need. You can get the release versions by navigating to the LabKey Application and selecting the Main Programme project - the names of the releases are detailed in the subsequent folder. The names of each table can be seen in the data views when you click on a release folder.
The code below loads the entire sequencing_report table and saves it as an R data.frame. The function labkey.selectRows is used to do this.
Fetching a table and filtering
The function labkey.selectRows can also include additional arguments to select specific columns (colSelect), filter for specified row values (colFilter), and sort by columns (colSort).
The code below loads and subsets the participant table for rare disease probands, born between 1990 and 2010. It then selects only the participant id, family id, and year of birth column, and sorts by year of birth. You can see the full range of R LabKey filters here.
Fetching a table using SQL
Often, you will need to merge data from multiple tables using a common key (such as the unique participant id). The easiest, safest, and most reproducible way to do this is by using the labkey.executeSql function. We highly recommend using this function to perform more complex analysis across multiple tables.
The following code will use the labkey.executeSql function to perform a complex query which:
- The participant id and HPO term columns from the 'rare diseases participant phenotype' table
- The year of birth column from the 'participant' table
- The file path column to the genomic data from the 'genome file paths and types' table
- Left joins
- By the unique 'participant id' common to all tables
- For positive diagnoses (hpo present = YES) in the 'rare diseases participant phenotype' table
- For two HPO terms (hpo id = HP:0001249 and HP:0000365) in the 'rare diseases participant phenotype' table
- For small variant VCF file paths (file sub type = standard VCF) in the 'genome file paths and types' table
- For genomes aligned to GRCh38 (genome build = 38) in the 'genome file paths and types' table
- By year of birth
Querying the Hospital Episode Statistics tables
There are two 'Medical History' tables within the Main Programme LabKey projects which give diagnoses information encoded by the ICD-10 medical classification. These are the Admitted Patient Care and Outpatient Care tables which are named as 'hes_apc' and 'hes_op' respectively in LabKey (hes = Hospital Episode Statistics). Though these tables are very large, the information containing the ICD-10 terms can be found in the columns prefixed by 'diag'. For example 'diag_01', which is the primary diagnosis, and 'diag_02' and so on. The hes_apc table has 20 diag columns whereas the hes_op table has 12 diag columns. We can retrieve these columns using the labkey.executeSql function as shown below.
The following code will use the labkey.executeSql function to retrieve all of the ICD-10 columns for each participant from both the hes_apc and hes_op tables. It will also retrieve the time of the participant's visit and their participant ID which you can use to link back to the other tables.