External links on this page can only be accessed from outside the RE
This tutorial uses the Python LabKey API to query the clinical and phenotype data. Please see the Python LabKey API guide for more information on the API.
- You will need to create and enter your credentials into your .netrc profile.
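As a rough sketch of what a .netrc entry looks like and how to check that it parses, the snippet below writes an example entry to a temporary file and reads it back with Python's standard netrc module. The host name, user name and password are placeholders (assumptions, not values from this guide); use the LabKey server address and credentials you were given.

```python
import netrc
import os
import tempfile

# Placeholder values: replace with your real LabKey host and credentials.
EXAMPLE_HOST = "labkey.example.org"  # hypothetical host name
EXAMPLE_ENTRY = (
    "machine {host}\n"
    "login your_username\n"
    "password your_password\n"
).format(host=EXAMPLE_HOST)


def read_netrc_entry(path, host):
    """Return (login, password) for host from the .netrc-style file at path."""
    auth = netrc.netrc(path).authenticators(host)
    if auth is None:
        raise KeyError("no entry for host: " + host)
    login, _account, password = auth
    return login, password


# Write the example to a temporary file so a real ~/.netrc is left untouched.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as handle:
    handle.write(EXAMPLE_ENTRY)
    example_path = handle.name

login, password = read_netrc_entry(example_path, EXAMPLE_HOST)
os.remove(example_path)
print(login)
```

On Linux the real file normally lives at ~/.netrc and should be readable only by you, for example by running chmod 600 ~/.netrc.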
Loading the LabKey module in Python
IVD - loading Python
On Inuvika Virtual Desktop machines, the Python LabKey module is installed for both Python 2 (version 2.7.12) and Python 3 (versions 3.6.5 and 3.8.1).
To load Python, open a terminal and type the following:
HPC Helix - loading Python
On Helix, Python is managed using conda environments - please find more details at: HPC (Helix) Migration 2020
In particular, for Python 3, the LabKey API is accessible from, for example, the py3pypi environment, so you can load Python like this:
Both systems - loading the module for LabKey
After that, on both IVD and Helix, you can load the LabKey API in Python by importing it like any other Python module. Open Python in the terminal (or in your script) and type:
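The import itself is a single line, import labkey. A slightly more defensive sketch, which reports clearly when the module cannot be found in the current environment, might look like this (it assumes Python 3, and the helper name is illustrative):

```python
import importlib.util


def labkey_is_installed():
    """Return True if the labkey package can be found in this environment."""
    return importlib.util.find_spec("labkey") is not None


if labkey_is_installed():
    import labkey  # the LabKey Python API
    print("labkey module loaded")
else:
    print("labkey module not found: check that Python was loaded as described above")
```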
By default, all query functions in the Python LabKey module (this tutorial uses select_rows and execute_sql) extract no more than 100,000 rows from any given LabKey table, rather than the whole table.
All such query functions have an optional parameter called max_rows, which sets the maximum number of rows that the query can retrieve and accepts any integer value. Its default value is None, which results in at most 100,000 rows being retrieved.
There is no standard way to retrieve all rows in a large table using this version of the API. If you expect to retrieve more than 100,000 rows, or you are unsure, you must set max_rows to an integer value that is large enough for your use case.
Please see the examples section below, but remember that the value you need for max_rows depends on the size of the table and on your query.
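One way to guard against silently truncated results is to compare the number of rows returned with the limit that was requested. The sketch below assumes that a query result is a dictionary containing a "rows" list; the helper name and the sample data are illustrative and not part of the LabKey API.

```python
DEFAULT_LIMIT = 100000  # limit applied when max_rows is left as None


def may_be_truncated(result, max_rows=None):
    """Return True if the row count reached the limit, so rows may be missing."""
    limit = DEFAULT_LIMIT if max_rows is None else max_rows
    return len(result["rows"]) >= limit


# Stand-in for a query result; a real one would come from select_rows or execute_sql.
fake_result = {"rows": [{"participant_id": i} for i in range(5)]}

print(may_be_truncated(fake_result, max_rows=5))   # limit reached
print(may_be_truncated(fake_result, max_rows=50))  # well under the limit
```

If the check returns True, rerunning the query with a larger max_rows is the safest response, since a table with exactly that many rows cannot be told apart from a truncated one.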
You are now ready to use Python and the LabKey API.
Types of queries
Fetching an entire table from LabKey
The following code outlines how to fetch an entire table from LabKey and store it in a Pandas dataframe - note the use of the max_rows parameter (see above), which is essential here as the sequencing_report table has 107,623 rows in Data Release 8.
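A minimal sketch of such a fetch is given here, assuming the pre-2.0 labkey interface (labkey.utils.create_server_context and labkey.query.select_rows) and that the table lives in the "lists" schema. The host and project path are placeholders, and the select_rows argument of the helper exists only so the logic can be exercised with a stand-in function when no server is available.

```python
def fetch_table(server_context, table, max_rows, select_rows=None):
    """Fetch up to max_rows rows of a LabKey table as a list of dictionaries."""
    if select_rows is None:
        # Imported lazily so the sketch can be read and tried without labkey.
        from labkey.query import select_rows
    result = select_rows(
        server_context=server_context,
        schema_name="lists",  # assumption: the table is stored as a list
        query_name=table,
        max_rows=max_rows,
    )
    return result["rows"]


# Typical use (needs labkey, pandas and valid credentials; names are placeholders):
#   import labkey
#   import pandas as pd
#   ctx = labkey.utils.create_server_context(
#       "<labkey host>", "<project path>", "labkey", use_ssl=True)
#   df = pd.DataFrame(fetch_table(ctx, "sequencing_report", max_rows=200000))


def fake_select_rows(server_context, schema_name, query_name, max_rows=None):
    """Stand-in for labkey.query.select_rows, used only for this demonstration."""
    rows = [{"participant_id": i} for i in range(3)]
    return {"rows": rows[:max_rows]}


rows = fetch_table(None, "sequencing_report", max_rows=2, select_rows=fake_select_rows)
print(len(rows))
```

The value 200000 in the commented usage is only an example of a limit comfortably above the 107,623 rows mentioned above.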
Fetching a specific part of a table from LabKey
The following code outlines how to filter a table by columns and/or rows in order to return a subset of a table.
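As a hedged sketch, column selection and row filtering with select_rows might look as follows. It assumes the pre-2.0 labkey interface, in which columns are passed as a comma-separated string and row filters as a list of labkey.query.QueryFilter objects. The column names in the usage comment are hypothetical, and the "lists" schema is an assumption.

```python
def columns_argument(columns):
    """Join column names into the comma-separated string passed as columns."""
    return ",".join(columns)


def fetch_subset(server_context, table, columns, equal_to, max_rows=None):
    """Fetch the given columns for rows where each column equals the given value."""
    from labkey.query import QueryFilter, select_rows  # needs labkey installed

    filter_array = [
        QueryFilter(column, value, QueryFilter.Types.EQUAL)
        for column, value in equal_to.items()
    ]
    result = select_rows(
        server_context=server_context,
        schema_name="lists",  # assumption: the table is stored as a list
        query_name=table,
        columns=columns_argument(columns),
        filter_array=filter_array,
        max_rows=max_rows,
    )
    return result["rows"]


# Typical use (hypothetical column names; ctx is a server context):
#   rows = fetch_subset(ctx, "sequencing_report",
#                       ["participant_id", "platekey"],
#                       {"sample_type": "DNA Blood Germline"})
print(columns_argument(["participant_id", "platekey"]))
```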
Fetching data spread across multiple tables in LabKey
The previous code is good for when all the data you want is stored in one table. Often this is not the case. The following code will show how to get data that is spread across multiple tables in LabKey.
Until now we have been using the select_rows function in LabKey. We could continue to do this for this example, but then we would have to make a separate call for each table, apply the filters we want, and join the returned data into one big dataframe.
An easier way to achieve this is with the execute_sql function in LabKey. With this, we can write a block of SQL code for LabKey to execute, which combines all of the above steps into one.
In the following code, we will get a table of all the people in the cancer programme who are numbered between 6000 and 6200.
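The sketch below shows one way such a query could be written, assuming the pre-2.0 labkey interface and its labkey.query.execute_sql function. The table names, column names, the 'Cancer' value and the choice of participant_id as the column carrying the 6000 to 6200 range are all assumptions made for illustration; adapt them to the tables in your data release.

```python
def build_participant_sql(low, high):
    """Build a SQL query joining two (hypothetical) tables on participant_id."""
    return (
        "SELECT p.participant_id, p.programme, s.platekey "
        "FROM participant AS p "
        "JOIN sequencing_report AS s ON p.participant_id = s.participant_id "
        "WHERE p.programme = 'Cancer' "
        "AND p.participant_id >= {low} AND p.participant_id <= {high}"
    ).format(low=int(low), high=int(high))


def run_sql(server_context, sql, max_rows=None):
    """Run the SQL through LabKey and return the rows as a list of dictionaries."""
    from labkey.query import execute_sql  # needs labkey installed

    result = execute_sql(
        server_context=server_context,
        schema_name="lists",  # assumption: the tables are stored as lists
        sql=sql,
        max_rows=max_rows,
    )
    return result["rows"]


sql = build_participant_sql(6000, 6200)
print(sql)

# Typical use (ctx is a server context; pandas converts the rows to a dataframe):
#   import pandas as pd
#   df = pd.DataFrame(run_sql(ctx, sql, max_rows=1000))
```

Converting the bounds with int() keeps arbitrary text out of the query string, which matters if the limits ever come from user input.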
Potential error upon an incorrect setup ("DOCTYPE"-error)
If the LabKey API is set up incorrectly, please have a look here:
- LabKey DOCTYPE error
If this is not the issue, but you still have problems using the LabKey API, please contact the Service Desk.