This section contains example scripts and code chunks as well as entire analysis workflows to get you up-and-running with analysis in the Research Environment.
These scripts and workflows are present in the Research Environment under the folder: /gel_data_resources/example_scripts
Please feel free to get in contact with us if you would like further assistance in writing your own scripts or if you have a script or workflow of your own that you would like to share with the research community!
External links on this page can only be accessed from outside the Research Environment (RE).
Available Scripts and Workflows
- Annotate Variants with VEP (Variant Effect Predictor)
- Creating a Brief Phenotypic Report from Labkey
- Extract variants by coordinate
- Gene centric SNV report for cancer participants [BETA]
- GWAS using aggV2
- Preassembled cohorts
- Somatic SVs and CNVs for a specific gene
- Using the LabKey API
- Aggregate Variant Testing (AVT) workflow
Genome Analysis Toolkit
The tools below are available within the Research Environment and are the core tools for manipulating and analysing genomic data (BAM and VCF files). A full list of available software is given under Software Available on the HPC; you can also display it within the Research Environment by typing 'module avail' once connected to the HPC.
|bcftools||BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.||https://samtools.github.io/bcftools/bcftools.html|
|vcftools||VCFtools is a program package designed for working with VCF files. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.||http://vcftools.sourceforge.net/|
|samtools||Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from and exports to the SAM (Sequence Alignment/Map) format, does sorting, merging and indexing, and allows reads in any region to be retrieved swiftly.||http://www.htslib.org/doc/samtools.html|
|bedtools||Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, VCF.||https://bedtools.readthedocs.io/en/latest/#|
|Variant Effect Predictor (VEP)||VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.||https://www.ensembl.org/info/docs/tools/vep/index.html|
|ANNOVAR||ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others).||http://annovar.openbioinformatics.org/en/latest/|
|Integrative Genomics Viewer (IGV)||The Integrative Genomics Viewer (IGV) is a high-performance visualisation tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.||https://software.broadinstitute.org/software/igv/|
|PLINK||PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualisation, annotation and storage of results.||http://zzz.bwh.harvard.edu/plink/|
|Picard Tools||Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF specification.||https://broadinstitute.github.io/picard/|
|vt||A tool set for short variant discovery in genetic sequence data.||https://genome.sph.umich.edu/wiki/Vt|
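As an illustration of how one of the tools above might be driven from a script, here is a minimal sketch that builds a `bcftools view` command to pull out the variants in a genomic interval. The helper names (`region_string`, `bcftools_view_command`) and the file names are hypothetical and are not taken from the example scripts folder; the `--regions`, `--output-type` and `--output` options are standard `bcftools view` options.

```python
import shlex
from typing import List


def region_string(chrom: str, start: int, end: int) -> str:
    """Format a 1-based, inclusive interval as chrom:start-end."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return f"{chrom}:{start}-{end}"


def bcftools_view_command(vcf: str, chrom: str, start: int, end: int, out: str) -> List[str]:
    """Build (but do not run) a bcftools command that extracts one region.

    The output is written as a BGZF-compressed VCF (--output-type z).
    """
    return [
        "bcftools", "view",
        "--regions", region_string(chrom, start, end),
        "--output-type", "z",
        "--output", out,
        vcf,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = bcftools_view_command("input.vcf.gz", "chr1", 100000, 200000, "subset.vcf.gz")
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
```

The function only assembles the argument list, so it can be inspected before anything runs; passing the list to `subprocess.run(cmd, check=True)` would execute it once the bcftools module is loaded. Note that `--regions` requires the input VCF to be compressed and indexed.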
Remember to submit all code as a job on the High Performance Computing cluster.
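The page does not say which scheduler the cluster runs, so the following is only a sketch under the assumption of an LSF-style `bsub` command. The function name `bsub_command`, the queue name and the project code are hypothetical placeholders; check the HPC documentation for the values that apply to you.

```python
import shlex
from typing import List, Sequence


def bsub_command(job_command: Sequence[str], queue: str, project: str,
                 job_name: str, cores: int = 1) -> List[str]:
    """Wrap a command in a bsub submission (assumes an LSF scheduler)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return [
        "bsub",
        "-q", queue,                   # queue to submit to (placeholder value)
        "-P", project,                 # project code (placeholder value)
        "-J", job_name,                # job name
        "-n", str(cores),              # number of cores requested
        "-o", f"{job_name}.%J.out",    # stdout file; LSF expands %J to the job ID
        "-e", f"{job_name}.%J.err",    # stderr file
        shlex.join(job_command),       # the command to run, as a single string
    ]


if __name__ == "__main__":
    # Hypothetical job: index a VCF with bcftools.
    job = ["bcftools", "index", "subset.vcf.gz"]
    cmd = bsub_command(job, queue="my_queue", project="my_project", job_name="index_vcf")
    print(shlex.join(cmd))
```

Building the submission as a list keeps the quoting of the wrapped command in one place; the printed line can be reviewed before it is run on the cluster.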