ClinVar, re-summarised

Motivation

Talos, a tool for identifying clinically relevant variants in large cohorts, uses ClinVar ratings as one contributing factor in determining pathogenicity. During development of Talos we found that the default summaries generated by ClinVar are highly conservative; see the table here describing ClinVar's aggregate classification logic.

Content

This repository contains an alternative algorithm (described here) for re-aggregating individual ClinVar submissions. It generates decisions that favour a clear assignment of pathogenic or benign ratings instead of defaulting to 'conflicting'. These ratings are not intended to replace ClinVar's own decisions, but may add value by showing that, although conflicting submissions exist, there is a clear bias towards either benign or pathogenic ratings.
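For intuition only, the snippet below contrasts a simple majority vote with defaulting to 'conflicting'. It is not the ClinvArbitration algorithm (that is in the linked description); the category sets, the rule that only pathogenic and benign submissions vote, and the 0.6 threshold are all assumptions made for this sketch.

```python
from collections import Counter

# Assumed coarse groupings of submitted classifications (illustrative only).
PATHOGENIC = {"pathogenic", "likely pathogenic"}
BENIGN = {"benign", "likely benign"}


def category(significance: str) -> str:
    """Collapse one submission's classification into a coarse vote."""
    s = significance.strip().lower()
    if s in PATHOGENIC:
        return "pathogenic"
    if s in BENIGN:
        return "benign"
    return "other"  # VUS and anything else does not vote in this sketch


def resummarise(submissions: list[str], majority: float = 0.6) -> str:
    """Toy re-summary: make a clear call when one side holds `majority` of the decisive votes."""
    votes = Counter(category(s) for s in submissions)
    decisive = votes["pathogenic"] + votes["benign"]
    if decisive == 0:
        return "Uncertain significance"
    if votes["pathogenic"] / decisive >= majority:
        return "Pathogenic"
    if votes["benign"] / decisive >= majority:
        return "Benign"
    return "Conflicting"
```

With this toy rule, three pathogenic submissions against one benign submission resolve to "Pathogenic", while an even split still comes out as "Conflicting".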

We aim to re-run this process monthly and publish the resulting files on Zenodo. You can download this pre-generated bundle here: https://zenodo.org/records/16792026

Primary Outputs

Hail Table and TSV of all revised decisions
Hail Table and TSV of all Pathogenic missense changes, indexed on Transcript and Codon. This is usable as a PM5 annotation resource.

TSVs

clinvar_decisions.tsv: A tab-separated file with headers, containing our re-summarised ClinVar decisions. Columns:
- contig: the chromosome or contig of the variant
- position: the position of the variant on the contig
- reference: the reference allele at the variant position
- alternate: the alternate allele at the variant position
- clinical_significance: the clinical significance of the variant, as determined by our algorithm
- gold_stars: the number of gold stars assigned to the variant, indicating the quality of the evidence supporting the asserted significance
- allele_id: the unique identifier for the variant in ClinVar, accessible directly via a URL such as http://www.ncbi.nlm.nih.gov/clinvar?term=XXXXXXX[alleleid], or through ClinVar's web page using an 'advanced search' field
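As a sketch of consuming this file with only the Python standard library, the reader below filters on gold_stars and attaches the ClinVar URL pattern given above. It assumes gold_stars holds an integer and that the header uses exactly the column names listed; read_decisions is a hypothetical helper, not part of the ClinvArbitration package.

```python
import csv
from typing import Iterable, Iterator

# URL pattern from the allele_id description above.
CLINVAR_URL = "http://www.ncbi.nlm.nih.gov/clinvar?term={}[alleleid]"


def read_decisions(lines: Iterable[str], min_stars: int = 0) -> Iterator[dict[str, str]]:
    """Yield rows from clinvar_decisions.tsv with at least `min_stars` gold stars."""
    for row in csv.DictReader(lines, delimiter="\t"):
        if int(row["gold_stars"]) >= min_stars:
            row["url"] = CLINVAR_URL.format(row["allele_id"])
            yield row
```

In practice you would pass an open file handle (opened with newline="") and, for example, min_stars=1 to skip zero-star decisions.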
clinvar_decisions.pm5.tsv: A tab-separated file with headers, containing our PM5 missense decisions. All ClinVar entries in this file are Pathogenic Missense changes. Columns:
- transcript: the transcript ID of the gene in which the missense change occurs
- codon: the codon position of the missense change in that transcript
- clinvar_alleles: a +-delimited string, where each entry is an AlleleID::GoldStars pair: AlleleID is the unique identifier for the ClinVar allele, and GoldStars is the number of stars assigned to that allele. For example, 12345::3+67890::1 indicates that allele 12345 has 3 stars and allele 67890 has 1 star, and that both affect the same codon in the same transcript.
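Decoding clinvar_alleles is a two-level split, first on + and then on ::. The sketch below does that and groups rows by (transcript, codon); it assumes codon is an integer, and both function names are hypothetical.

```python
import csv
from typing import Iterable


def parse_clinvar_alleles(field: str) -> list[tuple[str, int]]:
    """Split '12345::3+67890::1' into [('12345', 3), ('67890', 1)]."""
    alleles = []
    for entry in field.split("+"):
        if entry:  # tolerate an empty field or a trailing '+'
            allele_id, stars = entry.split("::")
            alleles.append((allele_id, int(stars)))
    return alleles


def load_pm5(lines: Iterable[str]) -> dict[tuple[str, int], list[tuple[str, int]]]:
    """Build a (transcript, codon) -> [(allele_id, gold_stars), ...] map from clinvar_decisions.pm5.tsv."""
    index: dict[tuple[str, int], list[tuple[str, int]]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        key = (row["transcript"], int(row["codon"]))
        index.setdefault(key, []).extend(parse_clinvar_alleles(row["clinvar_alleles"]))
    return index
```

Keeping AlleleIDs as strings avoids any surprises if identifiers are later compared against values read from other TSVs.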

Usage

Download Results

We aim to generate data monthly, and publish the results on Zenodo. The latest version of the data can be found at:

https://zenodo.org/records/16777475
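The files in a Zenodo record can also be listed programmatically. This sketch assumes Zenodo's public REST endpoint https://zenodo.org/api/records/<id> and that its JSON response carries a files list whose entries have key (file name) and links.self (download URL) fields; check these against the current Zenodo API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed Zenodo REST endpoint for a single record.
ZENODO_API = "https://zenodo.org/api/records/{}"


def fetch_record(record_id: str) -> dict:
    """Download the JSON metadata for one Zenodo record."""
    with urllib.request.urlopen(ZENODO_API.format(record_id), timeout=30) as response:
        return json.load(response)


def file_links(record: dict) -> dict[str, str]:
    """Map each file name in the record to its download link."""
    return {f["key"]: f["links"]["self"] for f in record.get("files", [])}
```

Calling file_links(fetch_record("16777475")) would return a name-to-URL mapping, and each URL can then be passed to urllib.request.urlretrieve.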

Local Running

Downloading input files

A Nextflow workflow is provided to run the ClinvArbitration process locally. To use it you will need the following input files:

- a reference genome, in FASTA format
- a GFF3 file, containing gene annotations for the reference genome
- the files containing raw ClinVar submissions and variant details

A directory (data) and a script (download_data.sh) are provided to download and store the required files. Running this script from the data directory will download and unpack all required files. The location these files are downloaded to matches the expected location in the Nextflow config, so you can run the workflow immediately after downloading.

The ClinVar Variant and Submission summary files are updated weekly. You should delete your local copy and re-download each time you run this workflow, to ensure you're capturing the latest data.

Running the workflow

The ClinvArbitration workflow can be run containerised, or locally. By default, the reference data will be read from a directory called data, and the outputs written to a directory nextflow_outputs.

Local execution requires:

- a Nextflow installation, to operate the workflow
- a Python environment with the ClinvArbitration package and its dependencies installed
  - install these by running pip install . from the root of this repository
- BCFtools, to annotate the ClinVar variants with gene information

nextflow -c nextflow/nextflow.config \
    run nextflow/clinvarbitration.nf
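Before launching a local run, it can help to confirm the prerequisites listed above are present. A minimal pre-flight sketch: it assumes the tools are on PATH as nextflow and bcftools and that the installed package imports as clinvarbitration (inferred from the src/clinvarbitration layout); missing_prerequisites is a hypothetical helper.

```python
import importlib.util
import shutil


def missing_prerequisites(tools: tuple[str, ...] = ("nextflow", "bcftools"),
                          package: str = "clinvarbitration") -> list[str]:
    """List any command-line tools not on PATH, plus the Python package if it cannot be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    if importlib.util.find_spec(package) is None:
        missing.append(package)
    return missing
```

An empty list means the local command above should find the tools and package it needs; the reference files are not checked here.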

A containerised execution requires:

- a Nextflow installation, to operate the workflow
- a Docker installation, to run the workflow in a container

Step 1: build the Docker image:

docker build -t clinvarbitration:local .

Step 2: run the workflow using the Docker image:

nextflow -c nextflow/nextflow.config \
    run nextflow/clinvarbitration.nf \
    -with-docker clinvarbitration:local

CPG-Flow

Internally at CPG, this workflow is run using CPG-Flow, an in-house, Hail Batch-based workflow executor. The following elements relate to that workflow:

- an example config file, with enough entries populated that a standard CPG user could dry-run the workflow locally
- a workflow runner script
- a definition of all workflow stages

Once a Docker image has been built from the Dockerfile in this repository, the intention is that this workflow can be triggered like so:

analysis-runner \
    --skip-repo-checkout \
    --image australia-southeast1-docker.pkg.dev/cpg-common/images-dev/clinvarbitration:PR_24 \
    --config new_clinvarbitration.toml \
    --dataset seqr \
    --description 'resummarise_clinvar' \
    -o resummarise_clinvar \
    --access-level test \
    run_workflow

A config file is required, containing a few entries: some specific to this workflow, others relating to cpg-flow setup:

- workflow.driver_image: populated by analysis-runner; points to this Docker image
- site_blacklist: a list of ClinVar submitters to ignore. Useful for removing noise, or for blinding to self-submissions
- ref_fasta: required to run bcftools csq. Must match the genome_build
- genome_build: determines whether ClinVar data and annotations are sourced for GRCh37 or GRCh38 (default)

Acknowledgements

ClinVar, for providing the data which this process is based on

HudsonAlpha/ClinvArbitration
