pachterlab/varseek

varseek

varseek is a free, open-source command-line tool and Python package that screens DNA-seq, bulk RNA-seq, and single-cell RNA-seq data for variants using k-mer-based pseudoalignment against a reference of known variants.

A standard workflow uses two commands: varseek ref and varseek count. varseek ref takes as input (1) a database of variants (e.g., COSMIC, ClinVar, dbSNP, or a custom set) and (2) the reference genome/transcriptome on which the variants are annotated. It outputs a variant-containing reference sequence (VCRS) index that serves as the basis for variant calling in varseek count. varseek count takes as input (1) the VCRS index generated by varseek ref and (2) sequencing reads in FASTQ format. It outputs a variant count matrix in AnnData format with samples/cells as rows and variants as columns.
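As a rough illustration of that output layout (rows are samples or cells, columns are variants), the sketch below arranges per-sample read counts into such a matrix using only the standard library. The helper `to_count_matrix` and the sample and variant names are hypothetical and not part of varseek. In an AnnData object, the row labels, column labels, and values would correspond to `obs_names`, `var_names`, and `X`.

```python
from collections import Counter


def to_count_matrix(
    per_sample: dict[str, Counter], variants: list[str]
) -> tuple[list[str], list[list[int]]]:
    """Hypothetical helper: arrange counts as samples (rows) x variants (columns)."""
    samples = sorted(per_sample)  # row order: sample/cell names
    # A Counter returns 0 for a variant with no reads, so missing entries become zeros.
    matrix = [[per_sample[s][v] for v in variants] for s in samples]
    return samples, matrix


# Made-up counts for two cells and two variants.
per_sample = {
    "cell_2": Counter({"v1": 3}),
    "cell_1": Counter({"v1": 1, "v2": 4}),
}
samples, X = to_count_matrix(per_sample, ["v1", "v2"])
print(samples)  # ['cell_1', 'cell_2']
print(X)        # [[1, 4], [3, 0]]
```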

varseek uses the pseudoalignment algorithm implemented by the kb-python package. varseek ref creates the VCRS index by extracting a short sequence flanking each variant, such that every k-mer of the VCRS contains the variant nucleotide(s). varseek ref wraps varseek build, varseek info, varseek filter, and kb ref to create the VCRS index.

varseek count uses the VCRS index to pseudoalign sequencing reads and count the number of reads that map to each variant. It wraps varseek fastqpp, kb count, varseek clean, and varseek summarize to generate the variant count matrix, which can be used for downstream analysis such as clustering, differential expression, and pathway analysis.
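To make these two ideas concrete, here is a toy sketch under simplifying assumptions. Each VCRS is taken to be the alternate allele plus up to k-1 reference bases on either side, the largest flank for which every k-mer of the VCRS still contains the variant (varseek's actual flank length and its handling of deletions may differ). Reads are then assigned by intersecting the sets of VCRSs compatible with each of their k-mers, which is the core idea of pseudoalignment. This is not the kb-python/kallisto implementation, which uses a de Bruijn graph index and many optimizations, and all function names here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_vcrs(reference: str, pos: int, ref: str, alt: str, k: int) -> str:
    """Toy VCRS: the alt allele plus up to k-1 reference bases on each side.

    pos is 0-based. Because each flank has at most k-1 bases, every k-mer of
    the result overlaps the alt allele. Only non-empty alt alleles are handled.
    """
    if not alt:
        raise ValueError("this sketch requires a non-empty alt allele")
    if reference[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    left = reference[max(0, pos - (k - 1)):pos]
    right = reference[pos + len(ref):pos + len(ref) + (k - 1)]
    return left + alt + right


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(vcrs: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of VCRS names that contain it."""
    index = defaultdict(set)
    for name, seq in vcrs.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return dict(index)


def pseudoalign(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Intersect the VCRS sets of the read's indexed k-mers.

    Returns a VCRS name only if exactly one remains compatible; otherwise None.
    """
    compatible: Optional[set] = None
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue  # k-mers absent from the index are skipped in this toy version
        compatible = set(hits) if compatible is None else compatible & hits
    if compatible is not None and len(compatible) == 1:
        return next(iter(compatible))
    return None


k = 5
reference = "AAACCCGGGTTTACGT"  # made-up sequence
vcrs = {
    "v1": build_vcrs(reference, 7, "G", "A", k),   # 'CCCGAGTTT'
    "v2": build_vcrs(reference, 13, "C", "T", k),  # 'TTTATGT'
}
index = build_index(vcrs, k)
reads = ["CCGAGTT", "CCGGGTT", "TTATGT", "GAGTTT"]  # the second read is wild-type
counts = Counter(filter(None, (pseudoalign(r, index, k) for r in reads)))
print(counts)  # Counter({'v1': 2, 'v2': 1})
```

The wild-type read shares no k-mer with either VCRS, so it is left unassigned rather than counted toward a variant.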

The functions of varseek are described in the table below.

| Description | Bash | Python (with import varseek as vk) |
| --- | --- | --- |
| Build a variant-containing reference sequence (VCRS) FASTA file | vk build ... | vk.build(...) |
| Describe the VCRS reference in a dataframe for filtering | vk info ... | vk.info(...) |
| Filter the VCRS file based on the CSV generated by vk info | vk filter ... | vk.filter(...) |
| Preprocess the FASTQ files before pseudoalignment | vk fastqpp ... | vk.fastqpp(...) |
| Process the variant count matrix | vk clean ... | vk.clean(...) |
| Analyze the variant count matrix results | vk summarize ... | vk.summarize(...) |
| Wrap vk build, vk info, vk filter, and kb ref | vk ref ... | vk.ref(...) |
| Wrap vk fastqpp, kb count, vk clean, and vk summarize | vk count ... | vk.count(...) |
| Create a synthetic RNA-seq dataset with variant-containing reads | vk sim ... | vk.sim(...) |

Installation

From PyPI:

pip install varseek

From GitHub:

pip install git+https://github.com/pachterlab/varseek.git

🪄 Quick start guide

1. Acquire a Reference

Follow one of the options below:

a. Download a pre-built reference

  • (optional) View all downloadable references: vk ref --list_downloadable_references
  • vk ref --download --variants VARIANTS --sequences SEQUENCES

b. Make custom reference

  • vk ref --variants VARIANTS --sequences SEQUENCES ...

c. Customize each step of the reference-building process (e.g., add information on which to filter, add custom filtering logic, or tune filtering parameters based on the results of intermediate steps)

  • vk build --variants VARIANTS --sequences SEQUENCES ...
  • (optional) vk info --input_dir INPUT_DIR ...
  • (optional) vk filter --input_dir INPUT_DIR ...
  • kb ref --workflow custom --index INDEX ...
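Option c leaves room for custom filtering logic. As a hedged sketch of what such logic could look like, the snippet below filters a small per-VCRS table read with Python's csv module, dropping entries whose k-mers also occur in the unmutated reference (such k-mers could let wild-type reads count toward a variant) and entries with extreme GC content. The column names (`vcrs_id`, `kmers_in_reference`, `gc_content`), the thresholds, and the `custom_filter` helper are all hypothetical and do not describe the actual vk info output; adapt them to the columns your CSV contains.

```python
import csv
import io

# Hypothetical per-VCRS table; real column names depend on your vk info output.
INFO_CSV = """vcrs_id,kmers_in_reference,gc_content
v1,0,0.52
v2,3,0.48
v3,0,0.81
"""


def custom_filter(rows, max_reference_kmers=0, gc_range=(0.2, 0.8)):
    """Keep IDs with few k-mers shared with the reference and moderate GC content."""
    lo, hi = gc_range
    return [
        row["vcrs_id"]
        for row in rows
        if int(row["kmers_in_reference"]) <= max_reference_kmers
        and lo <= float(row["gc_content"]) <= hi
    ]


rows = list(csv.DictReader(io.StringIO(INFO_CSV)))
keep = custom_filter(rows)
print(keep)  # ['v1']
```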

2. Screen for variants

Follow one of the options below:

a. Standard workflow

  • vk count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2 ...

b. Customize the variant screening process (e.g., additional FASTQ preprocessing or custom count matrix processing)

  • (optional) vk fastqpp ... --fastqs FASTQ1 FASTQ2 ...
  • kb count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2 ...
  • (optional) kb count --index REFERENCE_INDEX --t2g REFERENCE_T2G ... --fastqs FASTQ1 FASTQ2 ...
  • (optional) vk clean --adata ADATA ...
  • (optional) vk summarize --adata ADATA ...
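Custom count matrix processing in option b might take a shape like the minimal sketch below, which applies a minimum-read threshold and then tallies, for each variant, how many samples still have reads for it. The threshold and helper names are hypothetical, this does not describe what vk clean or vk summarize actually do, and it operates on a plain samples × variants list of lists rather than an AnnData object.

```python
def apply_min_count(matrix: list[list[int]], min_count: int) -> list[list[int]]:
    """Zero out entries with fewer than min_count reads (hypothetical cleaning rule)."""
    return [[c if c >= min_count else 0 for c in row] for row in matrix]


def samples_per_variant(matrix: list[list[int]]) -> list[int]:
    """For each variant (column), count the samples (rows) with at least one read."""
    if not matrix:
        return []
    return [sum(1 for row in matrix if row[j] > 0) for j in range(len(matrix[0]))]


# Made-up counts: 3 samples (rows) x 3 variants (columns).
X = [[1, 4, 0],
     [3, 0, 2],
     [0, 5, 1]]
X_clean = apply_min_count(X, min_count=2)
print(X_clean)                       # [[0, 4, 0], [3, 0, 2], [0, 5, 0]]
print(samples_per_variant(X_clean))  # [1, 2, 1]
```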

Help

See help for each function/command for more details:

Python arguments are equivalent to command-line arguments (--arg) unless otherwise specified. Command-line flags correspond to True/False arguments in Python. Arguments that default to True in Python take the prefix disable_ on the command line (e.g., mm defaults to True in Python, so mm=False in Python is equivalent to --disable_mm on the command line). The manual for any varseek function/command can be displayed from Python with the help() function or from the command line with the -h/--help flag.

# Python
import varseek as vk
help(vk.ref)
help(vk.count)
...

# Command line
vk ref -h
vk count -h
...
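The argument-naming rule described above can be summarized with a small helper that turns Python keyword arguments into command-line tokens. This helper is hypothetical and not part of varseek. The `default_true` parameter must list the arguments that default to True (only mm is named above), mm is used here purely to illustrate the rule rather than to document which command accepts it, and the values are the same placeholders used in the quick start.

```python
def to_cli_args(kwargs: dict, default_true: frozenset = frozenset()) -> list[str]:
    """Translate Python keyword arguments into CLI tokens (hypothetical helper)."""
    args = []
    for name, value in kwargs.items():
        if isinstance(value, bool):
            if name in default_true:
                # True-by-default options are switched off with --disable_<name>.
                if not value:
                    args.append(f"--disable_{name}")
            elif value:
                # Ordinary flags are passed as --<name> only when True.
                args.append(f"--{name}")
        elif isinstance(value, (list, tuple)):
            # Multi-valued arguments, e.g. --fastqs FASTQ1 FASTQ2.
            args.append(f"--{name}")
            args.extend(str(v) for v in value)
        elif value is not None:
            args.extend([f"--{name}", str(value)])
    return args


print(to_cli_args(
    {"index": "INDEX", "t2g": "T2G", "fastqs": ["FASTQ1", "FASTQ2"], "mm": False},
    default_true=frozenset({"mm"}),
))
# ['--index', 'INDEX', '--t2g', 'T2G', '--fastqs', 'FASTQ1', 'FASTQ2', '--disable_mm']
```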

Additional examples and citation

Examples for getting started: pachterlab/varseek-examples on GitHub

Repository for manuscript figures: pachterlab/RLSRP_2025 on GitHub

If you use varseek in a publication, please cite the following study:

PAPER CITATION

Read the article here: https://doi.org/10.1101/2025.09.03.674039
