varseek is a free, open-source command-line tool and Python package that provides variant screening of DNA-seq, bulk RNA-seq, and single-cell RNA-seq data using k-mer-based alignment against a reference of known variants.
The two commands used in a standard workflow are varseek ref and varseek count. varseek ref takes as input (1) a database of variants (e.g., COSMIC, ClinVar, dbSNP, or a custom database) and (2) the reference genome/transcriptome against which the variants are annotated. It outputs a variant-containing reference sequence (VCRS) index that serves as the basis for variant calling in varseek count. varseek count takes as input (1) the VCRS index generated by varseek ref and (2) sequencing read data in FASTQ format. It outputs a variant count matrix in AnnData format, with samples/cells as rows and variants as columns.
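To make the output layout concrete, here is a minimal sketch of a samples/cells x variants count matrix using plain Python lists. It is not an AnnData object and does not use varseek; the cell names, variant names, and counts are made up for illustration.

```python
# Toy variant count matrix: one row per sample/cell, one column per variant.
# All names and numbers below are invented for illustration.
cells = ["cell_1", "cell_2", "cell_3"]
variants = ["variant_A", "variant_B"]
counts = [
    [3, 0],  # cell_1
    [0, 0],  # cell_2
    [1, 2],  # cell_3
]

# Total number of reads supporting each variant (column sums).
reads_per_variant = {
    variant: sum(row[j] for row in counts)
    for j, variant in enumerate(variants)
}

# Cells in which each variant was detected at least once.
cells_with_variant = {
    variant: [cell for cell, row in zip(cells, counts) if row[j] > 0]
    for j, variant in enumerate(variants)
}

print(reads_per_variant)
print(cells_with_variant)
```

In an actual AnnData object the same information would live in the matrix plus its row and column annotations; the sketch only shows which axis is which.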
varseek uses the pseudoalignment algorithm implemented by the kb-python package. varseek ref creates the VCRS index by extracting a short sequence around each variant, with flanking regions sized so that every k-mer of the VCRS contains the variant nucleotide(s). varseek ref wraps varseek build, varseek info, varseek filter, and kb ref to create the VCRS index. varseek count uses the VCRS index to pseudoalign sequencing reads and count the number of reads that map to each variant. varseek count wraps varseek fastqpp, kb count, varseek clean, and varseek summarize to generate the variant count matrix. The variant count matrix can be used for downstream analysis, such as clustering, differential expression, and pathway analysis.
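The idea behind the VCRS can be sketched in a few lines of Python. The example below is a simplified illustration, not varseek's implementation: it handles only single-base substitutions, the function names (build_vcrs, kmers, count_reads) and the reference sequence are hypothetical, and an exact k-mer lookup stands in for the pseudoalignment performed by kb-python. Taking at most k - 1 flanking bases on each side is what ensures that every k-mer of the VCRS overlaps the variant.

```python
from collections import Counter


def build_vcrs(reference, pos, alt, k):
    """Return a toy VCRS for a single-base substitution at 0-based `pos`.

    Taking at most k - 1 bases on each side guarantees that every k-mer
    of the returned sequence overlaps the substituted base.
    """
    left = reference[max(0, pos - (k - 1)):pos]
    right = reference[pos + 1:pos + k]
    return left + alt + right


def kmers(seq, k):
    """List all overlapping k-mers of `seq` (empty if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_reads(reads, vcrs_by_variant, k):
    """Count reads whose matching k-mers all point to exactly one variant."""
    # Map each VCRS k-mer to the set of variants that contain it.
    index = {}
    for name, vcrs in vcrs_by_variant.items():
        for kmer in kmers(vcrs, k):
            index.setdefault(kmer, set()).add(name)

    counts = Counter()
    for read in reads:
        compatible = None
        for kmer in kmers(read, k):
            if kmer in index:
                hits = index[kmer]
                compatible = set(hits) if compatible is None else compatible & hits
        # Reads with no matching k-mer, or with ambiguous matches, are skipped.
        if compatible is not None and len(compatible) == 1:
            counts[next(iter(compatible))] += 1
    return counts


reference = "ACGTACGTTAGCCGATAGGCTA"  # made-up reference sequence
k = 5
vcrs_by_variant = {
    "varA": build_vcrs(reference, 10, "T", k),  # G>T at position 10
    "varB": build_vcrs(reference, 15, "C", k),  # T>C at position 15
}
reads = [
    "CGTTATCCGAT",  # carries varA
    "CGTTAGCCGAT",  # wild type: shares no k-mer with any VCRS
    "CGACAGGCT",    # carries varB
]
counts = count_reads(reads, vcrs_by_variant, k)
print(vcrs_by_variant)
print(dict(counts))
```

Because a wild-type read never produces a k-mer that contains the variant base, it shares no k-mer with the VCRS and is left uncounted, which is the property the flank length is chosen to provide.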
The functions of varseek are described in the table below.
| Description | Bash | Python (with import varseek as vk) |
|---|---|---|
| Build a variant-containing reference sequence (VCRS) fasta file | vk build ... | vk.build(...) |
| Describe the VCRS reference in a dataframe for filtering | vk info ... | vk.info(...) |
| Filter the VCRS file based on the CSV generated from varseek info | vk filter ... | vk.filter(...) |
| Preprocess the FASTQ files before pseudoalignment | vk fastqpp ... | vk.fastqpp(...) |
| Process the variant count matrix | vk clean ... | vk.clean(...) |
| Analyze the variant count matrix results | vk summarize ... | vk.summarize(...) |
| Wrap vk build, vk info, vk filter, and kb ref | vk ref ... | vk.ref(...) |
| Wrap vk fastqpp, kb count, vk clean, and vk summarize | vk count ... | vk.count(...) |
| Create synthetic RNA-seq dataset with variant-containing reads | vk sim ... | vk.sim(...) |

From PyPI:
pip install varseek

From GitHub:
pip install git+https://github.com/pachterlab/varseek.git

Follow one of the options below to obtain a VCRS index:
a. Download a pre-built reference
- (optional) View all downloadable references:
  vk ref --list_downloadable_references
- vk ref --download --variants VARIANTS --sequences SEQUENCES

b. Build a reference from your own variants and sequences
vk ref --variants VARIANTS --sequences SEQUENCES ...

c. Customize each step of the reference-building process, e.g., add additional information by which to filter, add custom filtering logic, or tune filtering parameters based on the results of intermediate steps
- vk build --variants VARIANTS --sequences SEQUENCES ...
- (optional) vk info --input_dir INPUT_DIR ...
- (optional) vk filter --input_dir INPUT_DIR ...
- kb ref --workflow custom --index INDEX ...

Follow one of the options below to screen sequencing reads for variants:

a. Run the standard screening workflow with a single command
vk count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2 ...

b. Customize the variant screening process, e.g., additional FASTQ preprocessing or custom count matrix processing
- (optional) vk fastqpp ... --fastqs FASTQ1 FASTQ2 ...
- kb count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2 ...
- (optional) kb count --index REFERENCE_INDEX --t2g REFERENCE_T2G ... --fastqs FASTQ1 FASTQ2 ...
- (optional) vk clean --adata ADATA ...
- (optional) vk summarize --adata ADATA ...

See help for each function/command for more details:
Python arguments are equivalent to command-line arguments (--arg) unless otherwise specified. Command-line flags correspond to True/False arguments in Python. Arguments that default to True in Python take the prefix 'disable_' on the command line (e.g., mm defaults to True in Python, so mm=False in Python is equivalent to --disable_mm on the command line). The manual for any varseek function/command can be viewed from Python with the help() function or from the command line with the -h/--help flag.
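The naming convention above can be illustrated with a small helper that translates Python keyword arguments into the corresponding command-line tokens. This is a sketch of the stated rule only: to_cli_args is a hypothetical function that is not part of varseek, the file names are placeholders, the caller must say which arguments default to True, and any exceptions covered by "unless otherwise specified" are not handled.

```python
from __future__ import annotations


def to_cli_args(kwargs: dict, default_true: set | frozenset = frozenset()) -> list[str]:
    """Translate Python keyword arguments into command-line tokens.

    Hypothetical helper illustrating the convention described above;
    `default_true` names the boolean arguments that default to True.
    """
    args = []
    for name, value in kwargs.items():
        if isinstance(value, bool):
            if name in default_true:
                # Default-True arguments only appear when switched off.
                if not value:
                    args.append(f"--disable_{name}")
            elif value:
                # Default-False flags only appear when switched on.
                args.append(f"--{name}")
        elif isinstance(value, (list, tuple)):
            # Multi-valued arguments: one flag followed by every value.
            args.append(f"--{name}")
            args.extend(str(v) for v in value)
        else:
            args.extend([f"--{name}", str(value)])
    return args


# Placeholder file names; mm is the default-True example from the text.
example = to_cli_args(
    {"index": "index.idx", "t2g": "t2g.txt", "mm": False},
    default_true={"mm"},
)
print(" ".join(example))
```

Booleans are checked before other types because Python treats bool as a subclass of int, so a generic value branch would otherwise emit tokens such as "--mm False".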

View the manual from Python:
import varseek as vk
help(vk.ref)
help(vk.count)
...

Or from the command line:
vk ref -h
vk count -h
...

Examples for getting started: GitHub - pachterlab/varseek-examples
Repository for manuscript figures: GitHub - pachterlab/RLSRP_2025
If you use varseek in a publication, please cite the following study:
PAPER CITATION
Read the article here: https://doi.org/10.1101/2025.09.03.674039


