peakScout

peakScout is a user-friendly and reversible peak-to-gene translator for genomic peak calling results.

If you use peakScout, please cite https://doi.org/10.1101/2025.09.07.671934 - thank you!

Overview

peakScout is a bioinformatics tool designed to bridge the gap between genomic peak data and gene annotations, helping researchers relate regulatory elements to their likely target genes. At its core, peakScout processes genomic peak files generated by popular peak callers such as MACS2 and SEACR and maps them to nearby genes using reference genome annotations.

peakScout performs two complementary mappings:

Peak-to-Gene Mapping: This function identifies the nearest genes to each peak, allowing researchers to infer which genes might be regulated by specific genomic regions. Users can specify how many nearest genes (k) they want to retrieve for each peak.
Gene-to-Peak Mapping: Conversely, this function finds the nearest peaks to a list of genes, helping researchers identify potential regulatory elements that may influence gene expression.
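To make the peak-to-gene idea concrete, here is a minimal sketch in shell and awk. It assumes a toy table of gene TSS positions and measures distance from the peak midpoint to each TSS on the same chromosome. This is only an illustration: the helper `nearest_k` and the toy file layout are hypothetical, and peakScout's own distance definition and tie handling may differ.

```shell
# Toy illustration of nearest-gene lookup; NOT peakScout's implementation.
# Assumption: distance = |peak midpoint - gene TSS|, same chromosome only.

# nearest_k CHROM START END K GENES_TSV
# GENES_TSV columns (hypothetical toy format): chrom, TSS position, gene name
nearest_k() {
  awk -v c="$1" -v s="$2" -v e="$3" '
    BEGIN { FS = OFS = "\t"; mid = int((s + e) / 2) }
    $1 == c { d = $2 - mid; if (d < 0) d = -d; print $3, d }
  ' "$5" | sort -k2,2n | head -n "$4"
}

tmp=$(mktemp -d)
printf 'chr1\t1000\tGeneA\nchr1\t5000\tGeneB\nchr1\t9000\tGeneC\nchr2\t1200\tGeneD\n' > "$tmp/genes.tsv"

# Peak chr1:4000-4400 (midpoint 4200): prints GeneB (800) then GeneA (3200)
nearest_k chr1 4000 4400 2 "$tmp/genes.tsv"
```

Gene-to-peak mapping is the same search with the roles swapped: fix a gene's position and rank peaks by their distance to it.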

peakScout expects two inputs:

A peak file (in BED6 format or as output from MACS2 or SEACR)
A reference GTF file containing gene annotations. peakScout decomposes the GTF file into chromosome-specific collections of genomic features, which are then used for bidirectional mapping between peaks and genes.
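The decomposition step can be pictured as splitting the GTF's gene records into one file per chromosome, so that each lookup only scans features on the relevant chromosome. The sketch below assumes a simple tab-separated layout (start, end, strand, gene name) chosen for illustration; peakScout's real on-disk format is not described here and likely differs, and `split_gtf_by_chrom` is a hypothetical helper.

```shell
# Illustrative only: peakScout's real decomposition format may differ.
# Split "gene" records of a GTF into one TSV per chromosome with columns
# start, end, strand, gene_name (hypothetical layout).

# split_gtf_by_chrom GTF OUT_DIR
split_gtf_by_chrom() {
  mkdir -p "$2"
  awk -v out="$2" '
    BEGIN { FS = OFS = "\t" }
    /^#/ || $3 != "gene" { next }
    {
      name = "NA"
      # Extract the value of the gene_name "..." attribute, if present
      if (match($9, /gene_name "[^"]+"/)) name = substr($9, RSTART + 11, RLENGTH - 12)
      print $4, $5, $7, name > (out "/" $1 ".tsv")
    }
  ' "$1"
}

tmp=$(mktemp -d)
{
  printf '#!genome-build toy\n'
  printf 'chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "g1"; gene_name "GeneA";\n'
  printf 'chr1\tTOY\texon\t100\t300\t.\t+\t.\tgene_id "g1"; gene_name "GeneA";\n'
  printf 'chr2\tTOY\tgene\t500\t1500\t.\t-\t.\tgene_id "g2"; gene_name "GeneB";\n'
} > "$tmp/toy.gtf"

split_gtf_by_chrom "$tmp/toy.gtf" "$tmp/ref"
ls "$tmp/ref"             # lists chr1.tsv and chr2.tsv
cat "$tmp/ref/chr1.tsv"   # 100, 900, +, GeneA (tab-separated)
```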

peakScout can be run in two ways:

Command line: peakScout is designed to be run from the command line, for users comfortable with terminal operations.
Cloud: for instant web access, we host peakScout in the cloud at https://vandydata.github.io/peakScout.

Installation

From source

These instructions should generally work without modification in Linux-based environments. If you are using Windows, we strongly recommend using WSL2 to get a Linux environment within Windows.

# 1. Clone the Repository
git clone https://github.com/vandydata/peakScout.git

# 2. Make the Script Executable
cd peakScout; chmod +x src/peakScout

# 3. Add to PATH
# To make this change permanent, add the export line to your `~/.bashrc`, but
# write the complete absolute path to peakScout/src there, not `$(pwd)`.
export PATH="$PATH:$(pwd)/src"

# 4. Set up virtual environment and install dependencies
# with venv
python3 -m venv peakscout
source peakscout/bin/activate
pip3 install -r requirements.txt

# OR with uv
uv venv peakscout
source peakscout/bin/activate
uv pip install -r requirements.txt
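Step 3 notes that a permanent PATH entry must contain the absolute path rather than `$(pwd)`. One way to get that right, sketched below with a hypothetical helper `add_to_path_rc`, is to expand `$(pwd)` at the moment the line is written and to skip appending if an identical line is already present.

```shell
# Hypothetical helper: append an absolute PATH entry to a shell rc file once.
# add_to_path_rc SRC_DIR [RC_FILE]   (RC_FILE defaults to ~/.bashrc)
add_to_path_rc() {
  local src_dir=$1
  local rc=${2:-$HOME/.bashrc}
  # The single-quoted $PATH stays literal; SRC_DIR is expanded now, so the
  # rc file records the absolute path rather than $(pwd).
  local line='export PATH="$PATH:'"$src_dir"'"'
  touch "$rc"
  # -x: whole-line match, -F: fixed string, -q: quiet
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Usage, from inside the cloned peakScout directory:
# add_to_path_rc "$(pwd)/src"
```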

Docker & Singularity containers

We provide a Docker image for peakScout, which can be run as follows:

docker run -it --rm jpcartailler/peakscout:latest peakScout --help

For Singularity, pull the Docker image (which converts it to a Singularity image file) and run it as follows:

singularity pull docker://jpcartailler/peakscout:latest
singularity exec peakscout_latest.sif peakScout --help

Usage

Decomposing Reference GTF

The first step of peakScout is to create the decomposed reference.

Parameter	Type	Description
`ref_dir`	`str`	The directory to store the GTF decompositions.
`gtf_ref`	`str`	The path to the GTF file.

To decompose a reference GTF file for use by peakScout, run the following command:

peakScout decompose \
--ref_dir /path/to/where/outputs/stored \
--gtf_ref /path/to/gtf/file

Specific example:

peakScout decompose \
--ref_dir reference/mm39/ \
--gtf_ref reference/gencode.vM37.basic.annotation.gtf

A directory called reference/mm39 will be created and should be used as the ref_dir argument for downstream peakScout operations.

Finding Nearest Genes

Once a reference GTF has been decomposed, you can use the decomposition to find the nearest genes to your peaks. Peak files can be MACS2 output, SEACR output, or standard BED6, supplied either as BED files or as Excel sheets.
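If you are preparing a standard BED6 file yourself, a quick pre-flight check can catch malformed lines before a run. The sketch below is a hypothetical helper, not part of peakScout, and applies only to tab-separated BED6 text files: it checks for six fields, integer start < end, and a strand of `+`, `-` or `.`. MACS2 and SEACR outputs have their own layouts and are selected with `--peak_type` instead.

```shell
# Hypothetical pre-flight check (not part of peakScout): flag lines that are
# not valid BED6. Header lines (#, track, browser) and blank lines are
# skipped. Returns non-zero if any line fails.

# check_bed6 BED_FILE
check_bed6() {
  awk '
    BEGIN { FS = "\t"; bad = 0 }
    /^(#|track|browser)/ || NF == 0 { next }
    NF != 6 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 || $6 !~ /^[+.-]$/ {
      printf "line %d: not BED6: %s\n", NR, $0
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}

tmp=$(mktemp -d)
# Line 2 is invalid: start (300) is not less than end (250)
printf 'chr1\t100\t200\tpeak1\t50\t.\nchr1\t300\t250\tpeak2\t10\t+\n' > "$tmp/peaks.bed"
check_bed6 "$tmp/peaks.bed" || echo "fix the reported lines before running peakScout"
```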

Parameter	Type	Description
`peak_file`	`str`	Path to the peak file.
`peak_type`	`str`	Type of peak caller used to generate peak file (e.g. MACS2, SEACR, BED6).
`num_features`	`int`	Number of nearest features to find.
`ref_dir`	`str`	Directory containing decomposed reference data.
`output_name`	`str`	Name for output file.
`out_dir`	`str`	Directory to output file.
`output_type`	`str`	Output type (csv file or xlsx file).
`species_genome`	`str`	Species of the reference genome.
`option`	`str`	Option for defining start and end positions of peaks. Default native_peak_boundaries.
`boundary`	`int`	Boundary for artificial peak boundary option. `None` if other options.
`up_bound`	`int`	Maximum allowed distance between peak and upstream feature. Default `None`.
`down_bound`	`int`	Maximum allowed distance between peak and downstream feature. Default `None`.
`consensus`	`bool`	Whether to use consensus peaks. Default `False`.
`drop_columns`	`bool`	Whether to drop unnecessary columns from the original file. Default `False`.
`view_window`	`float`	Proportion of the genome browser window occupied by the peak region. Default `0.2`.
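The `up_bound` and `down_bound` parameters cap how far a reported feature may lie from its peak. peakScout applies these bounds itself; the sketch below only illustrates the filtering idea on a hypothetical two-column table of signed distances. The sign convention (negative means upstream), the table layout, and the helper `filter_by_bounds` are assumptions, not peakScout's output format or code.

```shell
# Illustration only: peakScout applies up_bound/down_bound itself.
# Hypothetical input TSV: gene name, signed peak-to-gene distance,
# where (assumption) negative = upstream and positive = downstream.

# filter_by_bounds UP_BOUND DOWN_BOUND FILE
filter_by_bounds() {
  awk -v up="$1" -v down="$2" '
    BEGIN { FS = OFS = "\t" }
    ($2 < 0 && -$2 <= up + 0) || ($2 >= 0 && $2 <= down + 0)
  ' "$3"
}

tmp=$(mktemp -d)
printf 'GeneA\t-500\nGeneB\t-5000\nGeneC\t200\nGeneD\t20000\n' > "$tmp/dist.tsv"

# Keep upstream hits within 1000 bp and downstream hits within 10000 bp
filter_by_bounds 1000 10000 "$tmp/dist.tsv"   # keeps GeneA and GeneC
```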

Run the following command to write the nearest k genes for each peak to a CSV or Excel file. The number of features is given with `--k` and the output directory with `--o`:

peakScout peak2gene \
--peak_file /path/to/peak/file \
--peak_type MACS2/SEACR/BED6 \
--species_genome UCSC genome name matching the GTF (e.g. mm39) \
--k number of nearest genes \
--ref_dir /path/to/reference/directory \
--output_name name of output file \
--o /path/to/save/output \
--output_type csv/xlsx

Specific example:

peakScout peak2gene \
--peak_file test/test_MACS2.bed \
--peak_type MACS2 \
--species_genome mm39 \
--k 2 \
--ref_dir reference/mm39 \
--output_name peakScout_test_MACS2 \
--o my_output_dir \
--output_type xlsx

Finding Nearest Peaks

Once a reference GTF has been decomposed, you can also use the decomposition to find the nearest peaks to a set of genes. Peak files can be MACS2 output, SEACR output, or standard BED6, supplied either as BED files or as Excel sheets. Gene names should be listed in a single-column CSV or TXT file with no header.
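If your genes of interest come from a table such as a differential-expression CSV, a header-less single-column list can be produced with standard tools, and it is worth checking that each name occurs in the GTF used for the reference. The sketch assumes gene symbols in the first CSV column with no embedded commas, and GTF records carrying a `gene_name` attribute (as in GENCODE and Ensembl GTFs). `make_gene_list` and `missing_in_gtf` are hypothetical helpers, and whether peakScout matches on `gene_name` rather than another attribute is an assumption.

```shell
# Hypothetical helpers (not part of peakScout).

# make_gene_list IN_CSV OUT_TXT
# Take the first column of a CSV with a header row, strip quotes and
# carriage returns, drop blanks and duplicates, and write one name per line.
make_gene_list() {
  tail -n +2 "$1" | cut -d, -f1 | tr -d '"\r' | awk 'NF' | sort -u > "$2"
}

# missing_in_gtf GENE_LIST GTF
# Print names from GENE_LIST with no matching gene_name attribute in GTF.
missing_in_gtf() {
  awk '
    NR == FNR { want[$1] = 1; next }
    match($0, /gene_name "[^"]+"/) { seen[substr($0, RSTART + 11, RLENGTH - 12)] = 1 }
    END { for (g in want) if (!(g in seen)) print g }
  ' "$1" "$2" | sort
}

tmp=$(mktemp -d)
printf 'gene,log2FC,padj\nSox2,2.1,0.001\n"Nanog",1.5,0.01\nSox2,2.1,0.001\nFakeGene1,0.9,0.04\n' > "$tmp/de.csv"
printf 'chr3\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "g1"; gene_name "Sox2";\n' > "$tmp/toy.gtf"
printf 'chr6\tTOY\tgene\t500\t1500\t.\t+\t.\tgene_id "g2"; gene_name "Nanog";\n' >> "$tmp/toy.gtf"

make_gene_list "$tmp/de.csv" "$tmp/genes.txt"   # FakeGene1, Nanog, Sox2
missing_in_gtf "$tmp/genes.txt" "$tmp/toy.gtf"  # prints FakeGene1
```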

Parameter	Type	Description
`peak_file`	`str`	Path to the peak file.
`peak_type`	`str`	Type of peak caller used to generate peak file (e.g. MACS2, SEACR, BED6).
`gene_file`	`str`	Path to the gene file.
`num_features`	`int`	Number of nearest features to find.
`ref_dir`	`str`	Directory containing decomposed reference data.
`output_name`	`str`	Name for output file.
`out_dir`	`str`	Directory to output file.
`output_type`	`str`	Output type (csv file or xlsx file).
`option`	`str`	Option for defining start and end positions of peaks. Default native_peak_boundaries.
`boundary`	`int`	Boundary for artificial peak boundary option. `None` if other options.
`consensus`	`bool`	Whether to use consensus peaks. Default `False`.

Run the following command to write the nearest k peaks for each gene to a CSV or Excel file:

peakScout gene2peak \
--peak_file /path/to/peak/file \
--peak_type MACS2/SEACR/BED6 \
--gene_file /path/to/gene/file \
--k number of nearest peaks \
--ref_dir /path/to/reference/directory \
--output_name name of output file \
--o /path/to/save/output \
--output_type csv/xlsx

Specific example:

peakScout gene2peak \
--peak_file test/test_MACS2.bed \
--peak_type MACS2 \
--gene_file test/test_genes.txt \
--k 3 \
--ref_dir reference/mm39 \
--output_name test_gene2peak_MACS2 \
--o my_output_dir \
--output_type csv

peakScout ready-made references for common organisms

For your convenience, we have prepared peakScout references for common organisms using src/utils/decompose-common-organisms.sh. The GTF column links to each source annotation and the S3 column links to the corresponding downloadable peakScout reference.

Species	GTF	S3
arabidopsis_TAIR10	GTF	S3
fly_BDGP6.54	GTF	S3
frog_v10.1	GTF	S3
human_hg19	GTF	S3
human_hg38	GTF	S3
mouse_mm10	GTF	S3
mouse_mm39	GTF	S3
pig_Sscrofa11.1	GTF	S3
worm_WBcel235	GTF	S3
yeast_R64-1-1	GTF	S3
zebrafish_GRCz11	GTF	S3

FAQ

What is WSL2 and how do I install it?

Start at https://learn.microsoft.com/en-us/windows/wsl/install and search online for any issues you run into. We cannot provide support for WSL2.

How do I get a GTF file?

Ensembl and GENCODE both provide GTF files. For example, go to https://www.gencodegenes.org/mouse/release_M37.html, select the GTF you'd like to decompose, then:

mkdir reference
cd reference
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M37/gencode.vM37.basic.annotation.gtf.gz
gunzip gencode.vM37.basic.annotation.gtf.gz
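GENCODE GTFs name chromosomes `chr1`, `chr2` and so on, whereas Ensembl GTFs use `1`, `2` and so on. If your peak file uses a different convention from the GTF, peaks may not match any features, so a quick check after downloading can save time. The sketch below is a hypothetical helper, not a peakScout feature, and assumes a tab-separated peak file with the chromosome in the first column.

```shell
# Hypothetical check (not part of peakScout): print chromosome names used in
# a peak file that never appear in the GTF, e.g. "chr1" peaks vs a GTF that
# uses "1". Header lines (#, track, browser) and blank lines are skipped.

# chrom_mismatch GTF PEAK_FILE
chrom_mismatch() {
  awk '
    BEGIN { FS = "\t" }
    NR == FNR { if ($0 !~ /^#/) have[$1] = 1; next }
    /^(#|track|browser)/ || NF == 0 { next }
    !($1 in have) && !($1 in reported) { print $1; reported[$1] = 1 }
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n' > "$tmp/ensembl_style.gtf"
printf 'chr1\t100\t200\tpeak1\t0\t.\nchr1\t400\t500\tpeak2\t0\t.\n' > "$tmp/peaks.bed"

chrom_mismatch "$tmp/ensembl_style.gtf" "$tmp/peaks.bed"   # prints chr1 once
```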
