A simple Nextflow pipeline for calling peaks and producing tracks from ChIP-seq data.

Given a single ChIP and a single Input BAM file, peakflow estimates the fragment size, calls peaks with MACS2, and produces coverage tracks for ChIP, Input, and the ChIP/Input ratio with deepTools. Coverage tracks are binned according to a specified bin size, with reads extended to the estimated fragment size. BAM files are treated as single-end by default, but paired-end processing can be enabled.
nextflow pull dmitrymyl/peakflow
The pipeline requires two kinds of input: read alignment files (BAM/BAI) and parameters. Paths to the read alignment files are specified in samplesheet.csv.

peakflow requires BAM files for the ChIP and Input samples, each indexed with a BAI file in the same directory. samplesheet.csv specifies paths only to the BAM files, in the following format:
type,path
chip,/path/to/chip.bam
input,/path/to/input.bam
samplesheet.csv is a comma-delimited CSV file with a header and two data rows. The header consists of two columns, type and path. type can be either "chip" or "input", and each row specifies the path to the BAM file of that type.
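If your BAM files are not yet indexed, the BAI files can be produced with samtools (outside of the pipeline), for example:

```bash
samtools index /path/to/chip.bam
samtools index /path/to/input.bam
```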
The pipeline has the following parameters:
Name | Type | Default value | Description |
---|---|---|---|
samplesheet | file path (csv) | "./samplesheet.csv" | Path to the samplesheet |
blacklist | file path (bed) | "./assets/hg19-blacklist.v2.bed" | Path to the blacklist file |
outdir | file path | null | Path to the output directory |
prefix | string | "sample" | Prefix for output file names |
binsize | integer | 1000 | Size of bins for coverage tracks |
callpeaks | boolean | true | Whether to call peaks or not |
extreads | boolean | true | Whether to extend reads for coverage tracks or not |
effgsize | integer | 2736124898 | Effective genome size (default is the value for hg19) |
pairedend | boolean | false | Whether the sample is paired-end (true) or single-end (false) |
Parameters are supplied as a JSON file. An example can be found at params.json.
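For reference, a params.json might look like the following (values are illustrative; defaults are listed in the table above):

```json
{
    "samplesheet": "./samplesheet.csv",
    "blacklist": "./assets/hg19-blacklist.v2.bed",
    "outdir": "./results",
    "prefix": "sample",
    "binsize": 1000,
    "callpeaks": true,
    "extreads": true,
    "effgsize": 2736124898,
    "pairedend": false
}
```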
nextflow run dmitrymyl/peakflow -profile PROFILE -params-file params.json
Available PROFILE values are local and those available from nf-core.
Activate Nextflow:
ml build-env/f2022                                     # load the build environment module (Lmod)
module --ignore-cache load "nextflow/23.10.1"          # load Nextflow 23.10.1
export NXF_OPTS='-Xms1g -Xmx4g'                        # JVM heap limits for the Nextflow launcher
export NXF_TEMP="/scratch/nextflow/"$USER"/nxf_temp"   # keep temporary files on scratch
export NXF_WORK="/scratch/nextflow/"$USER"/nxf_work"   # keep the work directory on scratch
export NXF_ANSI_LOG=false                              # plain log output (better for batch logs)
Run the pipeline with:
nextflow -bg run dmitrymyl/peakflow -r main -profile cbe -params-file params.json
Profile cbe makes the pipeline use SLURM for job submission and resource management.
Script file run_cluster.sh contains all the commands above, runs the pipeline with nohup (detached from the shell), and sends an email upon pipeline termination. You can download this script, modify it accordingly, and run it yourself.
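As a rough sketch of such a wrapper (the actual run_cluster.sh in the repository may differ, e.g. in how the notification email is configured):

```bash
#!/bin/bash
# Environment setup as shown above (module loads and NXF_* variables), then
# launch the pipeline detached from the shell; -N asks Nextflow to send a
# notification email when the workflow finishes (address is a placeholder).
nohup nextflow run dmitrymyl/peakflow -r main -profile cbe \
    -params-file params.json \
    -N user@example.com \
    > peakflow.log 2>&1 &
```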
On the command line, Nextflow options are specified with a single hyphen (such as -profile or -params-file), while workflow parameters are specified with a double hyphen (such as --samplesheet and all the rest available in params.json). For example, instead of supplying params.json you can specify the necessary workflow parameters directly on the command line:
nextflow -bg run dmitrymyl/peakflow -r main -profile cbe --samplesheet samplesheet.csv --blacklist blacklist.bed --outdir results --prefix sample
The -bg option runs Nextflow in the background. The -resume option skips successfully completed steps when rerunning the pipeline.
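For example, to rerun the pipeline while reusing cached results from a previous run:

```bash
nextflow -bg run dmitrymyl/peakflow -r main -profile cbe -params-file params.json -resume
```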
The output directory has the following files:
- prefix.chip_track.bw: ChIP CPM coverage track.
- prefix.input_track.bw: Input CPM coverage track.
- prefix.ratio_track.bw: log2 ChIP/Input CPM ratio track.
- prefix.peaks.narrowPeak: MACS2 peaks.
- prefix.peaks_noblacklist.narrowPeak: MACS2 peaks filtered by the blacklist.
- prefix.model.pdf: peak model plot produced by MACS2.
- prefix.fragment_size.txt: estimated fragment size as a single number.
All bigwig tracks are binned and have reads extended to the fragment size.
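Under the hood, these tracks correspond roughly to deepTools commands of the following kind (a sketch with illustrative values; the exact flags and values the pipeline uses may differ):

```bash
# Per-sample CPM coverage track (e.g. prefix.chip_track.bw); 200 is a placeholder
# for the fragment size estimated by the pipeline.
bamCoverage -b chip.bam -o sample.chip_track.bw \
    --binSize 1000 --extendReads 200 --normalizeUsing CPM

# log2 ChIP/Input CPM ratio track (prefix.ratio_track.bw)
bamCompare -b1 chip.bam -b2 input.bam -o sample.ratio_track.bw \
    --operation log2 --binSize 1000 --extendReads 200 --normalizeUsing CPM
```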
There are two ways to manage dependencies in this pipeline: a conda environment and a container. The container is the recommended option to run the pipeline.
The Apptainer/Singularity container is available at oras://docker.io/gerlichlab/peakflow-apptainer:latest. See the Apptainer or Dockerfile definition files to build a container yourself. When running, Nextflow automatically downloads the container.
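For example, to fetch the prebuilt image or build one locally from the Apptainer definition file (output filenames are illustrative):

```bash
# Pull the prebuilt image from the registry
apptainer pull peakflow.sif oras://docker.io/gerlichlab/peakflow-apptainer:latest

# Or build it yourself from the definition file in the repository
apptainer build peakflow.sif Apptainer
```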
The conda environment specification file is conda.yml. When running, Nextflow creates a conda environment based on this specification.
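If you prefer to manage the environment yourself, the same specification can also be used to create a conda environment manually (environment name is illustrative):

```bash
conda env create -f conda.yml -n peakflow
conda activate peakflow
```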
Pushing the container to the registry is a two-step process: login and push.
apptainer remote login --username dh-user oras://docker.io
apptainer push container.sif oras://docker.io/dh-user/container:tag