GitHub repository for the hands-on Snakemake learning session at the MannLabs Group Retreat 2025.

Snakemake is a Python-based workflow manager designed to make your life easier when analysing large datasets. It enforces reproducibility and enables scalability.
In this tutorial, we will

- read in a dataset (here: a small image)
- process it with a simple function (here: apply different image transformations to it)
- generate a plot as output (here: histograms of pixel intensities)
- generate a Snakemake report.
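The kind of processing the workflow performs can be sketched in plain Python (a minimal sketch using NumPy on a synthetic image; the actual workflow scripts may differ):

```python
import numpy as np

# Synthetic 8-bit grayscale "image" standing in for the tutorial dataset
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# A simple transformation: invert pixel intensities
inverted = 255 - image

# Histogram of pixel intensities (what the plotting step visualises)
counts, bin_edges = np.histogram(image, bins=16, range=(0, 256))
assert counts.sum() == image.size  # every pixel falls into exactly one bin
```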
- Using the command line, go into your favorite directory

  ```shell
  cd /path/to/my/favorite/directory
  ```

- Clone this repository (or download it via Code > Download ZIP and unzip it locally)

  ```shell
  git clone https://github.com/lucas-diedrich/snakemake-learning.git
  ```

- Go into the directory

  ```shell
  cd snakemake-learning
  ```

- Create a `mamba`/`conda` environment with Snakemake based on the `environment.yaml` file and activate it

  ```shell
  mamba env create -n snakemake-env -f environment.yaml && mamba activate snakemake-env
  # OR: conda env create -n snakemake-env -f environment.yaml && conda activate snakemake-env
  ```

- Check that the installation was successful

  ```shell
  snakemake --version
  # 9.5.1
  ```
See the slides in `./docs`.
Run the following command in the root directory (`.`) to see the whole task graph:

```shell
# --dag: directed acyclic graph
snakemake --dag
```

And the following command to inspect how the rules depend on one another (simpler than the task graph, especially for large workflows):

```shell
# --rulegraph: show dependencies between rules
snakemake --rulegraph
```
```mermaid
---
title: Rule Graph
---
flowchart TB
    id0[all]
    id1[plot_histogram]
    id2[transform_image]
    id3[save_image]
    style id0 fill:#CD5C5C,stroke-width:2px,color:#333333
    style id1 fill:#F08080,stroke-width:2px,color:#333333
    style id2 fill:#FA8072,stroke-width:2px,color:#333333
    style id3 fill:#E9967A,stroke-width:2px,color:#333333
    id1 --> id0
    id2 --> id1
    id3 --> id2
```
You can use a Graphviz visualizer/editor to view the task graph, or pipe the output through Graphviz's `dot` to render it locally (e.g. `snakemake --dag | dot -Tsvg > dag.svg`).
Go into the `./workflow` directory and run:

```shell
snakemake --cores 2 --use-conda
```

The output can be found in the `./results` directory.
Go into the `./workflow` directory and run:

```shell
snakemake --report ../results/report.html
```

The output can be found in the `./results` directory.
You can run this workflow on a high-performance computing cluster (here leveraging the Slurm workload manager). In this case, one Slurm job acts as a scheduler that submits individual rule executions as separate Slurm jobs. The `snakemake-executor-plugin-slurm` plugin automatically handles the scheduling and submission of dependent jobs. Please check out the script `/workflow/snakemake.sbatch` and the official Snakemake Slurm plugin documentation to learn more about the relevant flags and settings.
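A scheduler job of this kind might look like the following (a minimal sketch; the actual `/workflow/snakemake.sbatch` in this repository, its resource requests, and its flags may differ):

```bash
#!/bin/bash
#SBATCH --job-name=snakemake-scheduler
#SBATCH --time=04:00:00        # walltime for the scheduler job itself
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G

# Activate the environment created during setup
# (how activation works inside a batch script depends on your cluster setup)
source activate snakemake-env

# The scheduler job submits each rule execution as its own Slurm job;
# --jobs caps how many of them may be queued/running at once.
snakemake --executor slurm --use-conda --jobs 10
```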
Install the environment:

```shell
conda create -n snakemake-env -y
conda env update -n snakemake-env --file environment.yaml
```

Additionally, install the `snakemake-executor-plugin-slurm`:

```shell
pip install snakemake-executor-plugin-slurm
```

Then submit the provided workflow script on the cluster:

```shell
cd /workflow/
sbatch snakemake.sbatch
```
To further deepen your understanding after the workshop, try the following exercises.

The script `create-data.py` can take image names (that are part of the `skimage` package) as arguments:

```shell
python scripts/create-data.py --image-name <image name> --output <output name>
```
- Modify the workflow so that it additionally runs on other `skimage` example datasets, e.g. `colorwheel`, `cat`, `logo`.
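One way to approach this is a wildcard plus `expand()` in the target rule (a sketch only; the rule names and output paths are assumptions and will likely differ from the actual `Snakefile` in `./workflow`):

```snakemake
# Hypothetical names; adapt to the rules and paths in ./workflow/Snakefile
IMAGES = ["colorwheel", "cat", "logo"]  # plus the image the workflow already uses

rule all:
    input:
        expand("results/{image}/histogram.png", image=IMAGES)
```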
- Add a new rule that generates an aggregated plot, where the image and its modifications are shown in the top row and the associated histograms are shown in the bottom row.
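Such a rule's script could build the grid with Matplotlib along these lines (a sketch with synthetic stand-in data; names and transformations are assumptions, not the repository's actual code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on headless machines
import matplotlib.pyplot as plt

# Stand-ins for the real image and its transformations
rng = np.random.default_rng(0)
images = {"original": rng.integers(0, 256, (32, 32), dtype=np.uint8)}
images["inverted"] = 255 - images["original"]
images["thresholded"] = (images["original"] > 128).astype(np.uint8) * 255

# 2 x N grid: images on top, matching intensity histograms below
fig, axes = plt.subplots(2, len(images), figsize=(3 * len(images), 6))
for col, (name, img) in enumerate(images.items()):
    axes[0, col].imshow(img, cmap="gray")
    axes[0, col].set_title(name)
    axes[0, col].axis("off")
    axes[1, col].hist(img.ravel(), bins=16, range=(0, 256))
fig.savefig("aggregated.png")
```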
- Explore possibilities to modify the report with the reStructuredText format.
- Snakemake homepage + documentation: snakemake.readthedocs.io
- Publication: Mölder F, Jablonski KP, Letcher B et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33 (https://doi.org/10.12688/f1000research.29032.2)