GitHub repository for the hands-on Snakemake learning session at the MannLabs Group Retreat 2025.

Snakemake is a Python-based workflow manager designed to make your life easier when analysing large datasets. It enforces reproducibility and enables scalability.
In this tutorial, we will

- read in a dataset (here: a small image)
- process it with a simple function (here: apply different image transformations to it)
- generate a plot as output (here: histograms of pixel intensities)
- generate a Snakemake report.
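The kind of processing the workflow performs can be sketched in plain Python (a minimal sketch using NumPy on a synthetic image; the actual workflow scripts may differ):

```python
import numpy as np

# Synthetic 8-bit grayscale "image" standing in for the tutorial dataset
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# A simple transformation: invert pixel intensities
inverted = 255 - image

# Histogram of pixel intensities (what the plotting step visualises)
counts, bin_edges = np.histogram(image, bins=16, range=(0, 256))
assert counts.sum() == image.size  # every pixel falls into exactly one bin
```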
- Using the command line, go into your favorite directory

  ```shell
  cd /path/to/my/favorite/directory
  ```

- Clone this repository (or download it via Code > Download ZIP and unzip it locally)

  ```shell
  git clone https://github.com/lucas-diedrich/snakemake-learning.git
  ```

- Go into the directory

  ```shell
  cd snakemake-learning
  ```

- Create a `mamba`/`conda` environment with Snakemake based on the `environment.yaml` file and activate it

  ```shell
  mamba env create -n snakemake-env -f environment.yaml && mamba activate snakemake-env
  # OR: conda env create -n snakemake-env -f environment.yaml && conda activate snakemake-env
  ```

- Check that the installation was successful

  ```shell
  snakemake --version
  # 9.5.1
  ```
See the slides in `./docs`.
Run the following command in the root directory (`.`) to see the whole task graph:

```shell
# --dag: directed acyclic graph
snakemake --dag
```

And the following command to inspect how the rules depend on one another (simpler than the task graph, especially for large workflows):

```shell
# --rulegraph: show dependencies between rules
snakemake --rulegraph
```
```mermaid
---
title: Rule Graph
---
flowchart TB
    id0[all]
    id1[plot_histogram]
    id2[transform_image]
    id3[save_image]
    style id0 fill:#CD5C5C,stroke-width:2px,color:#333333
    style id1 fill:#F08080,stroke-width:2px,color:#333333
    style id2 fill:#FA8072,stroke-width:2px,color:#333333
    style id3 fill:#E9967A,stroke-width:2px,color:#333333
    id1 --> id0
    id2 --> id1
    id3 --> id2
```
You can use a Graphviz visualizer/editor to view the task graph, or pipe the output through Graphviz's `dot` to render it locally (e.g. `snakemake --dag | dot -Tsvg > dag.svg`).
Go into the `./workflow` directory and run:

```shell
snakemake --cores 2 --use-conda
```

The output can be found in the `./results` directory.
Go into the `./workflow` directory and run:

```shell
snakemake --report ../results/report.html
```

The output can be found in the `./results` directory.
You can run this workflow on a high-performance computing cluster (here leveraging the Slurm workload manager). In this case, one Slurm job acts as a scheduler that submits individual rule executions as separate Slurm jobs. The `snakemake-executor-plugin-slurm` plugin automatically handles the scheduling and submission of dependent jobs. Please check out the script `/workflow/snakemake.sbatch` and the official Snakemake Slurm plugin documentation to learn more about the relevant flags and settings.
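A scheduler job of this kind might look like the following (a minimal sketch; the actual `/workflow/snakemake.sbatch` in this repository, its resource requests, and its flags may differ):

```bash
#!/bin/bash
#SBATCH --job-name=snakemake-scheduler
#SBATCH --time=04:00:00        # walltime for the scheduler job itself
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G

# Activate the environment created during setup
# (how activation works inside a batch script depends on your cluster setup)
source activate snakemake-env

# The scheduler job submits each rule execution as its own Slurm job;
# --jobs caps how many of them may be queued/running at once.
snakemake --executor slurm --use-conda --jobs 10
```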
Install the environment:

```shell
conda create -n snakemake-env -y
conda env update -n snakemake-env --file environment.yaml
```

Additionally, install the `snakemake-executor-plugin-slurm`:

```shell
pip install snakemake-executor-plugin-slurm
```

Then submit the provided workflow script on the cluster:

```shell
cd /workflow/
sbatch snakemake.sbatch
```
To further deepen your understanding after the workshop, try the following exercises.

The script `create-data.py` can take image names (that are part of the `skimage` package) as arguments:

```shell
python scripts/create-data.py --image-name <image name> --output <output name>
```
- Modify the workflow so that it additionally runs on other `skimage` example datasets, e.g. `colorwheel`, `cat`, `logo`.
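One way to approach this is a wildcard plus `expand()` in the target rule (a sketch only; the rule names and output paths are assumptions and will likely differ from the actual `Snakefile` in `./workflow`):

```snakemake
# Hypothetical names; adapt to the rules and paths in ./workflow/Snakefile
IMAGES = ["colorwheel", "cat", "logo"]  # plus the image the workflow already uses

rule all:
    input:
        expand("results/{image}/histogram.png", image=IMAGES)
```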
- Add a new rule that generates an aggregated plot, where the image and its modifications are shown in the top row and the associated histograms are shown in the bottom row.
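Such a rule's script could build the grid with Matplotlib along these lines (a sketch with synthetic stand-in data; names and transformations are assumptions, not the repository's actual code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on headless machines
import matplotlib.pyplot as plt

# Stand-ins for the real image and its transformations
rng = np.random.default_rng(0)
images = {"original": rng.integers(0, 256, (32, 32), dtype=np.uint8)}
images["inverted"] = 255 - images["original"]
images["thresholded"] = (images["original"] > 128).astype(np.uint8) * 255

# 2 x N grid: images on top, matching intensity histograms below
fig, axes = plt.subplots(2, len(images), figsize=(3 * len(images), 6))
for col, (name, img) in enumerate(images.items()):
    axes[0, col].imshow(img, cmap="gray")
    axes[0, col].set_title(name)
    axes[0, col].axis("off")
    axes[1, col].hist(img.ravel(), bins=16, range=(0, 256))
fig.savefig("aggregated.png")
```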
- Explore possibilities to modify the report with the reStructuredText format.
- Snakemake homepage + documentation: snakemake.readthedocs.io
- Publication: Mölder F, Jablonski KP, Letcher B et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33 (https://doi.org/10.12688/f1000research.29032.2)