lucas-diedrich/snakemake-learning
snakemake-learning

GitHub repository for the hands-on snakemake learning session at the MannLabs Group Retreat 2025

Snakemake is a Python-based workflow manager designed to make your life easier when analysing large datasets. It promotes reproducibility and enables scalability.

Tutorial overview

In this tutorial, we will

  1. read in a dataset (here: a small image)
  2. process it with a simple function (here: apply different image transformations to it)
  3. generate a plot as output (here: histograms of pixel intensities)
  4. generate a snakemake report.
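To make steps 1 to 3 concrete, here is a minimal pure-Python sketch of the same idea. It is not the code used in this repository: the tiny 3x3 image, the `invert` transformation, and the `histogram` helper are illustrative assumptions, and the actual workflow works on a real image and produces plots.

```python
from collections import Counter

# Step 1: "read in" a dataset. Hypothetical 3x3 grayscale image (values 0-255).
image = [
    [0, 64, 128],
    [64, 128, 255],
    [128, 255, 255],
]


def invert(img):
    """Step 2: a simple transformation that swaps dark and bright pixels."""
    return [[255 - px for px in row] for row in img]


def histogram(img):
    """Step 3: count how often each pixel intensity occurs."""
    return Counter(px for row in img for px in row)


inverted = invert(image)
hist_original = histogram(image)
hist_inverted = histogram(inverted)

print(sorted(hist_original.items()))
print(sorted(hist_inverted.items()))
```

In the workflow, each of these steps becomes its own rule, so Snakemake can rerun only the steps whose inputs changed.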

Results

Installation

  1. Using the command line, go into your favorite directory (cd /path/to/my/favorite/directory)

  2. Clone this repository

git clone https://github.com/lucas-diedrich/snakemake-learning.git

(or download it via Code > Download ZIP, and unzip it locally)

  3. Go into the directory
cd snakemake-learning
  4. Create a mamba/conda environment with snakemake based on the environment.yaml file and activate it
mamba create -n snakemake-env --file environment.yaml && mamba activate snakemake-env

# OR conda env create -f environment.yaml && conda activate snakemake-env
  5. Check if the installation was successful
snakemake --version
> 9.5.1

Tutorial

1. Snakemake - Introduction

See the slides in ./docs

2. Check out the workflow

Run the following command in the root directory (.) to see the whole task graph.

# --dag: Directed acyclic graph
snakemake --dag 

Run the following command to inspect how the rules depend on one another (simpler than the task graph, especially for large workflows).

# --rulegraph: Show dependencies between rules
snakemake --rulegraph
---
title: Rule Graph
---
flowchart TB
        id0[all]
        id1[plot_histogram]
        id2[transform_image]
        id3[save_image]
        style id0 fill:#CD5C5C,stroke-width:2px,color:#333333
        style id1 fill:#F08080,stroke-width:2px,color:#333333
        style id2 fill:#FA8072,stroke-width:2px,color:#333333
        style id3 fill:#E9967A,stroke-width:2px,color:#333333
        id0 --> id0
        id1 --> id0
        id2 --> id1
        id3 --> id2

You can paste the output into an online Graphviz visualizer to view the task graph.
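The rule graph determines the order in which rules can run. As a rough illustration (not how Snakemake is implemented internally), the dependencies shown above can be written as a dictionary and sorted with Python's standard `graphlib` module. The self-referencing edge on `all` is left out here, since a rule cannot depend on itself in a topological sort.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules whose output it needs,
# mirroring the edges of the rule graph.
rule_dependencies = {
    "all": {"plot_histogram"},
    "plot_histogram": {"transform_image"},
    "transform_image": {"save_image"},
    "save_image": set(),
}

# static_order() yields every rule after all of its dependencies.
execution_order = list(TopologicalSorter(rule_dependencies).static_order())
print(" -> ".join(execution_order))
```

Because this graph is a simple chain, there is only one valid order, starting at `save_image` and ending at `all`.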

3. Run the full workflow

Go into the ./workflow directory and run:

snakemake --cores 2 --use-conda

The output can be found in the ./results directory

Generate the report

Go into the ./workflow directory and run:

snakemake --report ../results/report.html

The output can be found in the ./results directory

Run on a slurm HPC cluster

You can run this workflow on a high-performance computing (HPC) cluster (here using the slurm workload manager). In this case, one slurm job acts as a scheduler that submits individual rule executions as separate slurm jobs. The snakemake-executor-plugin-slurm automatically handles the scheduling and submission of dependent jobs. Please check out the script ./workflow/snakemake.sbatch and the official snakemake slurm plugin documentation to learn more about the relevant flags and settings.
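The scheduling idea can be sketched in plain Python. This is a simplified toy model, not the plugin's actual logic: the job names, the `blur` and `invert` transformations, and the `submit` function are hypothetical, and no real slurm command is issued. The sketch also waits for a whole wave of jobs to finish before submitting the next one, which a real scheduler does not need to do.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each job maps to the jobs it depends on.
job_dependencies = {
    "save_image": set(),
    "transform_image[blur]": {"save_image"},
    "transform_image[invert]": {"save_image"},
    "plot_histogram[blur]": {"transform_image[blur]"},
    "plot_histogram[invert]": {"transform_image[invert]"},
    "all": {"plot_histogram[blur]", "plot_histogram[invert]"},
}


def submit(job):
    """Stand-in for submitting one job to the cluster (no real slurm call)."""
    return f"submitted {job}"


sorter = TopologicalSorter(job_dependencies)
sorter.prepare()

waves = []
while sorter.is_active():
    # Jobs whose dependencies have all finished can be submitted together.
    ready = sorted(sorter.get_ready())
    for job in ready:
        submit(job)
    waves.append(ready)
    # Simplification: pretend the whole wave has finished successfully.
    sorter.done(*ready)

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Jobs within the same wave are independent of each other, which is what allows a cluster to run them in parallel.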

Execution

Install the environment

conda create -n snakemake-env -y
conda env update -n snakemake-env --file environment.yaml

Additionally install the snakemake-executor-plugin-slurm:

pip install snakemake-executor-plugin-slurm

Then submit the provided workflow script on a cluster

cd ./workflow/
sbatch snakemake.sbatch

Exercises

The following exercises help you deepen your understanding after the workshop.

1. Scale the workflow to other images

The script create-data.py can take image names (that are part of the skimage package) as arguments.

python scripts/create-data.py --image-name <image name> --output <output name>

Modify the workflow so that, in addition to the current image, it also runs on other skimage example datasets, e.g. colorwheel, cat, logo.
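As a hint, scaling a workflow usually means asking for one output file per combination of input and processing step. The sketch below builds such a list of target paths in plain Python. The image names come from the exercise, while the transformation names and the path pattern are hypothetical placeholders that do not necessarily match this repository. Inside a Snakefile, the `expand` helper can build a similar list from a pattern with wildcards.

```python
from itertools import product

# Image names from the exercise; transformations and the path pattern
# are hypothetical placeholders, not the repository's actual names.
images = ["colorwheel", "cat", "logo"]
transformations = ["original", "inverted"]
pattern = "results/{image}/{transformation}_histogram.png"

# One target file per (image, transformation) combination.
targets = [
    pattern.format(image=image, transformation=transformation)
    for image, transformation in product(images, transformations)
]

for target in targets:
    print(target)
```

Three images and two transformations give six targets; adding another image name to the list is then enough to extend the set of requested outputs.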

2. Add a rule

Add a new rule in which you generate an aggregated plot - where the image and its modifications are shown in the top row and the associated histograms are shown in the bottom row.

3. Prettify the report

Explore ways to customize the report using the reStructuredText format.
