3 changes: 2 additions & 1 deletion docs/.nav.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,8 @@ nav:
- Hooks:
- Benchmark runtimes: guides/examples/benchmark_node_runtime.md
- Adopting Ordeq:
- Coming from Kedro: guides/kedro.md
- Coming from Kedro: guides/adopting/kedro.md
- Coming from Dagster: guides/adopting/dagster.md
- Integrations:
- Docker: guides/integrations/docker.md
- Marimo: guides/integrations/marimo.md
Expand Down
164 changes: 164 additions & 0 deletions docs/guides/adopting/dagster.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
# Coming from Dagster

This guide is for users familiar with Dagster who want to get started with Ordeq.

- Dagster is a full data orchestrator, while Ordeq focuses on streamlining data engineering tasks with a simpler, more flexible approach.
  This means Dagster ships additional built-in features, such as schedules, run tracking and sensors, that Ordeq deliberately omits.
  Instead, Ordeq integrates seamlessly with existing scheduling and monitoring tools, such as Dagster, Airflow and Kubeflow, allowing you to leverage your current infrastructure.
  Writing code in Ordeq is typically more straightforward and requires less boilerplate than concept-heavy Dagster, which reduces the learning curve and accelerates development.
  This difference also shows in the dependency footprint: Dagster pulls in 49 dependencies at the time of writing, while Ordeq has none.

- Ordeq tries to leverage native Python features as much as possible, while Dagster often requires using its own abstractions.
This means that in Ordeq you can use standard Python libraries and tools without needing to adapt them to Dagster's framework.
This results in more readable and maintainable code, as you can rely on familiar Python constructs.

## Assets vs IOs

- Dagster: function-based assets with [four different decorators](https://docs.dagster.io/guides/build/assets/defining-assets).
- Ordeq: a single `@node` decorator combined with IOs. IOs are classes, with attributes to hold state.
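To make the contrast concrete, here is a minimal sketch. The `JSONFile` class and `deduplicate` function below are illustrative, not part of Ordeq's API: an IO is an ordinary class that owns its state, and a node body is a plain Python function that Ordeq would wire to IOs via `@node`.

```python
import json
from dataclasses import dataclass
from pathlib import Path


# Hypothetical IO: an ordinary class whose attributes hold its state.
@dataclass
class JSONFile:
    path: Path

    def load(self) -> list:
        return json.loads(self.path.read_text())

    def save(self, data: list) -> None:
        self.path.write_text(json.dumps(data))


# A node body is a plain Python function; in Ordeq it would be wrapped
# with something like `@node(inputs=raw, outputs=clean)`.
def deduplicate(records: list[dict]) -> list[dict]:
    seen: set = set()
    out: list[dict] = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            out.append(record)
    return out
```

Because both pieces are plain Python, they can be imported, tested and reused without any framework context.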

## Example project

For this guide we use the "project_ml" example project from the Dagster repository, and show how the same functionality can be implemented in Ordeq.
The original Dagster example can be found [here](https://github.com/dagster-io/dagster/tree/master/examples/docs_projects/project_ml).

```text title="Dagster project structure"
.
├── pyproject.toml
├── src
│ └── project_ml
│ ├── __init__.py
│ ├── definitions.py
│ └── defs
│ ├── __init__.py
│ ├── asset_checks.py
│ ├── assets
│ │ ├── __init__.py
│ │ ├── data_assets.py
│ │ ├── model_assets.py
│ │ └── prediction_assets.py
│ ├── constants.py
│ ├── jobs.py
│ ├── resources.py
│ ├── schedules.py
│ ├── sensors.py
│ ├── types.py
│ └── utils.py
└── tests
├── __init__.py
├── conftest.py
├── test_data_assets.py
├── test_full_pipeline.py
└── test_model.py
```

```text title="Ordeq project structure"
.
├── data
│ └── 01_raw
├── pyproject.toml
├── src
│ └── project_ml
│ ├── __init__.py
│ ├── __main__.py
│ ├── catalog.py
│ ├── config
│ │ ├── __init__.py
│ │ ├── batch_prediction_config.py
│ │ ├── deployment_config.py
│ │ ├── model_config.py
│ │ ├── model_evaluation_config.py
│ │ └── real_time_prediction_config.py
│ ├── data
│ │ ├── __init__.py
│ │ ├── data_preprocessing.py
│ │ └── raw_data_loading.py
│ ├── deploy
│ │ ├── __init__.py
│ │ ├── deploy_model.py
│ │ └── predict.py
│ └── model
│ ├── __init__.py
│ ├── cnn_architecture.py
│ ├── digit_classifier.py
│ ├── model_evaluation.py
│ └── train_model.py
└── tests
```

Deviations from the original example:

- the model selection step is excluded for simplicity
- the data quality checks are left out of scope (see [Asset checks](#asset-checks) below)

## Context

Dagster uses a "context" object that has to be passed around as a quasi-global variable.
Ordeq avoids this pattern by letting nodes request IOs directly.
For example, metadata in Ordeq is just another IO containing the metadata, instead of a dedicated `context.add_output_metadata` call.
See [parametrizing nodes] for more details.
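As a sketch (the function name and fields are illustrative): instead of calling `context.add_output_metadata`, a node can simply return the metadata as an additional output, which Ordeq would bind to a metadata IO.

```python
def evaluate_model(
    predictions: list[int], labels: list[int]
) -> tuple[float, dict]:
    """Return the metric plus a metadata dict as a second output."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    accuracy = correct / len(labels)
    # Metadata is plain data flowing through the pipeline, not a side
    # effect on a global context object.
    metadata = {"n_samples": len(labels), "accuracy": accuracy}
    return accuracy, metadata
```

Since metadata is an ordinary return value, it is trivially unit-testable and visible in the pipeline graph.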

## Logging

In Dagster you would write:

```python
context.log.info(
f"User requested deployment of custom model: {config.custom_model_name}"
)
```

In most cases in Ordeq this becomes native Python logging:

```python
import logging

logger = logging.getLogger(__name__)

# (...)

logger.info(
f"User requested deployment of custom model: {config.custom_model_name}"
)
```

Only when you need advanced structured logging features would you use Ordeq's `Logger` IO.

## Configuration

Dagster requires users to subclass its own `dg.Config` objects, whereas in Ordeq you can use
native Python types for configuration: constants, files, dataclasses, Pydantic models, or whatever you prefer.
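For example, a training configuration can be a plain frozen dataclass (the fields below are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    epochs: int = 10
    learning_rate: float = 1e-3
    batch_size: int = 64


# Instantiate with overrides; no framework base class needed.
config = TrainingConfig(epochs=5)
```

Being frozen, the config is hashable and safe to share between nodes; any Python type with the same properties works equally well.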

## Attributes

```python
@dg.asset(
description="Evaluate model performance on test set",
group_name="model_pipeline",
required_resource_keys={"model_storage"},
deps=["digit_classifier"],
...
)
```

```python
@node(..., description="Evaluate model performance on test set")
```

The other attributes are not required: `group_name` is inferred from the module name, and `required_resource_keys` and `deps` are derived from the node inputs and outputs.

## Cloud integration

The Dagster example implements dedicated [resources](https://github.com/dagster-io/dagster/blob/master/examples/docs_projects/project_ml/src/project_ml/defs/resources.py) for S3 and the local file system.

In Ordeq you can use the same code for both local and S3 storage by leveraging the existing IOs
(see [Storage IOs] for more details).
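A minimal sketch of the idea (the `TextIO` class is hypothetical; Ordeq's actual storage IOs are documented under [Storage IOs]): one IO parameterized by a URI, so swapping a local path for an `s3://` URI changes the backend, not the pipeline code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TextIO:
    """Hypothetical IO: one class for local and remote storage."""

    uri: str

    def load(self) -> str:
        if self.uri.startswith("s3://"):
            # A real implementation would dispatch to an S3 client here;
            # this sketch only implements the local backend.
            raise NotImplementedError("S3 backend omitted in this sketch")
        return Path(self.uri).read_text()

    def save(self, text: str) -> None:
        if self.uri.startswith("s3://"):
            raise NotImplementedError("S3 backend omitted in this sketch")
        Path(self.uri).write_text(text)
```

Nodes only see `load` and `save`, so moving a pipeline from local disk to S3 is a catalog change, not a code change.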

## Asset checks

Dagster has built-in [asset checks](https://github.com/dagster-io/dagster/blob/master/examples/docs_projects/project_ml/src/project_ml/defs/asset_checks.py).
In Ordeq you can implement similar functionality with nodes that validate data and raise an exception if a check fails.

We keep asset checks out of scope for this guide.
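Although out of scope here, the pattern is simple enough to sketch: a validation node is a plain function that raises when a check fails and passes the data through otherwise (the check below is illustrative).

```python
def check_class_balance(
    labels: list[int], min_fraction: float = 0.05
) -> list[int]:
    """Fail the pipeline if any class is severely under-represented."""
    total = len(labels)
    for cls in set(labels):
        fraction = labels.count(cls) / total
        if fraction < min_fraction:
            raise ValueError(
                f"class {cls} covers only {fraction:.1%} of the data"
            )
    # Returning the data lets downstream nodes depend on the check,
    # so the pipeline graph enforces that validation runs first.
    return labels
```
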
File renamed without changes.
10 changes: 10 additions & 0 deletions examples/project-ml/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Machine learning with PyTorch

This project is based on an example from Dagster, to demonstrate what code with similar functionality looks like in both frameworks.
See also the original [pipeline guide](https://docs.dagster.io/examples/ml) and the [Dagster implementation](https://github.com/dagster-io/dagster/tree/master/examples/docs_projects/project_ml).

Simply run the entire project using the following `uv` command:

```shell
uv run src/project_ml
```
Empty file.
Empty file.
Empty file.
129 changes: 129 additions & 0 deletions examples/project-ml/pipeline_diagram.mermaid
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
graph TB
subgraph legend["Legend"]
direction TB
L0@{shape: rounded, label: "Node"}
L2@{shape: subroutine, label: "View"}
L00@{shape: rect, label: "IO"}
L01@{shape: rect, label: "Literal"}
L02@{shape: rect, label: "MatplotlibFigure"}
L03@{shape: rect, label: "Pickle"}
end

project_ml.data.raw_data_loading:raw_mnist_test_data --> project_ml.data.data_preprocessing:processed_mnist_test_data
project_ml.data.raw_data_loading:raw_mnist_train_data --> project_ml.data.data_preprocessing:processed_mnist_train_data
IO0 --> project_ml.data.data_preprocessing:processed_mnist_train_data
IO1 --> project_ml.data.data_preprocessing:processed_mnist_train_data
IO2 --> project_ml.data.raw_data_loading:raw_mnist_test_data
project_ml.data.raw_data_loading:transform --> project_ml.data.raw_data_loading:raw_mnist_test_data
IO3 --> project_ml.data.raw_data_loading:raw_mnist_train_data
project_ml.data.raw_data_loading:transform --> project_ml.data.raw_data_loading:raw_mnist_train_data
IO4 --> project_ml.data.raw_data_loading:transform
project_ml.deploy.predict:dummy_images --> project_ml.deploy.predict:batch_digit_predictions
IO5 --> project_ml.deploy.predict:batch_digit_predictions
IO6 --> project_ml.deploy.predict:batch_digit_predictions
project_ml.deploy.predict:inference_device --> project_ml.deploy.predict:batch_digit_predictions
project_ml.deploy.predict:batch_digit_predictions --> IO7
project_ml.deploy.predict:batch_digit_predictions --> IO8
project_ml.deploy.predict:dummy_batch --> project_ml.deploy.predict:digit_predictions
IO5 --> project_ml.deploy.predict:digit_predictions
IO9 --> project_ml.deploy.predict:digit_predictions
project_ml.deploy.predict:inference_device --> project_ml.deploy.predict:digit_predictions
project_ml.deploy.predict:digit_predictions --> IO10
project_ml.deploy.predict:digit_predictions --> IO11
IO9 --> project_ml.deploy.predict:dummy_batch
IO6 --> project_ml.deploy.predict:dummy_images
IO6 --> project_ml.deploy.predict:inference_device
IO5 --> project_ml.model.model_evaluation:model_evaluation
project_ml.model.model_evaluation:test_loader --> project_ml.model.model_evaluation:model_evaluation
project_ml.model.train_model:training_device --> project_ml.model.model_evaluation:model_evaluation
project_ml.model.model_evaluation:model_evaluation --> IO12
project_ml.model.model_evaluation:model_evaluation --> IO13
project_ml.model.model_evaluation:model_evaluation --> IO14
project_ml.data.data_preprocessing:processed_mnist_test_data --> project_ml.model.model_evaluation:test_loader
IO15 --> project_ml.model.model_evaluation:test_loader
IO16 --> project_ml.model.train_model:optimizer
project_ml.model.train_model:untrained_model --> project_ml.model.train_model:optimizer
IO16 --> project_ml.model.train_model:scheduler
project_ml.model.train_model:optimizer --> project_ml.model.train_model:scheduler
project_ml.data.data_preprocessing:processed_mnist_train_data --> project_ml.model.train_model:train_loader
IO16 --> project_ml.model.train_model:train_loader
project_ml.model.train_model:train_loader --> project_ml.model.train_model:train_model
project_ml.model.train_model:val_loader --> project_ml.model.train_model:train_model
IO16 --> project_ml.model.train_model:train_model
project_ml.model.train_model:training_device --> project_ml.model.train_model:train_model
project_ml.model.train_model:untrained_model --> project_ml.model.train_model:train_model
project_ml.model.train_model:optimizer --> project_ml.model.train_model:train_model
project_ml.model.train_model:scheduler --> project_ml.model.train_model:train_model
project_ml.model.train_model:train_model --> IO5
project_ml.model.train_model:train_model --> IO17
IO18 --> project_ml.model.train_model:untrained_model
project_ml.data.data_preprocessing:processed_mnist_train_data --> project_ml.model.train_model:val_loader
IO16 --> project_ml.model.train_model:val_loader

subgraph s0["project_ml.data.data_preprocessing"]
direction TB
project_ml.data.data_preprocessing:processed_mnist_test_data@{shape: subroutine, label: "processed_mnist_test_data"}
project_ml.data.data_preprocessing:processed_mnist_train_data@{shape: subroutine, label: "processed_mnist_train_data"}
end
subgraph s1["project_ml.data.raw_data_loading"]
direction TB
project_ml.data.raw_data_loading:raw_mnist_test_data@{shape: subroutine, label: "raw_mnist_test_data"}
project_ml.data.raw_data_loading:raw_mnist_train_data@{shape: subroutine, label: "raw_mnist_train_data"}
project_ml.data.raw_data_loading:transform@{shape: subroutine, label: "transform"}
end
subgraph s2["project_ml.deploy.predict"]
direction TB
project_ml.deploy.predict:batch_digit_predictions@{shape: rounded, label: "batch_digit_predictions"}
project_ml.deploy.predict:digit_predictions@{shape: rounded, label: "digit_predictions"}
project_ml.deploy.predict:dummy_batch@{shape: subroutine, label: "dummy_batch"}
project_ml.deploy.predict:dummy_images@{shape: subroutine, label: "dummy_images"}
project_ml.deploy.predict:inference_device@{shape: subroutine, label: "inference_device"}
end
subgraph s3["project_ml.model.model_evaluation"]
direction TB
project_ml.model.model_evaluation:model_evaluation@{shape: rounded, label: "model_evaluation"}
project_ml.model.model_evaluation:test_loader@{shape: subroutine, label: "test_loader"}
end
subgraph s4["project_ml.model.train_model"]
direction TB
project_ml.model.train_model:optimizer@{shape: subroutine, label: "optimizer"}
project_ml.model.train_model:scheduler@{shape: subroutine, label: "scheduler"}
project_ml.model.train_model:train_loader@{shape: subroutine, label: "train_loader"}
project_ml.model.train_model:train_model@{shape: rounded, label: "train_model"}
project_ml.model.train_model:training_device@{shape: subroutine, label: "training_device"}
project_ml.model.train_model:untrained_model@{shape: subroutine, label: "untrained_model"}
project_ml.model.train_model:val_loader@{shape: subroutine, label: "val_loader"}
end
IO0@{shape: rect, label: "validation_split"}
IO1@{shape: rect, label: "random_seed"}
IO10@{shape: rect, label: "real_time_predictions"}
IO11@{shape: rect, label: "real_time_prediction_metadata"}
IO12@{shape: rect, label: "model_evaluation_result"}
IO13@{shape: rect, label: "confusion_matrix"}
IO14@{shape: rect, label: "model_evaluation_metadata"}
IO15@{shape: rect, label: "model_evaluation_config"}
IO16@{shape: rect, label: "training_config"}
IO17@{shape: rect, label: "training_metadata"}
IO18@{shape: rect, label: "model_config"}
IO2@{shape: rect, label: "test_dataset"}
IO3@{shape: rect, label: "train_dataset"}
IO4@{shape: rect, label: "mnist_moments"}
IO5@{shape: rect, label: "production_model"}
IO6@{shape: rect, label: "batch_prediction_config"}
IO7@{shape: rect, label: "batch_predictions"}
IO8@{shape: rect, label: "batch_prediction_metadata"}
IO9@{shape: rect, label: "real_time_prediction_config"}

class L0,project_ml.deploy.predict:batch_digit_predictions,project_ml.deploy.predict:digit_predictions,project_ml.model.model_evaluation:model_evaluation,project_ml.model.train_model:train_model node
class L2,project_ml.data.data_preprocessing:processed_mnist_test_data,project_ml.data.data_preprocessing:processed_mnist_train_data,project_ml.data.raw_data_loading:raw_mnist_test_data,project_ml.data.raw_data_loading:raw_mnist_train_data,project_ml.data.raw_data_loading:transform,project_ml.deploy.predict:dummy_batch,project_ml.deploy.predict:dummy_images,project_ml.deploy.predict:inference_device,project_ml.model.model_evaluation:test_loader,project_ml.model.train_model:optimizer,project_ml.model.train_model:scheduler,project_ml.model.train_model:train_loader,project_ml.model.train_model:training_device,project_ml.model.train_model:untrained_model,project_ml.model.train_model:val_loader view
class L00,IO10,IO11,IO12,IO14,IO17,IO7,IO8 io0
class L01,IO0,IO1,IO15,IO16,IO18,IO2,IO3,IO4,IO6,IO9 io1
class L02,IO13 io2
class L03,IO5 io3
classDef node fill:#008AD7,color:#FFF
classDef io fill:#FFD43B
classDef view fill:#00C853,color:#FFF
classDef io0 fill:#66c2a5
classDef io1 fill:#fc8d62
classDef io2 fill:#8da0cb
classDef io3 fill:#e78ac3
18 changes: 18 additions & 0 deletions examples/project-ml/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
[project]
name = "project_ml"
version = "0.1.0"
description = "Example ML project using Ordeq"
requires-python = ">=3.10"
dependencies = [
"ordeq",
"ordeq-viz",
"ordeq-matplotlib",
"numpy>=2.0.2",
"scikit-learn>=1.6.1",
"seaborn>=0.13.2",
"torch>=2.8.0",
"torchvision",
]

[tool.ruff.lint]
extend-ignore = ["G004"] # f-strings in logging, coming from the upstream example code
Empty file.
21 changes: 21 additions & 0 deletions examples/project-ml/src/project_ml/__main__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
import logging
from pathlib import Path

from ordeq_viz import viz

from project_ml import catalog, data, deploy, model

ROOT_PATH = Path(__file__).parent.parent.parent

logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
pipeline = {data, model, deploy}
viz(
*pipeline,
catalog,
fmt="mermaid",
output=ROOT_PATH / "pipeline_diagram.mermaid",
subgraphs=True,
)
# run(*pipeline)