rubicon-ml

Purpose

rubicon-ml is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way. Its git integration associates these inputs and outputs directly with the model code that produced them, ensuring full auditability and reproducibility for developers and stakeholders alike. During experimentation, the dashboard makes it easy to explore, filter, visualize, and share recorded work.

Components

rubicon-ml is composed of three parts:

A Python library, powered by fsspec, for storing and retrieving model inputs, outputs, and analyses on filesystems
A dashboard, built with dash, for exploring, comparing, and visualizing logged data
A process, built on intake, for sharing a selected subset of logged data with collaborators or reviewers

Workflow

Use rubicon_ml to capture model inputs and outputs over time. It can be easily integrated into existing Python models or pipelines and supports both concurrent logging (so multiple experiments can be logged in parallel) and asynchronous communication with S3 (so network reads and writes won’t block).
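To illustrate why concurrent logging can be safe, here is a minimal sketch that uses only the standard library. It is not rubicon_ml's implementation: the `log_experiment` and `run_grid` helpers, the directory layout, and the JSON record shape are all hypothetical. The idea shown is that each experiment writes to its own uniquely named file, so parallel writers never touch the same path.

```python
import json
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def log_experiment(root: Path, parameters: dict, metrics: dict) -> Path:
    """Write one experiment record to its own uniquely named JSON file."""
    experiment_id = uuid.uuid4().hex
    path = root / "experiments" / f"{experiment_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"id": experiment_id, "parameters": parameters, "metrics": metrics}
    path.write_text(json.dumps(record))
    return path


def run_grid(root: Path, n_estimators_grid: list) -> list:
    """Log one experiment per grid value, several at a time."""

    def run_one(n_estimators: int) -> Path:
        # A real run would train a model here; the metric is left empty
        # because this sketch does not train anything.
        return log_experiment(root, {"n_estimators": n_estimators}, {"accuracy": None})

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_one, n_estimators_grid))


# Hypothetical usage: log three experiments into a temporary directory.
root = Path(tempfile.mkdtemp())
paths = run_grid(root, [10, 50, 100])
```

Because no two experiments share a file, this layout needs no locking. Whether rubicon_ml uses the same layout is not stated here.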

Meanwhile, periodically review the logged data within the Rubicon dashboard to steer the model tweaking process in the right direction. The dashboard lets you quickly spot trends by exploring and filtering your logged results and visualizes how the model inputs impacted the model outputs.
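The kind of filtering the dashboard offers can be pictured with plain Python. This is only a sketch of the idea, assuming experiments are available as dictionaries with `tags`, `parameters`, and `metrics` keys. That record shape, the `filter_experiments` helper, and the toy values are all made up for illustration and are not part of rubicon_ml.

```python
def filter_experiments(experiments, tag=None, min_metric=None, metric="accuracy"):
    """Keep experiments matching a tag and a metric threshold, best first."""
    selected = []
    for experiment in experiments:
        if tag is not None and tag not in experiment.get("tags", []):
            continue
        value = experiment.get("metrics", {}).get(metric)
        if min_metric is not None and (value is None or value < min_metric):
            continue
        selected.append(experiment)
    # Sort descending by the metric; a missing metric sorts as 0.0.
    return sorted(
        selected,
        key=lambda e: e.get("metrics", {}).get(metric) or 0.0,
        reverse=True,
    )


# Toy records with made-up values, for illustration only.
experiments = [
    {"name": "a", "tags": ["rf"], "parameters": {"n_estimators": 10}, "metrics": {"accuracy": 0.80}},
    {"name": "b", "tags": ["rf"], "parameters": {"n_estimators": 100}, "metrics": {"accuracy": 0.90}},
    {"name": "c", "tags": ["svm"], "parameters": {}, "metrics": {"accuracy": 0.85}},
]

best_rf = filter_experiments(experiments, tag="rf", min_metric=0.75)
```

Placing each result's parameters next to its metric, sorted by the metric, is the simplest way to see how an input such as `n_estimators` relates to the outcome.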

When the model is ready for review, Rubicon makes it easy to share specific subsets of the data with model reviewers and stakeholders, giving them the context necessary for a complete model review and approval.

Use

Check out the interactive notebooks in this Binder to try rubicon_ml for yourself.

Here's a simple example:

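from collections import namedtuple

from rubicon_ml import Rubicon
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Setup so the snippet is self-contained. The namedtuple, dataset, and values
# below are illustrative assumptions, not part of rubicon_ml.
SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module_name method")
n_estimators, n_features, random_state = 10, 5, 42

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
rfc = RandomForestClassifier(
    n_estimators=n_estimators, max_features=n_features, random_state=random_state
)
rfc.fit(X_train, y_train)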

rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)

project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["my_model_name"],
)

experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)

Then explore the project by running the dashboard:

rubicon_ml ui --root-dir /rubicon-root
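The `auto_git_enabled=True` flag in the example above relates to the git integration described under Purpose: logged results are tied to the code that produced them. A rough sketch of that idea, assuming the association is made by recording the current commit hash, is shown below. It is not rubicon_ml's actual implementation, and `current_commit_hash` and `tag_with_commit` are hypothetical helpers.

```python
import subprocess


def current_commit_hash(repo_dir="."):
    """Return the HEAD commit hash of the repository at repo_dir, or None."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository.
        return None
    return result.stdout.strip()


def tag_with_commit(record, repo_dir="."):
    """Return a copy of an experiment record with the commit hash attached."""
    return {**record, "commit_hash": current_commit_hash(repo_dir)}


tagged = tag_with_commit({"name": "Hello World"})
```

Storing the hash next to the parameters and metrics lets a reviewer check out the exact code version behind any logged result.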

Documentation

For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.

Install

The Python library is available on conda-forge via conda and on PyPI via pip.

conda config --add channels conda-forge
conda install rubicon-ml

or

pip install rubicon-ml

Develop

To contribute, check out our developer guide for the latest instructions on setting up your local developer environment.

Tests

The tests are separated into unit and integration tests. Run them directly in the uv environment with uv run pytest tests/unit or uv run pytest tests/integration, or run uv run pytest to execute all of them.

Note: some integration tests are intentionally marked to control when they are run (that is, not during CI/CD). These tests include:

Integration tests that write to physical filesystems - local and S3. Local files will be written to ./test-rubicon relative to where the tests are run. An S3 path must also be provided to run these tests. By default, these tests are disabled. To enable them, run:
```
uv run pytest -m "write_files" --s3-path "s3://my-bucket/my-key"
```
Integration tests that run Jupyter notebooks. These tests are a bit slower than the rest of the tests in the suite as they need to launch Jupyter servers. By default, they are enabled. To disable them, run:
```
uv run pytest -m "not run_notebooks and not write_files"
```
Note: When simply running uv run pytest, -m "not write_files" is the default. So, we need to also apply it when disabling notebook tests.
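For readers unfamiliar with pytest markers, the behavior described above could be wired up in a `conftest.py` roughly as follows. This is a sketch under assumptions and the project's own configuration may differ. It uses the marker names and the `--s3-path` option from the text and sets the default marker expression in the `pytest_configure` hook, whereas the project may instead set it through `addopts` in its pytest configuration.

```python
def pytest_addoption(parser):
    # Command-line option that supplies the S3 location for file-writing tests.
    parser.addoption(
        "--s3-path",
        action="store",
        default=None,
        help="S3 path used by tests marked 'write_files'",
    )


def pytest_configure(config):
    # Register the markers so pytest does not warn about unknown marks.
    config.addinivalue_line(
        "markers", "write_files: tests that write to local and S3 filesystems"
    )
    config.addinivalue_line(
        "markers", "run_notebooks: tests that execute Jupyter notebooks"
    )
    # Assumed default: skip file-writing tests unless the user passes -m.
    if not config.option.markexpr:
        config.option.markexpr = "not write_files"
```

With a default like this, any explicit `-m` expression replaces `not write_files` rather than adding to it, which is why the notebook example above repeats `not write_files`.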

Style

We use ruff for linting and formatting. To check and update the code style, run:

uv run ruff check --fix
uv run ruff format

Install and configure pre-commit to automatically run ruff during commits:

Install pre-commit.
Run uv run pre-commit install to set up the git hook scripts.

Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via uv run pre-commit run --all-files or skip these checks with git commit --no-verify.

If you're looking for Rubicon, the Java & Objective C to Python bridge, visit here.
