Changes from all commits
51 commits
3e67eb0
Designing experiments checklists FAQ - unfinished
Hrovatin Aug 3, 2025
24559ad
Expand checklist
Hrovatin Aug 5, 2025
9785796
Add design checklist to FAQ
Hrovatin Oct 23, 2025
6a4cdb7
Make introduction clearer
Hrovatin Oct 23, 2025
22718e0
Format readme introduction
Hrovatin Oct 23, 2025
2fa2190
Simplify readme introduction language
Hrovatin Oct 24, 2025
6d9baf3
Move API overview diagram into this repository
Hrovatin Oct 24, 2025
e74be96
Rename API overview drawio file
Hrovatin Oct 24, 2025
beb9b9c
Make drawio template quick start diagram
Hrovatin Oct 24, 2025
d58148a
Add quick start diagram
Hrovatin Oct 24, 2025
93b2e81
Add complex searchspace drawio diagram
Hrovatin Oct 24, 2025
6f6f4ac
Update diagram
Hrovatin Oct 24, 2025
c09a626
Make readme more accessible for new users
Hrovatin Oct 24, 2025
1832a47
Format FAQ
Hrovatin Oct 24, 2025
0e293b2
Update changelog
Hrovatin Oct 24, 2025
0da883e
Add a few more examples
Hrovatin Oct 27, 2025
e963d6b
Fix typo
Hrovatin Oct 27, 2025
fddcdf0
Add emoji to example use cases
Hrovatin Oct 28, 2025
3b15705
reword
Hrovatin Oct 28, 2025
f86a1c8
Intro of Checklist for designing BayBE optimization campaigns
Hrovatin Oct 28, 2025
299ade3
reword
Hrovatin Oct 28, 2025
0fa28d5
typo
Hrovatin Oct 28, 2025
264648e
Force documentation build - TEMPORARY
Hrovatin Oct 29, 2025
cbc3aa3
Remove offending link and thus reset workflow to original state
Hrovatin Oct 29, 2025
cd17c86
Fix the use of the word experiment
Hrovatin Oct 29, 2025
8f3f34f
Use telegraph style for all headings
Hrovatin Oct 29, 2025
c6cf29d
reword
Hrovatin Oct 29, 2025
5584eb1
rweord
Hrovatin Oct 30, 2025
75718b5
Add example use case
Hrovatin Oct 30, 2025
b1aaa0e
Remove oxford comma
Hrovatin Oct 30, 2025
60c8f0e
typo
Hrovatin Oct 30, 2025
af543f9
reword
Hrovatin Oct 30, 2025
08ee66b
Fix file name typo
Hrovatin Oct 30, 2025
8c3b9f3
Try not using svg with automatic light/dark mode
Hrovatin Oct 30, 2025
cf01ac2
fix image links in readme
Hrovatin Oct 30, 2025
4f17ccd
Remove changelog entry
Hrovatin Nov 4, 2025
2ddcf59
Fix typo
Hrovatin Nov 4, 2025
16588c3
Remove unnecessary wording
Hrovatin Nov 4, 2025
6cf645c
Reword to match API terms properly
Hrovatin Nov 4, 2025
186e6b4
Reword to properly match the API wording
Hrovatin Nov 4, 2025
12318cf
Fix readme image links
Hrovatin Nov 4, 2025
9b1c809
reword
Hrovatin Nov 5, 2025
522ee1d
reword
Hrovatin Nov 5, 2025
7ad39d1
reword
Hrovatin Nov 5, 2025
451c99b
reweord
Hrovatin Nov 5, 2025
7bdd5a5
Reword features
Hrovatin Nov 5, 2025
1d13bf0
Reword user to you
Hrovatin Nov 5, 2025
b2d38c9
Unify wording of setting (BO method) and configuration (chosen parame…
Hrovatin Nov 6, 2025
e81c2c2
Fix rendering style
Hrovatin Nov 6, 2025
eb701cf
reword
Hrovatin Nov 6, 2025
07d4f98
Remove setup checklist
Hrovatin Nov 6, 2025
173 changes: 110 additions & 63 deletions README.md
@@ -27,35 +27,77 @@

# BayBE — A Bayesian Back End for Design of Experiments

The **Bay**esian **B**ack **E**nd (**BayBE**) is a general-purpose toolbox for Bayesian Design
of Experiments, focusing on additions that enable real-world experimental campaigns.
The **Bay**esian **B**ack **E**nd (**BayBE**) helps to find **good parameter configurations**
within complex parameter search spaces.

<div align="center">

[![Complex Search Space](https://raw.githubusercontent.com/Hrovatin/baybe/docs/easy_access/docs/_static/complex_search_space_automatic.svg)](https://github.com/emdgroup/baybe/)
Collaborator Author (Hrovatin) commented:
Note: This requires mini PR of uploading images before merge and updating the link

Collaborator replied:
Some context from my end as we investigated this: The issue is that Sphinx has problems trying to load the static file directly from the repo here: You can either give it a path such that it finds it in the README on github but not in the compiled version or vice versa. Hence the fix to link to the static image on github itself.


</div>

Example use cases:

- 🧪 Find chemical reaction conditions or process parameters
- 🥣 Create materials, chemical mixtures or formulations with desired properties
- ✈️ Optimize the 3D shape of a physical object
- 🖥️ Optimize a virtual simulation
- ⚙️ Select model hyperparameters
- 🫖 Find tasty espresso machine settings

This is achieved via **Bayesian Design of Experiments**,
which helps to efficiently navigate parameter search spaces.
It balances
exploitation of parameter space regions known to lead to good outcomes
and exploration of unknown regions.

BayBE provides a **general-purpose toolbox** for Bayesian Design of Experiments,
Collaborator commented:
this and the sentence directly after the next heading feel very repetitive. Would combine or do something to avoid that

Collaborator Author (@Hrovatin, Oct 30, 2025) replied:

Current:

BayBE provides a **general-purpose toolbox** for Bayesian Design of Experiments, 
focusing on making this procedure easily-accessible for real-world experiments.

## 🔋 Batteries Included
BayBE offers a range of ✨**built&#8209;in&nbsp;features**✨ 
crucial for real-world use cases.
The following provides a non-comprehensive overview:

Proposed (merge and remove the batteries included heading, having it as part of the general intro):

BayBE provides a **general-purpose toolbox** for Bayesian Design of Experiments, 
focusing on making this procedure easily-accessible for real-world experiments. 
It offers a range of ✨**built&#8209;in&nbsp;features**✨, 
with a non-comprehensive overview outlined below:

Collaborator Author replied:

@Scienfitz any opinion?

Collaborator replied:

I think @AdrianSosic was very keen on adding the battery phrase so since you want to remove it I think you both have to sync

Collaborator replied:

I'm not married to it, but I still think it sends a very clear message that this package cares deeply about usability – which is in fact the main purpose of it (otherwise, people could just use botorch). Does it hurt to keep it?

focusing on making this procedure easily accessible for real-world experiments.


## 🔋 Batteries Included
Besides its core functionality to perform a typical recommend-measure loop, BayBE
offers a range of ✨**built&#8209;in&nbsp;features**✨ crucial for real-world use cases.
BayBE offers a range of ✨**built&#8209;in&nbsp;features**✨ crucial for real-world use cases.
Collaborator commented:
I would not delete the Besides... part

Collaborator Author (@Hrovatin, Oct 29, 2025) replied:

I wanted to shorten where I felt that text may be redundant. But I can add back if preferred @Scienfitz

See proposal in: #676 (comment)

The following provides a non-comprehensive overview:

- 🛠️ Custom parameter encodings: Improve your campaign with domain knowledge
- 🧪 Built-in chemical encodings: Improve your campaign with chemical knowledge
- 🎯 Numerical and binary targets with min, max and match objectives
- ⚖️ Multi-target support via Pareto optimization and desirability scalarization
- 🔍 Insights: Easily analyze feature importance and model behavior
- 🎭 Hybrid (mixed continuous and discrete) spaces
- 🚀 Transfer learning: Mix data from multiple campaigns and accelerate optimization
- 🎰 Bandit models: Efficiently find the best among many options in noisy environments (e.g. A/B Testing)
- 🔢 Cardinality constraints: Control the number of active factors in your design
- 🌎 Distributed workflows: Run campaigns asynchronously with pending experiments and partial measurements
- 🎓 Active learning: Perform smart data acquisition campaigns
- ⚙️ Custom surrogate models: Enhance your predictions through mechanistic understanding
- 📈 Comprehensive backtest, simulation and imputation utilities: Benchmark and find your best settings
- 📝 Fully typed and hypothesis-tested: Robust code base
- 🔄 All objects are fully (de-)serializable: Useful for storing results in databases or use in wrappers like APIs
- 📚 Leverage **domain knowledge**:
  - 🎨 Capture relationships between categories by encoding categorical data. BayBE also provides built-in chemical encodings.
  - 🛠️ Build in mechanistic process understanding via custom surrogate models.
- 🏛️ Leverage **historic data** from similar campaigns to accelerate optimization via transfer learning.
- 🌀 **Flexibly** define target outcomes, parameter search spaces and optimization strategies:
  - 🎯 Choose between numerical targets (e.g., experimental outcome values) or binary targets (e.g., good/bad classification of experimental results). Targets can be minimized, maximized or matched to a specific value.
  - 👥👥 Optimize multiple targets at once (e.g., via Pareto optimization or desirability scalarization).
  - 🎭 Use both continuous and discrete parameters within a single search space.
  - 🔢 Define a maximal number of mixture components via cardinality constraints.
  - ⚖️ Choose between different optimization strategies to balance exploration and exploitation of the search space:
    - 🌍 Gain an understanding of the whole search space via active learning.
    - 🎰 Maximize total gain across a sequence of actions via bandit models.
- 🌐 Run campaigns **asynchronously** with pending experiments and partial measurements.
- 🔍 Gain **insights** into the optimization campaigns by analyzing feature importance and model behavior.
- 📈 Conduct **benchmarks** to select between different Bayesian optimization settings via backtesting.
- 🔄 Connect BayBE with **database storage and API wrappers** using the serialization functionality (see the sketch after this list).
- 📝 Rely on a **high-quality code base** with comprehensive tests and typing.
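
As a small illustration of the serialization point above, BayBE objects can be round-tripped through JSON; the snippet below is a minimal sketch (the target name is just an example):

```python
from baybe.targets import NumericalTarget

target = NumericalTarget(name="Yield")
target_json = target.to_json()  # JSON string, e.g. for storing in a database
assert NumericalTarget.from_json(target_json) == target  # lossless round trip
```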


## ⚡ Quick Start

Let us consider a simple experiment where we control three parameters and want to
maximize a single target called `Yield`.
To perform Bayesian Design of Experiments with BayBE,
you should first specify the **parameter search space** and **objective** to be optimized.
Based on this information and any **available data** about outcomes of specific parameter configurations,
BayBE will **recommend the next set of parameter configurations** to be **measured**.
To inform the next recommendation cycle, the newly generated measurements can be added to BayBE.

<div align="center">

[![Quick Start](https://raw.githubusercontent.com/Hrovatin/baybe/docs/easy_access/docs/_static/quick_start_automatic.svg)](https://github.com/emdgroup/baybe/)
Collaborator Author commented:
Note: This requires mini PR of uploading images before merge and updating the link

Collaborator replied:
Some context from my end as we investigated this: The issue is that Sphinx has problems trying to load the static file directly from the repo here: You can either give it a path such that it finds it in the README on github but not in the compiled version or vice versa. Hence the fix to link to the static image on github itself.


</div>

From the user perspective, the most important part is the "design" step.
Collaborator (@AdrianSosic, Nov 5, 2025) commented:

#rigor

Perhaps none of you noticed, but I still think this is a very critical point, since it is highly misleading. With "design", you refer to the step where the user defines the components of the campaign etc. However, please note that we're fully in the DOE context here, where "design" has a very different meaning – it's literally in the name of the approach. In (Bayesian) DOE, the "design" refers to the entire end-to-end process, i.e. it also includes (and one could argue this is actually the main part of it!) the actual placement of the experiments in the search space according to some criterion – so that would correspond not to the parameter definition step but actually to the recommendation step.

Thus, I think it's crucial that we adjust the terminology here (and also in the "design" user guide)

Collaborator replied:

just call it setup step or similar to avoid the word design

Collaborator Author replied:

@AdrianSosic would "setup" be ok?

Collaborator replied:

Yeah, definitely better than "design" 👍🏼


Below we show a simple optimization procedure, starting with the design step and subsequently
performing the recommendation loop.
The provided example aims to maximize the yield of a chemical reaction by adjusting its parameter configurations
(also known as reaction conditions).

First, install BayBE into your Python environment:
```bash
@@ -66,7 +108,7 @@ For more information on this step, see our

### Defining the Optimization Objective

In BayBE's language, the `Yield` can be represented as a `NumericalTarget`,
In BayBE's language, the reaction yield can be represented as a `NumericalTarget`,
which we wrap into a `SingleTargetObjective`:

```python
@@ -76,21 +118,19 @@ from baybe.objectives import SingleTargetObjective
target = NumericalTarget(name="Yield")
objective = SingleTargetObjective(target=target)
```
In cases where we are confronted with multiple (potentially conflicting) targets,
the `ParetoObjective` or `DesirabilityObjective` can be used instead.
These allow to define additional settings, such as how the targets should be balanced.
In cases where we are confronted with multiple (potentially conflicting) targets
(e.g., yield vs cost),
the `ParetoObjective` or `DesirabilityObjective` can be used to define how the targets should be balanced.
For more details, see the
[objectives section](https://emdgroup.github.io/baybe/stable/userguide/objectives.html)
of the user guide.
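
For illustration, a multi-target setup could look roughly as follows (the second target and its name are assumptions made for this sketch):

```python
from baybe.objectives import ParetoObjective
from baybe.targets import NumericalTarget

# Two potentially conflicting targets, both maximized in this sketch
yield_target = NumericalTarget(name="Yield")
purity_target = NumericalTarget(name="Purity")

# Pareto optimization recommends configurations that trade off both targets
objective = ParetoObjective(targets=[yield_target, purity_target])
```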

### Defining the Search Space

Next, we inform BayBE about the available "control knobs", that is, the underlying
system parameters we can tune to optimize our targets. This also involves specifying
their values/ranges and other parameter-specific details.

For our example, we assume that we can control three parameters – `Granularity`,
`Pressure[bar]`, and `Solvent` – as follows:
reaction parameters we can tune to optimize the yield.
In this case we tune granularity, pressure and solvent, each represented by a `Parameter` object.
We also need to specify which values each parameter can take.

```python
from baybe.parameters import (
@@ -147,20 +187,15 @@ and alternative ways of construction.
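
The parameter definitions themselves are collapsed in the diff above. Purely as an illustration (the values and SMILES strings below are assumptions, not the collapsed content), such a search space might be assembled roughly like this:

```python
from baybe.parameters import (
    CategoricalParameter,
    NumericalDiscreteParameter,
    SubstanceParameter,
)
from baybe.searchspace import SearchSpace

parameters = [
    CategoricalParameter(name="Granularity", values=["coarse", "medium", "fine"]),
    NumericalDiscreteParameter(name="Pressure[bar]", values=[1.0, 5.0, 10.0]),
    SubstanceParameter(  # molecules are given as name -> SMILES; needs the chemistry extras
        name="Solvent",
        data={"Solvent A": "COC", "Solvent B": "CCO"},
    ),
]
searchspace = SearchSpace.from_product(parameters)
```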

### Optional: Defining the Optimization Strategy

As an optional step, we can specify details on how the optimization should be
conducted. If omitted, BayBE will choose a default setting.
As an optional step, we can specify details on how the optimization of the experimental configurations should be
performed. If omitted, BayBE will choose a default Bayesian optimization setting.

For our example, we combine two recommenders via a so-called meta recommender named
`TwoPhaseMetaRecommender`:

1. In cases where no measurements have been made prior to the interaction with BayBE,
a selection via `initial_recommender` is used.
2. As soon as the first measurements are available, we switch to `recommender`.

For more details on the different recommenders, their underlying algorithmic
details, and their configuration settings, see the
[recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html)
of the user guide.
the parameter configurations will be recommended by the `initial_recommender`.
2. As soon as the first measurements are available, we switch to the `recommender`.

```python
from baybe.recommenders import (
@@ -175,65 +210,77 @@ recommender = TwoPhaseMetaRecommender(
)
```

For more details on the different recommenders, their underlying algorithmic
details and how their settings can be adjusted, see the
[recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html)
of the user guide.
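
Put together, a complete configuration might look roughly like this (the concrete recommenders chosen here are assumptions for the sketch; the collapsed code above may use different ones):

```python
from baybe.recommenders import (
    BotorchRecommender,
    RandomRecommender,
    TwoPhaseMetaRecommender,
)

recommender = TwoPhaseMetaRecommender(
    initial_recommender=RandomRecommender(),  # used while no measurements exist yet
    recommender=BotorchRecommender(),  # Bayesian optimization once data is available
)
```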

### The Optimization Loop

We can now construct a campaign object that brings all pieces of the puzzle together:
We can now construct a `Campaign` that performs the Bayesian optimization of the experimental configurations:

```python
from baybe import Campaign

campaign = Campaign(searchspace, objective, recommender)
```

With this object at hand, we can start our experimentation cycle.
With this object at hand, we can start our optimization cycle.
In particular:

* We can ask BayBE to `recommend` new experiments.
* We can `add_measurements` for certain experimental settings to the campaign's
database.
* The campaign can `recommend` new experiments.
* We can `add_measurements` of target values for the measured parameter configurations
to the campaign's database.

Note that these two steps can be performed in any order.
In particular, available measurements can be submitted at any time and also several
times before querying the next recommendations.

```python
df = campaign.recommend(batch_size=3)
df = campaign.recommend(batch_size=3) # Recommend three experimental configurations to test
print(df)
```

The table below shows the three parameter configurations for which BayBE recommends
measuring the reaction yield.

Note that the specific recommendations will depend on both the data
already fed to the campaign and the random number generator seed that is used.

```none
   Granularity  Pressure[bar]    Solvent
15      medium            1.0  Solvent D
10      coarse           10.0  Solvent C
29        fine            5.0  Solvent B
```

Note that the specific recommendations will depend on both the data
already fed to the campaign and the random number generator seed that is used.

After having conducted the corresponding experiments, we can add our measured
targets to the table and feed it back to the campaign:
After having conducted the recommended experiments, we can add the newly measured
target information to the campaign:

```python
df["Yield"] = [79.8, 54.1, 59.4]
df["Yield"] = [79.8, 54.1, 59.4] # Measured yields for the three recommended parameter configurations
campaign.add_measurements(df)
```

With the newly arrived data, BayBE can produce a refined design for the next iteration.
This loop would typically continue until a desired target value has been achieved in
the experiment.
With the newly provided data, BayBE can produce a refined recommendation for the next iteration.
This loop typically continues until a desired target value is achieved in the experiment.
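
Spelled out as code, the loop could be sketched as follows (`run_experiments` is a placeholder for your own measurement process, not a BayBE function):

```python
def run_experiments(df):
    """Placeholder: run the recommended experiments and return the measured yields."""
    ...


for _ in range(5):  # the number of iterations is arbitrary here
    df = campaign.recommend(batch_size=3)  # get the next parameter configurations
    df["Yield"] = run_experiments(df)  # measure the target for each configuration
    campaign.add_measurements(df)  # feed the results back to the campaign
```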

### Advanced Example: Chemical Substances
BayBE has several modules to go beyond traditional approaches. One such example is the
use of custom encodings for categorical parameters. Chemical encodings for substances
are a special built-in case of this that comes with BayBE.
### Inspect the Progress of the Experimental Configuration Optimization

The plot below shows the progression of campaigns that optimize a direct arylation reaction
by tuning the solvent, base and ligand
(from [Shields, B.J. et al.](https://doi.org/10.1038/s41586-021-03213-y)).
Each line shows the best target value that was cumulatively achieved after a given number of experimental iterations.


Different lines show outcomes of `Campaigns` with different designs.

In the following picture you can see
the outcome for treating the solvent, base and ligand in a direct arylation reaction
optimization (from [Shields, B.J. et al.](https://doi.org/10.1038/s41586-021-03213-y)) with
chemical encodings compared to one-hot and a random baseline:
![Substance Encoding Example](./examples/Backtesting/full_lookup_light.svg)

In particular, the five `Campaigns` differ in how molecules are encoded within each chemical `Parameter`.
We can see that optimization is more efficient when
using chemical encodings (e.g., *MORDRED*) rather than *one-hot* encoding or *random* features.
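
The molecule encoding is selected on the chemical parameter itself, roughly as follows (substance names and SMILES here are illustrative):

```python
from baybe.parameters import SubstanceParameter

# The encoding determines how molecules are represented for the surrogate model,
# e.g. "MORDRED" chemical descriptors vs. "OHE" one-hot encoding
solvent = SubstanceParameter(
    name="Solvent",
    data={"Toluene": "Cc1ccccc1", "THF": "C1CCOC1"},
    encoding="MORDRED",
)
```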

(installation)=
## 💻 Installation
### From Package Index
@@ -263,7 +310,7 @@ pip install git+https://github.com/emdgroup/baybe.git@main

Alternatively, you can install the package from your own local copy.
First, clone the repository, navigate to the repository root folder, check out the
desired commit, and run:
desired commit and run:

```bash
pip install .