diff --git a/README.md b/README.md index 8c08b5f1e5..7fdf7798b6 100644 --- a/README.md +++ b/README.md @@ -27,35 +27,77 @@ # BayBE — A Bayesian Back End for Design of Experiments -The **Bay**esian **B**ack **E**nd (**BayBE**) is a general-purpose toolbox for Bayesian Design -of Experiments, focusing on additions that enable real-world experimental campaigns. +The **Bay**esian **B**ack **E**nd (**BayBE**) helps to find **good parameter configurations** +within complex parameter search spaces. + +
+ +[![Complex Search Space](https://raw.githubusercontent.com/Hrovatin/baybe/docs/easy_access/docs/_static/complex_search_space_automatic.svg)](https://github.com/emdgroup/baybe/) + +
+ +Example use cases: + +- 🧪 Find chemical reaction conditions or process parameters +- 🥣 Create materials, chemical mixtures or formulations with desired properties +- ✈️ Optimize the 3D shape of a physical object +- 🖥️ Optimize a virtual simulation +- ⚙️ Select model hyperparameters +- 🫖 Find tasty espresso machine settings + +This is achieved via **Bayesian Design of Experiments**, +which helps to efficiently navigate parameter search spaces. +It balances +exploitation of parameter space regions known to lead to good outcomes +and exploration of unknown regions. + +BayBE provides a **general-purpose toolbox** for Bayesian Design of Experiments, +focusing on making this procedure easily accessible for real-world experiments. + ## 🔋 Batteries Included -Besides its core functionality to perform a typical recommend-measure loop, BayBE -offers a range of ✨**built‑in features**✨ crucial for real-world use cases. +BayBE offers a range of ✨**built‑in features**✨ crucial for real-world use cases. The following provides a non-comprehensive overview: -- 🛠️ Custom parameter encodings: Improve your campaign with domain knowledge -- 🧪 Built-in chemical encodings: Improve your campaign with chemical knowledge -- 🎯 Numerical and binary targets with min, max and match objectives -- ⚖️ Multi-target support via Pareto optimization and desirability scalarization -- 🔍 Insights: Easily analyze feature importance and model behavior -- 🎭 Hybrid (mixed continuous and discrete) spaces -- 🚀 Transfer learning: Mix data from multiple campaigns and accelerate optimization -- 🎰 Bandit models: Efficiently find the best among many options in noisy environments (e.g. 
A/B Testing) -- 🔢 Cardinality constraints: Control the number of active factors in your design -- 🌎 Distributed workflows: Run campaigns asynchronously with pending experiments and partial measurements -- 🎓 Active learning: Perform smart data acquisition campaigns -- ⚙️ Custom surrogate models: Enhance your predictions through mechanistic understanding -- 📈 Comprehensive backtest, simulation and imputation utilities: Benchmark and find your best settings -- 📝 Fully typed and hypothesis-tested: Robust code base -- 🔄 All objects are fully (de-)serializable: Useful for storing results in databases or use in wrappers like APIs +- 📚 Leverage **domain knowledge**: + - 🎨 Capture relationships between categories by encoding categorical data. BayBE also provides built-in chemical encodings. + - 🛠️ Build in mechanistic process understanding via custom surrogate models. +- 🏛️ Leverage **historic data** from similar campaigns to accelerate optimization via transfer learning. +- 🌀 **Flexibly** define target outcomes, parameter search spaces and optimization strategies: + - 🎯 Choose between numerical targets (e.g., experimental outcome values) or binary targets (e.g., good/bad classification of experimental results). Targets can be minimized, maximized or matched to a specific value. + - 👥👥 Optimize multiple targets at once (e.g., via Pareto optimization or desirability scalarization). + - 🎭 Use both continuous and discrete parameters within a single search space. + - 🔢 Define a maximal number of mixture components via cardinality constraints. + - ⚖️ Choose between different optimization strategies to balance exploration and exploitation of the search space: + - Gain an understanding of the whole search space via active learning. + - 🎰 Maximize total gain across a sequence of actions via bandit models. +- Run campaigns **asynchronously** with pending experiments and partial measurements. 
+- 🔍 Gain **insights** about the optimization campaigns by analyzing feature importance and model behavior. +- 📈 Conduct **benchmarks** to select between different Bayesian optimization settings via backtesting. +- 🔄 Connect BayBE with **database storage and API wrappers** using the serialization functionality. +- 📝 Rely on a **high-quality code base** with comprehensive tests and typing. ## ⚡ Quick Start -Let us consider a simple experiment where we control three parameters and want to -maximize a single target called `Yield`. +To perform Bayesian Design of Experiments with BayBE, +you should first specify the **parameter search space** and the **objective** to be optimized. +Based on this information and any **available data** about outcomes of specific parameter configurations, +BayBE will **recommend the next set of parameter configurations** to be **measured**. +To inform the next recommendation cycle, the newly generated measurements can be added to BayBE. + +
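The recommend-measure cycle described above can be sketched in plain Python. This is only a toy illustration under stated assumptions: it does not use BayBE's API, the `recommend` function picks untested configurations at random (a stand-in for a Bayesian recommender), and `measure_yield`, `SEARCH_SPACE` and the parameter values are hypothetical placeholders for a real experiment.

```python
import itertools
import random

# Hypothetical search space: every combination of the allowed parameter values.
SEARCH_SPACE = list(
    itertools.product(
        ["coarse", "medium", "fine"],  # granularity
        [1.0, 5.0, 10.0],  # pressure in bar
        ["Solvent A", "Solvent B"],  # solvent
    )
)


def measure_yield(config):
    """Hypothetical stand-in for running an experiment and measuring the yield."""
    granularity, pressure, solvent = config
    score = {"coarse": 40.0, "medium": 55.0, "fine": 60.0}[granularity]
    score += 2.0 * pressure
    score += 5.0 if solvent == "Solvent B" else 0.0
    return score


def recommend(measurements, batch_size, rng):
    """Pick untested configurations at random (placeholder, not Bayesian)."""
    untested = [c for c in SEARCH_SPACE if c not in measurements]
    return rng.sample(untested, min(batch_size, len(untested)))


def run_loop(n_iterations=3, batch_size=3, seed=0):
    """Repeat the cycle: get recommendations, measure, add measurements."""
    rng = random.Random(seed)
    measurements = {}  # configuration -> measured target value
    for _ in range(n_iterations):
        batch = recommend(measurements, batch_size, rng)
        for config in batch:
            measurements[config] = measure_yield(config)
    return measurements


if __name__ == "__main__":
    results = run_loop()
    best = max(results, key=results.get)
    print(f"Measured {len(results)} configurations; best so far: {best}")
```

A real campaign replaces the random choice with a model-based recommender, which is what lets each new batch of measurements improve the next recommendation.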
+ +[![Quick Start](https://raw.githubusercontent.com/Hrovatin/baybe/docs/easy_access/docs/_static/quick_start_automatic.svg)](https://github.com/emdgroup/baybe/) + +
+ +From the user's perspective, the most important part is the "design" step. + +Below we show a simple optimization procedure, starting with the design step and subsequently +performing the recommendation loop. +The provided example aims to maximize the yield of a chemical reaction by adjusting its parameter configurations +(also known as reaction conditions). First, install BayBE into your Python environment: ```bash @@ -66,7 +108,7 @@ For more information on this step, see our ### Defining the Optimization Objective -In BayBE's language, the `Yield` can be represented as a `NumericalTarget`, +In BayBE's language, the reaction yield can be represented as a `NumericalTarget`, which we wrap into a `SingleTargetObjective`: ```python @@ -76,9 +118,9 @@ from baybe.objectives import SingleTargetObjective target = NumericalTarget(name="Yield") objective = SingleTargetObjective(target=target) ``` -In cases where we are confronted with multiple (potentially conflicting) targets, -the `ParetoObjective` or `DesirabilityObjective` can be used instead. -These allow to define additional settings, such as how the targets should be balanced. +In cases where we are confronted with multiple (potentially conflicting) targets +(e.g., yield vs. cost), +the `ParetoObjective` or `DesirabilityObjective` can be used to define how the targets should be balanced. For more details, see the [objectives section](https://emdgroup.github.io/baybe/stable/userguide/objectives.html) of the user guide. @@ -86,11 +128,9 @@ of the user guide. ### Defining the Search Space Next, we inform BayBE about the available "control knobs", that is, the underlying -system parameters we can tune to optimize our targets. This also involves specifying -their values/ranges and other parameter-specific details. - -For our example, we assume that we can control three parameters – `Granularity`, -`Pressure[bar]`, and `Solvent` – as follows: +reaction parameters we can tune to optimize the yield. 
+In this case we tune granularity, pressure and solvent, each being encoded as a `Parameter`. +We also need to specify which values individual parameters can take. ```python from baybe.parameters import ( @@ -147,20 +187,15 @@ and alternative ways of construction. ### Optional: Defining the Optimization Strategy -As an optional step, we can specify details on how the optimization should be -conducted. If omitted, BayBE will choose a default setting. +As an optional step, we can specify details on how the optimization of the experimental configurations should be +performed. If omitted, BayBE will choose a default Bayesian optimization setting. For our example, we combine two recommenders via a so-called meta recommender named `TwoPhaseMetaRecommender`: 1. In cases where no measurements have been made prior to the interaction with BayBE, - a selection via `initial_recommender` is used. -2. As soon as the first measurements are available, we switch to `recommender`. - -For more details on the different recommenders, their underlying algorithmic -details, and their configuration settings, see the -[recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html) -of the user guide. + the parameters will be recommended with the `initial_recommender`. +2. As soon as the first measurements are available, we switch to the `recommender`. ```python from baybe.recommenders import ( @@ -175,9 +210,14 @@ recommender = TwoPhaseMetaRecommender( ) ``` +For more details on the different recommenders, their underlying algorithmic +details and how their settings can be adjusted, see the +[recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html) +of the user guide. 
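The switching rule described above can be illustrated with a small stand-alone sketch. It is not BayBE's implementation: the class `TwoPhaseSketch` and the two toy strategies are hypothetical names chosen for illustration, and the only behavior assumed is the rule stated in the text (use the initial strategy while no measurements exist, then switch to the main one).

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

Measurements = Dict[float, float]  # candidate value -> measured target
Strategy = Callable[[List[float], Measurements, int], List[float]]


def random_strategy(candidates, measurements, batch_size):
    """Initial phase: no data yet, so pick candidates at random."""
    return random.sample(candidates, batch_size)


def greedy_neighbor_strategy(candidates, measurements, batch_size):
    """Later phase (toy, not Bayesian): pick untested candidates closest to the best one."""
    best = max(measurements, key=measurements.get)
    untested = [c for c in candidates if c not in measurements]
    return sorted(untested, key=lambda c: abs(c - best))[:batch_size]


@dataclass
class TwoPhaseSketch:
    """Hypothetical helper that delegates to one of two strategies."""

    initial: Strategy
    main: Strategy

    def select(self, measurements):
        # Use the initial strategy only while no measurements are available.
        return self.initial if not measurements else self.main

    def recommend(self, candidates, measurements, batch_size):
        return self.select(measurements)(candidates, measurements, batch_size)


if __name__ == "__main__":
    switcher = TwoPhaseSketch(initial=random_strategy, main=greedy_neighbor_strategy)
    candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(switcher.recommend(candidates, {}, 2))  # random pick, no data yet
    print(switcher.recommend(candidates, {2.0: 0.3, 4.0: 0.9}, 2))  # [3.0, 5.0]
```

Keeping the switch in a separate wrapper means either phase can be swapped out without touching the other, which is the design idea behind combining recommenders.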
+ ### The Optimization Loop -We can now construct a campaign object that brings all pieces of the puzzle together: +We can now construct a `Campaign` that performs the Bayesian optimization of the experimental configurations: ```python from baybe import Campaign @@ -185,22 +225,28 @@ from baybe import Campaign campaign = Campaign(searchspace, objective, recommender) ``` -With this object at hand, we can start our experimentation cycle. +With this object at hand, we can start our optimization cycle. In particular: -* We can ask BayBE to `recommend` new experiments. -* We can `add_measurements` for certain experimental settings to the campaign's - database. +* The campaign can `recommend` new experiments. +* We can `add_measurements` of target values for the measured parameter configurations + to the campaign's database. Note that these two steps can be performed in any order. In particular, available measurements can be submitted at any time and also several times before querying the next recommendations. ```python -df = campaign.recommend(batch_size=3) +df = campaign.recommend(batch_size=3) # Recommend three experimental configurations to test print(df) ``` +The table below shows the three parameter configurations for which BayBE recommends +measuring the reaction yield. + +Note that the specific recommendations will depend on both the data +already fed to the campaign and the random number generator seed that is used. + ```none Granularity Pressure[bar] Solvent 15 medium 1.0 Solvent D @@ -208,32 +254,33 @@ print(df) 29 fine 5.0 Solvent B ``` -Note that the specific recommendations will depend on both the data -already fed to the campaign and the random number generator seed that is used. 
- -After having conducted the corresponding experiments, we can add our measured -targets to the table and feed it back to the campaign: +After having conducted the recommended experiments, we can add the newly measured +target information to the campaign: ```python -df["Yield"] = [79.8, 54.1, 59.4] +df["Yield"] = [79.8, 54.1, 59.4] # Measured yields for the three recommended parameter configurations campaign.add_measurements(df) ``` -With the newly arrived data, BayBE can produce a refined design for the next iteration. -This loop would typically continue until a desired target value has been achieved in -the experiment. +With the newly provided data, BayBE can produce a refined recommendation for the next iteration. +This loop typically continues until a desired target value is achieved in the experiment. -### Advanced Example: Chemical Substances -BayBE has several modules to go beyond traditional approaches. One such example is the -use of custom encodings for categorical parameters. Chemical encodings for substances -are a special built-in case of this that comes with BayBE. +### Inspect the Progress of the Experimental Configuration Optimization + +The plot below shows the progression of a campaign that optimized a direct arylation reaction +by tuning the solvent, base and ligand +(from [Shields, B.J. et al.](https://doi.org/10.1038/s41586-021-03213-y)). +Each line shows the best target value achieved up to a given number of experimental iterations. + + +Different lines show outcomes of `Campaigns` with different designs. -In the following picture you can see -the outcome for treating the solvent, base and ligand in a direct arylation reaction -optimization (from [Shields, B.J. 
et al.](https://doi.org/10.1038/s41586-021-03213-y)) with -chemical encodings compared to one-hot and a random baseline: ![Substance Encoding Example](./examples/Backtesting/full_lookup_light.svg) +In particular, the five `Campaigns` differ in how molecules are encoded within each chemical `Parameter`. +We can see that optimization is more efficient when +using chemical encodings (e.g., *MORDRED*) rather than encoding categories with *one-hot* encoding or *random* features. + (installation)= ## 💻 Installation ### From Package Index @@ -263,7 +310,7 @@ pip install git+https://github.com/emdgroup/baybe.git@main Alternatively, you can install the package from your own local copy. First, clone the repository, navigate to the repository root folder, check out the -desired commit, and run: +desired commit and run: ```bash pip install . diff --git a/docs/_static/api_overview.drawio b/docs/_static/api_overview.drawio new file mode 100644 index 0000000000..0737716ffc --- /dev/null +++ b/docs/_static/api_overview.drawio @@ -0,0 +1,261 @@ + [261 added lines of draw.io diagram markup; content not preserved] diff --git a/docs/_static/complex_search_space.drawio b/docs/_static/complex_search_space.drawio new file mode 100644 index 0000000000..fa415f487f --- /dev/null +++ b/docs/_static/complex_search_space.drawio @@ -0,0 +1,58 @@ + [58 added lines of draw.io diagram markup; content not preserved] \ No newline at end of file diff --git a/docs/_static/complex_search_space_automatic.svg 
b/docs/_static/complex_search_space_automatic.svg new file mode 100644 index 0000000000..2f0f8f5a77 --- /dev/null +++ b/docs/_static/complex_search_space_automatic.svg @@ -0,0 +1,4 @@ + + + + 2025-10-24T13:49:33.531120 image/svg+xml Matplotlib v3.9.0, https://matplotlib.org/ \ No newline at end of file diff --git a/docs/_static/complex_search_space_dark.svg b/docs/_static/complex_search_space_dark.svg new file mode 100644 index 0000000000..aaa34141b8 --- /dev/null +++ b/docs/_static/complex_search_space_dark.svg @@ -0,0 +1,4 @@ + + + + 2025-10-24T13:49:33.531120 image/svg+xml Matplotlib v3.9.0, https://matplotlib.org/ \ No newline at end of file diff --git a/docs/_static/complex_search_space_light.svg b/docs/_static/complex_search_space_light.svg new file mode 100644 index 0000000000..1f195e5a01 --- /dev/null +++ b/docs/_static/complex_search_space_light.svg @@ -0,0 +1,4 @@ + + + + 2025-10-24T13:49:33.531120 image/svg+xml Matplotlib v3.9.0, https://matplotlib.org/ \ No newline at end of file diff --git a/docs/_static/quick_start.drawio b/docs/_static/quick_start.drawio new file mode 100644 index 0000000000..98567e85db --- /dev/null +++ b/docs/_static/quick_start.drawio @@ -0,0 +1,100 @@ + [100 added lines of draw.io diagram markup; content not preserved] \ No newline at end of file diff --git a/docs/_static/quick_start_automatic.svg b/docs/_static/quick_start_automatic.svg new file mode 100644 index 0000000000..666be18904 --- /dev/null +++ b/docs/_static/quick_start_automatic.svg @@ -0,0 +1,4 @@ + + + +
Define parameter searchspace for finding the best possible parameter setting
Objective to be optimized
Add measurements
Get recommendations for parameter combinations
Measure
\ No newline at end of file diff --git a/docs/_static/quick_start_dark.svg b/docs/_static/quick_start_dark.svg new file mode 100644 index 0000000000..89232f2b64 --- /dev/null +++ b/docs/_static/quick_start_dark.svg @@ -0,0 +1,4 @@ + + + +
Define parameter searchspace for finding the best possible parameter setting
Objective to be optimized
Add measurements
Get recommendations for parameter combinations
Measure
\ No newline at end of file diff --git a/docs/_static/quick_start_light.svg b/docs/_static/quick_start_light.svg new file mode 100644 index 0000000000..babc27add6 --- /dev/null +++ b/docs/_static/quick_start_light.svg @@ -0,0 +1,4 @@ + + + +
Define parameter searchspace for finding the best possible parameter setting
Objective to be optimized
Add measurements
Get recommendations for parameter combinations
Measure
\ No newline at end of file diff --git a/docs/faq.md b/docs/faq.md index 1d5deda9e1..6e92f27611 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -24,4 +24,3 @@ your campaign, depending on your settings for the {attr}`~baybe.campaign.Campaign.allow_recommending_already_measured` and {attr}`~baybe.campaign.Campaign.allow_recommending_already_recommended` flags. ``` - diff --git a/docs/userguide/userguide.md b/docs/userguide/userguide.md index 6d50e55322..b0ede87b00 100644 --- a/docs/userguide/userguide.md +++ b/docs/userguide/userguide.md @@ -14,7 +14,7 @@ The most commonly used interface BayBE provides is the central which suggests new measurements and administers the current state of your experimental operation. The diagram below explains how the [`Campaign`](baybe.campaign.Campaign) can be used to perform -the bayesian optimization loop, how it can be configured and +the Bayesian optimization loop, how it can be configured and how the results can be post-analysed. ```{image} ../_static/api_overview_dark.svg