Improve docs accessibility #676
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,35 +27,77 @@ | |
|
|
||
| # BayBE — A Bayesian Back End for Design of Experiments | ||
|
|
||
| The **Bay**esian **B**ack **E**nd (**BayBE**) is a general-purpose toolbox for Bayesian Design | ||
| of Experiments, focusing on additions that enable real-world experimental campaigns. | ||
| The **Bay**esian **B**ack **E**nd (**BayBE**) helps to find **good parameter configurations** | ||
| within complex parameter search spaces. | ||
|
|
||
| <div align="center"> | ||
|
|
||
| [](https://github.com/emdgroup/baybe/) | ||
|
|
||
| </div> | ||
|
|
||
| Example use cases: | ||
|
|
||
| - 🧪 Find chemical reaction conditions or process parameters | ||
| - 🥣 Create materials, chemical mixtures or formulations with desired properties | ||
| - ✈️ Optimize the 3D shape of a physical object | ||
| - 🖥️ Optimize a virtual simulation | ||
| - ⚙️ Select model hyperparameters | ||
| - 🫖 Find tasty espresso machine settings | ||
Hrovatin marked this conversation as resolved.
|
||
|
|
||
| This is achieved via **Bayesian Design of Experiments**, | ||
| which helps to efficiently navigate parameter search spaces. | ||
| It balances | ||
| exploitation of parameter space regions known to lead to good outcomes | ||
| and exploration of unknown regions. | ||
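The exploitation/exploration balance described above can be illustrated with a small sketch. It uses an upper-confidence-bound score, a common textbook acquisition rule swapped in here purely for illustration; the diff does not state which acquisition functions BayBE uses, and the candidate names, predictions and `beta` values below are hypothetical.

```python
def upper_confidence_bound(mean, std, beta):
    """Score a candidate: predicted outcome plus a bonus for uncertainty."""
    return mean + beta * std


# Hypothetical model predictions: (predicted mean, predicted standard deviation)
candidates = {
    "near known optimum": (0.80, 0.02),
    "unexplored region": (0.60, 0.25),
}


def pick(beta):
    """Return the candidate name with the highest score for a given beta."""
    return max(
        candidates,
        key=lambda name: upper_confidence_bound(*candidates[name], beta),
    )


print(pick(beta=0.0))  # no uncertainty bonus: pure exploitation
print(pick(beta=2.0))  # large uncertainty bonus: exploration wins
```

With `beta=0.0` the score ignores uncertainty and the well-known region is chosen; a larger `beta` rewards uncertainty and shifts the choice to the unexplored region.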
|
|
||
| BayBE provides a **general-purpose toolbox** for Bayesian Design of Experiments, | ||
|
Collaborator
this and the sentence directly after the next heading feel very repetitive. Would combine or do something to avoid that
Collaborator
Author
Current: Proposed (merge and remove the batteries included heading, having it as part of general intro):
Collaborator
Author
@Scienfitz any opinion?
Collaborator
I think @AdrianSosic was very keen on adding the battery phrase so since you want to remove it I think you both have to sync
Collaborator
I'm not married to it, but I still think it sends a very clear message that this package cares deeply about usability – which is in fact the main purpose of it (otherwise, people could just use botorch). Does it hurt to keep it? |
||
| focusing on making this procedure easily accessible for real-world experiments. | ||
|
|
||
|
|
||
| ## 🔋 Batteries Included | ||
| Besides its core functionality to perform a typical recommend-measure loop, BayBE | ||
| offers a range of ✨**built‑in features**✨ crucial for real-world use cases. | ||
| BayBE offers a range of ✨**built‑in features**✨ crucial for real-world use cases. | ||
|
Collaborator
I would not delete the
Collaborator
Author
I wanted to shorten where I felt that text may be redundant. But I can add back if preferred @Scienfitz See proposal in: #676 (comment) |
||
| The following provides a non-comprehensive overview: | ||
|
|
||
| - 🛠️ Custom parameter encodings: Improve your campaign with domain knowledge | ||
| - 🧪 Built-in chemical encodings: Improve your campaign with chemical knowledge | ||
| - 🎯 Numerical and binary targets with min, max and match objectives | ||
| - ⚖️ Multi-target support via Pareto optimization and desirability scalarization | ||
| - 🔍 Insights: Easily analyze feature importance and model behavior | ||
| - 🎭 Hybrid (mixed continuous and discrete) spaces | ||
| - 🚀 Transfer learning: Mix data from multiple campaigns and accelerate optimization | ||
| - 🎰 Bandit models: Efficiently find the best among many options in noisy environments (e.g. A/B Testing) | ||
| - 🔢 Cardinality constraints: Control the number of active factors in your design | ||
| - 🌎 Distributed workflows: Run campaigns asynchronously with pending experiments and partial measurements | ||
| - 🎓 Active learning: Perform smart data acquisition campaigns | ||
| - ⚙️ Custom surrogate models: Enhance your predictions through mechanistic understanding | ||
| - 📈 Comprehensive backtest, simulation and imputation utilities: Benchmark and find your best settings | ||
| - 📝 Fully typed and hypothesis-tested: Robust code base | ||
| - 🔄 All objects are fully (de-)serializable: Useful for storing results in databases or use in wrappers like APIs | ||
| - 📚 Leverage **domain knowledge**: | ||
| - 🎨 Capture relationships between categories by encoding categorical data. BayBE also provides built-in chemical encodings. | ||
| - 🛠️ Build in mechanistic process understanding via custom surrogate models. | ||
| - 🏛️ Leverage **historic data** from similar campaigns to accelerate optimization via transfer learning. | ||
| - 🌀 **Flexibly** define target outcomes, parameter search spaces and optimization strategies: | ||
| - 🎯 Choose between numerical targets (e.g., experimental outcome values) or binary targets (e.g., good/bad classification of experimental results). Targets can be minimized, maximized or matched to a specific value. | ||
| - 👥👥 Optimize multiple targets at once (e.g., via Pareto optimization or desirability scalarization). | ||
| - 🎭 Use both continuous and discrete parameters within a single search space. | ||
| - 🔢 Define a maximal number of mixture components via cardinality constraints. | ||
| - ⚖️ Choose between different optimization strategies to balance exploration and exploitation of the search space: | ||
| - 🌍 Gain an understanding of the whole search space via active learning. | ||
| - 🎰 Maximize total gain across a sequence of actions via bandit models. | ||
| - 🌐 Run campaigns **asynchronously** with pending experiments and partial measurements. | ||
| - 🔍 Gain **insights** about the optimization campaigns by analyzing feature importance and model behavior. | ||
| - 📈 Conduct **benchmarks** to select between different Bayesian optimization settings via backtesting. | ||
| - 🔄 Connect BayBE with **database storage and API wrappers** using the serialization functionality. | ||
| - 📝 Rely on a **high-quality code base** with comprehensive tests and typing. | ||
|
|
||
|
|
||
| ## ⚡ Quick Start | ||
|
|
||
| Let us consider a simple experiment where we control three parameters and want to | ||
| maximize a single target called `Yield`. | ||
| To perform Bayesian Design of Experiments with BayBE, | ||
| you should first specify the **parameter search space** and **objective** to be optimized. | ||
| Based on this information and any **available data** about outcomes of specific parameter configurations, | ||
| BayBE will **recommend the next set of parameter configurations** to be **measured**. | ||
| To inform the next recommendation cycle, the newly generated measurements can be added to BayBE. | ||
|
|
||
| <div align="center"> | ||
|
|
||
| [](https://github.com/emdgroup/baybe/) | ||
|
Collaborator
Author
Note: This requires mini PR of uploading images before merge and updating the link
Collaborator
Some context from my end as we investigated this: The issue is that Sphinx has problems trying to load the static file directly from the repo here: You can either give it a path such that it finds it in the README on github but not in the compiled version or vice versa. Hence the fix to link to the static image on github itself. |
||
|
|
||
| </div> | ||
|
|
||
| From the user perspective, the most important part is the "design" step. | ||
|
Collaborator
#rigor Perhaps none of you noticed, but I still think this is a very critical point since it is highly misleading. With "design", you refer to the step where the user defines the components of the campaign etc. However, please note that we're fully in the DOE context here, where "design" has a very different meaning – it's literally in the name of the approach. In (Bayesian) DOE, the "design" refers to the entire end-to-end process, i.e. it also includes (and one could argue this is actually the main part of it!) the actual placement of the experiments in the search space according to some criterion – so that would correspond not to the parameter definition step but actually to the recommendation step. Thus, I think it's crucial that we adjust the terminology here (and also in the "design" user guide)
Collaborator
just call it setup step or similar to avoid the word design
Collaborator
Author
@AdrianSosic would "setup" be ok?
Collaborator
Yeah, definitely better than "design" 👍🏼 |
||
|
|
||
| Below we show a simple optimization procedure, starting with the design step and subsequently | ||
| performing the recommendation loop. | ||
| The provided example aims to maximize the yield of a chemical reaction by adjusting its parameter configurations | ||
| (also known as reaction conditions). | ||
|
|
||
| First, install BayBE into your Python environment: | ||
| ```bash | ||
|
|
@@ -66,7 +108,7 @@ For more information on this step, see our | |
|
|
||
| ### Defining the Optimization Objective | ||
|
|
||
| In BayBE's language, the `Yield` can be represented as a `NumericalTarget`, | ||
| In BayBE's language, the reaction yield can be represented as a `NumericalTarget`, | ||
AVHopp marked this conversation as resolved.
|
||
| which we wrap into a `SingleTargetObjective`: | ||
|
|
||
| ```python | ||
|
|
@@ -76,21 +118,19 @@ from baybe.objectives import SingleTargetObjective | |
| target = NumericalTarget(name="Yield") | ||
| objective = SingleTargetObjective(target=target) | ||
| ``` | ||
| In cases where we are confronted with multiple (potentially conflicting) targets, | ||
| the `ParetoObjective` or `DesirabilityObjective` can be used instead. | ||
| These allow to define additional settings, such as how the targets should be balanced. | ||
| In cases where we are confronted with multiple (potentially conflicting) targets | ||
| (e.g., yield vs cost), | ||
| the `ParetoObjective` or `DesirabilityObjective` can be used to define how the targets should be balanced. | ||
| For more details, see the | ||
| [objectives section](https://emdgroup.github.io/baybe/stable/userguide/objectives.html) | ||
| of the user guide. | ||
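To make the idea of conflicting targets concrete, here is a minimal, library-free sketch of Pareto filtering for the yield-vs-cost example. It is not BayBE's `ParetoObjective` implementation; the function name and the outcome values are hypothetical and only illustrate which outcomes count as non-dominated.

```python
def pareto_front(points):
    """Keep (yield, cost) points that no other point dominates.

    Yield is maximized and cost is minimized.
    """

    def dominates(a, b):
        # a dominates b if it is at least as good in both targets
        # and strictly better in at least one of them
        return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical measured outcomes as (yield, cost) pairs
outcomes = [(80.0, 12.0), (75.0, 8.0), (60.0, 9.0), (80.0, 15.0)]
front = pareto_front(outcomes)
print(front)
```

Here `(60.0, 9.0)` is dropped because `(75.0, 8.0)` has both higher yield and lower cost, and `(80.0, 15.0)` is dropped because `(80.0, 12.0)` reaches the same yield at lower cost. The two remaining points represent different trade-offs, which is the situation a Pareto-style objective is meant to handle.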
|
|
||
| ### Defining the Search Space | ||
|
|
||
| Next, we inform BayBE about the available "control knobs", that is, the underlying | ||
| system parameters we can tune to optimize our targets. This also involves specifying | ||
| their values/ranges and other parameter-specific details. | ||
|
|
||
| For our example, we assume that we can control three parameters – `Granularity`, | ||
| `Pressure[bar]`, and `Solvent` – as follows: | ||
| reaction parameters we can tune to optimize the yield. | ||
| In this case we tune granularity, pressure and solvent, each being encoded as a `Parameter`. | ||
| We also need to specify which values individual parameters can take. | ||
|
|
||
| ```python | ||
| from baybe.parameters import ( | ||
|
|
@@ -147,20 +187,15 @@ and alternative ways of construction. | |
|
|
||
| ### Optional: Defining the Optimization Strategy | ||
|
|
||
| As an optional step, we can specify details on how the optimization should be | ||
| conducted. If omitted, BayBE will choose a default setting. | ||
| As an optional step, we can specify details on how the optimization of the experimental configurations should be | ||
| performed. If omitted, BayBE will choose a default Bayesian optimization setting. | ||
|
|
||
| For our example, we combine two recommenders via a so-called meta recommender named | ||
| `TwoPhaseMetaRecommender`: | ||
|
|
||
| 1. In cases where no measurements have been made prior to the interaction with BayBE, | ||
| a selection via `initial_recommender` is used. | ||
| 2. As soon as the first measurements are available, we switch to `recommender`. | ||
|
|
||
| For more details on the different recommenders, their underlying algorithmic | ||
| details, and their configuration settings, see the | ||
| [recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html) | ||
| of the user guide. | ||
| the parameters will be recommended with the `initial_recommender`. | ||
| 2. As soon as the first measurements are available, we switch to the `recommender`. | ||
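The two-step switching rule above can be sketched in plain Python. This is only an illustration of the logic, not the `TwoPhaseMetaRecommender` source; the function name, the `switch_after` argument and the string placeholders are assumptions made for this sketch.

```python
def select_recommender(n_measurements, initial_recommender, recommender, switch_after=1):
    """Pick which recommender to use given how many measurements exist.

    Hypothetical helper: before `switch_after` measurements are available,
    the initial recommender is used; afterwards the main recommender takes over.
    """
    if n_measurements < switch_after:
        return initial_recommender
    return recommender


# Placeholders standing in for actual recommender objects
print(select_recommender(0, "initial_recommender", "recommender"))
print(select_recommender(3, "initial_recommender", "recommender"))
```

The first call has no measurements and falls back to the initial recommender; the second call has data available and uses the main recommender.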
|
|
||
| ```python | ||
| from baybe.recommenders import ( | ||
|
|
@@ -175,65 +210,77 @@ recommender = TwoPhaseMetaRecommender( | |
| ) | ||
| ``` | ||
|
|
||
| For more details on the different recommenders, their underlying algorithmic | ||
| details and how their settings can be adjusted, see the | ||
| [recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html) | ||
| of the user guide. | ||
|
|
||
| ### The Optimization Loop | ||
|
|
||
| We can now construct a campaign object that brings all pieces of the puzzle together: | ||
| We can now construct a `Campaign` that performs the Bayesian optimization of the experimental configurations: | ||
|
|
||
| ```python | ||
| from baybe import Campaign | ||
|
|
||
| campaign = Campaign(searchspace, objective, recommender) | ||
| ``` | ||
|
|
||
| With this object at hand, we can start our experimentation cycle. | ||
| With this object at hand, we can start our optimization cycle. | ||
| In particular: | ||
|
|
||
| * We can ask BayBE to `recommend` new experiments. | ||
| * We can `add_measurements` for certain experimental settings to the campaign's | ||
| database. | ||
| * The campaign can `recommend` new experiments. | ||
| * We can `add_measurements` of target values for the measured parameter configurations | ||
| to the campaign's database. | ||
|
|
||
| Note that these two steps can be performed in any order. | ||
| In particular, available measurements can be submitted at any time and also several | ||
| times before querying the next recommendations. | ||
|
|
||
| ```python | ||
| df = campaign.recommend(batch_size=3) | ||
| df = campaign.recommend(batch_size=3) # Recommend three experimental configurations to test | ||
| print(df) | ||
| ``` | ||
|
|
||
| The table below shows the three parameter configurations for which BayBE recommends | ||
| measuring the reaction yield. | ||
|
|
||
| Note that the specific recommendations will depend on both the data | ||
| already fed to the campaign and the random number generator seed that is used. | ||
|
|
||
| ```none | ||
| Granularity Pressure[bar] Solvent | ||
| 15 medium 1.0 Solvent D | ||
| 10 coarse 10.0 Solvent C | ||
| 29 fine 5.0 Solvent B | ||
| ``` | ||
|
|
||
| Note that the specific recommendations will depend on both the data | ||
| already fed to the campaign and the random number generator seed that is used. | ||
|
|
||
| After having conducted the corresponding experiments, we can add our measured | ||
| targets to the table and feed it back to the campaign: | ||
| After having conducted the recommended experiments, we can add the newly measured | ||
| target information to the campaign: | ||
|
|
||
| ```python | ||
| df["Yield"] = [79.8, 54.1, 59.4] | ||
| df["Yield"] = [79.8, 54.1, 59.4] # Measured yields for the three recommended parameter configurations | ||
| campaign.add_measurements(df) | ||
| ``` | ||
|
|
||
| With the newly arrived data, BayBE can produce a refined design for the next iteration. | ||
| This loop would typically continue until a desired target value has been achieved in | ||
| the experiment. | ||
| With the newly provided data, BayBE can produce a refined recommendation for the next iteration. | ||
| This loop typically continues until a desired target value is achieved in the experiment. | ||
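The stopping rule of the loop can be sketched without BayBE installed. The `ToyCampaign` class below is a hypothetical stand-in that only mimics the `recommend` / `add_measurements` shape; it samples untested candidates at random instead of using Bayesian optimization, and `run_experiment` is an invented placeholder for a lab measurement.

```python
import random


class ToyCampaign:
    """Hypothetical stand-in for a campaign. This is not BayBE."""

    def __init__(self, candidates, seed=0):
        self.candidates = list(candidates)
        self.measurements = []  # list of (configuration, target value) pairs
        self._rng = random.Random(seed)

    def recommend(self, batch_size):
        # Random choice among untested configurations (no model involved)
        measured = {config for config, _ in self.measurements}
        untested = [c for c in self.candidates if c not in measured]
        return self._rng.sample(untested, min(batch_size, len(untested)))

    def add_measurements(self, results):
        self.measurements.extend(results)


def run_experiment(pressure):
    # Invented response surface: the yield peaks at a pressure of 5.0
    return 100.0 - (pressure - 5.0) ** 2


campaign = ToyCampaign(candidates=[1.0, 2.0, 5.0, 8.0, 10.0])
desired_yield = 99.0
best = float("-inf")
iterations = 0

while best < desired_yield:
    batch = campaign.recommend(batch_size=2)
    if not batch:  # nothing left to test
        break
    campaign.add_measurements([(p, run_experiment(p)) for p in batch])
    best = max(y for _, y in campaign.measurements)
    iterations += 1

print(f"Reached yield {best} after {iterations} iteration(s)")
```

In a real campaign the random sampling would be replaced by the campaign's own recommendations, so the desired target value would typically be reached in fewer experiments.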
|
|
||
| ### Advanced Example: Chemical Substances | ||
| BayBE has several modules to go beyond traditional approaches. One such example is the | ||
| use of custom encodings for categorical parameters. Chemical encodings for substances | ||
| are a special built-in case of this that comes with BayBE. | ||
| ### Inspect the Progress of the Experimental Configuration Optimization | ||
|
|
||
| The plot below shows the progression of a campaign that optimized a direct arylation reaction | ||
| by tuning the solvent, base and ligand | ||
| (from [Shields, B.J. et al.](https://doi.org/10.1038/s41586-021-03213-y)). | ||
| Each line shows the best target value that was cumulatively achieved after a given number of experimental iterations. | ||
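The quantity plotted on each line, the best target value achieved so far, is a running maximum over the measured values. A minimal sketch with hypothetical yield values:

```python
from itertools import accumulate

# Hypothetical yields in the order they were measured
measured_yields = [54.1, 79.8, 59.4, 83.0, 81.2]

# Running maximum: best value achieved up to and including each experiment
best_so_far = list(accumulate(measured_yields, max))
print(best_so_far)  # [54.1, 79.8, 79.8, 83.0, 83.0]
```

The curve never decreases, so a flat segment means the corresponding experiments did not improve on the best result found earlier.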
|
|
||
|
|
||
| Different lines show outcomes of `Campaigns` with different designs. | ||
|
|
||
| In the following picture you can see | ||
| the outcome for treating the solvent, base and ligand in a direct arylation reaction | ||
| optimization (from [Shields, B.J. et al.](https://doi.org/10.1038/s41586-021-03213-y)) with | ||
| chemical encodings compared to one-hot and a random baseline: | ||
|  | ||
|
|
||
| In particular, the five `Campaigns` differ in how molecules are encoded within each chemical `Parameter`. | ||
| We can see that optimization is more efficient when | ||
| using chemical encodings (e.g., *MORDRED*) rather than encoding categories with *one-hot* encoding or *random* features. | ||
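One intuition for this difference is that one-hot encoding places all categories equally far apart, whereas descriptor-based encodings can place similar substances close together. The sketch below uses made-up two-dimensional descriptors; they are not MORDRED values, and the solvent names are placeholders.

```python
import math

solvents = ["Solvent A", "Solvent B", "Solvent C"]


def one_hot(name):
    """Encode a solvent as a vector with a single 1.0 entry."""
    return [1.0 if s == name else 0.0 for s in solvents]


# Hypothetical descriptors: A and B are chemically similar, C is different
descriptors = {
    "Solvent A": [0.10, 0.20],
    "Solvent B": [0.12, 0.22],
    "Solvent C": [0.90, 0.80],
}

# One-hot: every pair of distinct solvents has the same distance
print(math.dist(one_hot("Solvent A"), one_hot("Solvent B")))
print(math.dist(one_hot("Solvent A"), one_hot("Solvent C")))

# Descriptors: similar solvents end up closer together
print(math.dist(descriptors["Solvent A"], descriptors["Solvent B"]))
print(math.dist(descriptors["Solvent A"], descriptors["Solvent C"]))
```

A model working on the descriptor vectors can transfer what it learned about Solvent A to Solvent B, which one-hot vectors do not allow.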
|
|
||
Hrovatin marked this conversation as resolved.
|
||
| (installation)= | ||
| ## 💻 Installation | ||
| ### From Package Index | ||
|
|
@@ -263,7 +310,7 @@ pip install git+https://github.com/emdgroup/baybe.git@main | |
|
|
||
| Alternatively, you can install the package from your own local copy. | ||
| First, clone the repository, navigate to the repository root folder, check out the | ||
| desired commit, and run: | ||
| desired commit and run: | ||
|
|
||
| ```bash | ||
| pip install . | ||
|
|
||