Skip to content

Conversation

@Hrovatin
Copy link
Collaborator

@Hrovatin Hrovatin commented Oct 23, 2025

This PR tackles some points from #598, focused on #598 (comment) as agreed on the team meeting

Link to the built documentation
I still have issues with displaying images, will investigate this.

@Hrovatin Hrovatin marked this pull request as draft October 23, 2025 14:41
@Hrovatin Hrovatin marked this pull request as ready for review October 24, 2025 14:42
@Scienfitz

This comment was marked as outdated.

the bayesian optimization loop, how it can be configured and
the Bayesian optimization loop, how it can be configured and
how the results can be post-analysed.

Copy link
Collaborator Author

@Hrovatin Hrovatin Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from before:

  1. I found the description of parameters as "individual degrees of freedom" a bit too technical. I am not sure if this term is familiar for all users and would suggest to name it "variables that can be adjusted during an experiment" or something similar.
image
  1. Descriptive vs active language
image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AdrianSosic @AVHopp @Scienfitz Let me know if I should fix any of these as well

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can live with both descriptive and active language as long as it is consistent. Agree with the point of the description for parameters being too technical. Maybe one could use something describing that the parameters are the things controlling the experiments and that can be changed by the experimentalist?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for Parameters:
"Parameters to be optimized and other data covariates."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you sue the word covariates you will make it even more technical than the original wording

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be "Parameters to be optimized and additional dataset information" better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the "additional dataset information", what do you mean by that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree thats imo more confusing than the original as well

why not simply individual settings you can control during an experiment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With additional dataset information I meant like task type (which is not optimized per se). But I can simplify.
@Scienfitz your sounds ok, but would replace "settings" with "variables" or so?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for consistency:

  • If you go with Martin's suggestion (regardless of settings or variables), let's not add a you
  • Let's also resolve the other inconsistencies you pointed out above

@AVHopp
Copy link
Collaborator

AVHopp commented Oct 28, 2025

@Hrovatin please have a compiled version of this on your fork and link it here (ideally in the PR description) since this makes it way easier to review. If there is trouble with deploying things on your fork just ping me and I can help with that :)

@AVHopp
Copy link
Collaborator

AVHopp commented Oct 28, 2025

Also, can you check where you branched off for this PR? Just switched the branch to view the README on github and this seems to be quite outdated
image

Copy link
Collaborator

@AVHopp AVHopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First few comments :)


<div align="center">

[![Complex Search Space](https://raw.githubusercontent.com/Hrovatin/baybe/docs/easy_access/docs/_static/complex_search_space_automatic.svg)](https://github.com/emdgroup/baybe/)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This requires mini PR of uploading images before merge and updating the link

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context from my end as we investigated this: The issue is that Sphinx has problems trying to load the static file directly from the repo here: You can either give it a path such that it finds it in the README on github but not in the compiled version or vice versa. Hence the fix to link to the static image on github itself.


<div align="center">

[![Quick Start](https://raw.githubusercontent.com/Hrovatin/baybe/docs/easy_access/docs/_static/quick_start_automatic.svg)](https://github.com/emdgroup/baybe/)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This requires mini PR of uploading images before merge and updating the link

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context from my end as we investigated this: The issue is that Sphinx has problems trying to load the static file directly from the repo here: You can either give it a path such that it finds it in the README on github but not in the compiled version or vice versa. Hence the fix to link to the static image on github itself.

Copy link
Collaborator

@AdrianSosic AdrianSosic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Hrovatin, thanks for taking the lead here. I'm welcoming improvements to the README 👍🏼 But we have to make sure that this is not at the cost of:

  • introducing inconsistencies
  • a lack of precision, i.e. softening the language in such a way that statements become inexact

To show you what I mean, I've added the hashtags #consistency and #rigor to many of my comments. This is only a first round anyway, more comments to come, but I'll now first go over the other existing open threads 🙃

README.md Outdated
- 📈 Comprehensive backtest, simulation and imputation utilities: Benchmark and find your best settings
- 📝 Fully typed and hypothesis-tested: Robust code base
- 🔄 All objects are fully (de-)serializable: Useful for storing results in databases or use in wrappers like APIs
- 📚 Leverage **domain knowledge**.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#consistency

Maybe this is just me, but: I personally have the impression that a nested list of features is visually actually much more overwhelming than the flat list we had before.

Before we had:

  • one line per bullet (there was one exception that should have been fixe)
  • Same style per bullet. First: named feature. Second: short explanation

Now we have:

  • Mixed multi-level bullets (sometimes one entry, sometimes with headings)
  • Drastically different lengths of the features descriptions
  • Completely mixed styles. Sometimes, there is a feature name before a colon, sometimes there is no colon and sometimes there is not even a feature name. Sometimes, there is a bold section, sometimes not. Sometimes, the style is imperative mood (e.g "Encode ...") and sometimes it's just listing facts

To be honest, I don't understand how this is now "simpler" than what used to be there before (not talking about content/wording, but just about formatting) 😅 What are your thoughts on this, @AVHopp @Scienfitz

If all of you prefer the nested style, I'm fine with it. But can we then at least fix the other inconsistencies?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I always aimed to start with why and then optionally go to how. I do not think that the name of the feature is the first thing to read as that may be technical, but rather the purpuse that everyone be able to see the relevance of. I realize now that a few are not worded that way so I made push that unifies that part.
  2. I like nested format as it groups things a bit instead of having a laundry list of everything together. But if the groupping does not make sense we can re-discuss.
  3. I added bolding to all first-level titles, but I can also add bold parts to others.
  4. I made all imperative as this is closer to what was before.
  5. I personally do not see somewhat different line lengths as major issue, but if all other see it then we can re format.

README.md Outdated
- 📚 Leverage **domain knowledge**.
- 🎨 Encode categorical data to capture relationships between categories. BayBE also provides built-in chemical encodings.
- 🛠️ Option to build-in mechanistic process understanding via custom surrogate models.
- 🏛️ Leverage **historic data** to accelerate optimization via transfer learning.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#rigor

I think this is potentially very confusing / misses the point. All of Bayesian optimization is built upon the idea of using "historic" data, if you will. The term simply isn't well-enough specified here to make the point. The important part is that the data needs to come from slightly different campaigns (message of the old bullet) and not just represent "historic stuff" that you've collected earlier on with the same setup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Leverage **historic data** from similar campaigns to accelerate optimization via transfer learning.


</div>

From the user perspective, the most important part is the "design" step.
Copy link
Collaborator

@AdrianSosic AdrianSosic Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#rigor

Perhaps no one of you noticed, but I still think this is a very critical points since highly misleading. With "design", you refer to the step where the user defines the components of the the campaign etc. However, please note that we're fully in the DOE context here, where "design" has a very different meaning – it's literally in the name of the approach. In (Bayesian) DOE, the "design" refers to the entire end-to-end process, i.e. it also includes (and one could argue this is actually the main part of it!) the actual placement of the experiments in the search space according to some criterion – so that would correspond not the parameter definition step but actually to the recommendation step.

Thus, I think it's crucial that we adjust the terminology here (and also in the "design" user guide)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call it setup step or similar to avoid the word design

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AdrianSosic would "setup" be ok?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Here we have again the "parameter combination" term (see other thread)
  • Analogously, maybe we should make it Measure targets below
  • I really don't like the icon for the uppermost layer – it's way too complicated / cluttered for this purpose. Isn't there something simpler? E.g. a pen with a "clean" paper, or maybe just a pen, or similar
  • I know it's a bit pedantic, and hopefully we'll soon be able to replace it with a proper logo anyway, but can you make the boundaries around the BayBE part a tiny little broader (not by much). But it really looks like a bad screeenshot :D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do all of them at the end. @Scienfitz @AVHopp Please also let me know if you have any requests fro this plot before I change it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the color (orange and blue) of the arrows intentional? seems strange to me I would just make them neutral

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants