Use `rdkit` for SSSR and RCs (bug fix + Python upgrade) #2796
base: main
99bf00c to 8d6fe64
Cantera 2.6 isn't available for Python 3.12, so this PR will also need to upgrade the Cantera version to 3 as mostly completed in #2751 |
A note for the path forward on this PR - the |
I ended up moving all the functions in |
Addressed some of the failures, but still a few nontrivial challenges: TODO list:
The tests are failing mainly for the aromatic compounds, which indicates an issue with the implementation. Since everything runs, the code "flow" seems clean; we just need to iron out the implementation. |
We are now very close to passing all of the unit tests. There are just a couple major bugs to resolve, namely:
|
One of the last failing tests is @rwest, since you're the last remaining dev who was involved in some capacity with this part of the code, do you know why we have that assert statement? I would think it's better to just plot a "dot" than to outright crash out. While we're at it, the |
I don't. I can see that if there is no straight chain backbone that a function to
I haven't investigated the full stack trace. But some safeguard, or exception handling, sounds appropriate.
We can try it. I expect RDKit at the time wasn't being helpful. Hopefully the commit messages offer clues? (this is why we should write helpful commit messages that explain why things are being done). |
Is this related? |
Regarding the draw unit test
@rwest yes, thanks, #2744 looks relevant. The test that's failing is for this charged species: maybe failing due to the missing X-O connectivity, and so no backbone? Edit: I tried modifying the connectivity so that there is an apparent backbone. But I still run into this problem even after changing the adjacency list to
So it is more fundamentally a problem with the species , than a unit-test specific one.
Note about sanitization
For posterity: some molecules that fail the first kekulization step include:
As implemented in this PR, the |
Regarding the The test case is drawn as a cyclic molecule, even though it's not truly "cyclic". So it should take the However, it thinks that So, how to proceed? a few options...
|
I think using RDKit for drawing as much as possible is fine, if it does a good job. Probably we ran into issues in the past that may not persist with more recent versions of RDKit, so it's fine to revisit past decisions. We should continue to put reasons for things in commit messages, to make life easier for future developers (our future selves included). |
OK. Maybe we can incorporate the fix into #2838 and then rebase onto |
In ReactionMechanismGenerator#2744 and ReactionMechanismGenerator#2796 it was found that charge-separated bidentate species can have issues due to ring perception conflicts. The previous implementation also did not use the rdkit backend for charged species by default, but this was decided many years ago (~10 years!). In the meantime, RDKit conformer generation has improved, and we can likely just use RDKit by default, which would avoid the pesky edge-case issues for ions/zwitterions. If the old behavior is desired, use_rdkit can be set to False.
7598c98 to 019362c
6e6490a to e92bec0
With the merger of e92bec0 this PR's CI should now pass - let's see! |
@jonwzheng the overnight test failures look spurious - re-running them now. |
This aligns the RDKit conversion process. A relaxed sanitization process is required to avoid kekulization/sanitization/valence issues which would prevent a molecule from being created. This is especially relevant in the context of `draw`, which has an RDKit backend that calls this function. We don't want drawing to fail simply because a molecule doesn't follow the sanitization rules.
Accompanies changes to `draw.py` to use `rdkit` backend, which traditionally was not well-supported for ions (but now might be a better option than the default drawing algorithm).
The molecule to_rdkit_mol now allows for and calls sanitize. The fragment code previously had hardcoded args. This commit just makes the args flexible so that they get passed directly to `converter` regardless of what the arguments are.
After changing to_rdkit_mol to kwargs format in Fragment, some of the existing code that relied on the previous function defaults broke. Namely, return_mapping must be True.
This reverts commit 7ac5fd4.
Fragments will sometimes call `get_smallest_set_of_smallest_rings` (e.g. for drawing), which will then call the _fragment_ version of `to_rdkit_mol` (rather than Molecule, since Fragment inherits from Molecule), which returns a _tuple_ rather than a _mol_. This causes a crash. I considered just replacing this with `converter.to_rdkit_mol` without the checks, but then you'd lose out on any fragment-related benefits from to_rdkit_mol (for example, you need to replace the fragments with H atoms). This commit also adds a check so that the user is at least aware that the default behavior is to change the kwarg to forcibly return mapping=True for fragments.
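The tuple-versus-mol mismatch described in this commit message can be reproduced in a few lines. The sketch below is only an illustration of the pattern, not the RMG code: `convert`, `mol_for_ring_perception`, and the string placeholders are hypothetical stand-ins, and only the names `Molecule`, `Fragment`, `to_rdkit_mol`, and `return_mapping` come from the discussion above.

```python
import warnings


def convert(molecule, return_mapping=False, **kwargs):
    """Hypothetical converter: a bare mol, or (mol, mapping) when requested."""
    mol = f"rdmol({molecule.name})"  # placeholder for an RDKit Mol object
    mapping = {"atom0": 0}           # placeholder atom-to-index mapping
    return (mol, mapping) if return_mapping else mol


class Molecule:
    def __init__(self, name):
        self.name = name

    def to_rdkit_mol(self, **kwargs):
        # Keyword arguments are forwarded untouched, whatever they are.
        return convert(self, **kwargs)

    def mol_for_ring_perception(self):
        # Hypothetical caller that needs a bare mol, never a tuple.
        result = self.to_rdkit_mol()
        # A subclass may hand back (mol, mapping); unwrap it defensively.
        return result[0] if isinstance(result, tuple) else result


class Fragment(Molecule):
    def to_rdkit_mol(self, **kwargs):
        # Fragments always need the mapping, so force it and tell the user.
        if not kwargs.get("return_mapping", False):
            warnings.warn("Fragment.to_rdkit_mol forces return_mapping=True")
            kwargs["return_mapping"] = True
        return convert(self, **kwargs)
```

With this shape, `Fragment("f").to_rdkit_mol()` still returns a tuple, but the inherited caller no longer crashes because it unwraps the result before using it.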
5ae2180 to d9e6fef
Regression Testing Results
Detailed regression test results.

Regression test aromatics:
Reference: Execution time (DD:HH:MM:SS): 00:00:00:50
aromatics Passed Core Comparison ✅
Original model has 15 species.
aromatics Failed Edge Comparison ❌
Original model has 106 species.

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(124cyclohexatriene)

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)CsHH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(1,4-Cyclohexadiene) + ring(1,3-Cyclohexadiene)

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsHH) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(1,4-Cyclohexadiene) + ring(1,3-Cyclohexadiene) + radical(Cds_P)

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsHH) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(1,4-Cyclohexadiene) + ring(1,3-Cyclohexadiene)

Non-identical kinetics! ❌ (reported 27 times; each `kinetics:` entry is empty)
Observables Test Case: Aromatics Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!
aromatics Passed Observable Testing ✅

Regression test liquid_oxidation:
Reference: Execution time (DD:HH:MM:SS): 00:00:01:58
liquid_oxidation Passed Core Comparison ✅
Original model has 37 species.
liquid_oxidation Failed Edge Comparison ❌
Original model has 214 species.

Observables Test Case: liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!
liquid_oxidation Passed Observable Testing ✅

Regression test nitrogen:
Reference: Execution time (DD:HH:MM:SS): 00:00:01:03
nitrogen Passed Core Comparison ✅
Original model has 41 species.
nitrogen Passed Edge Comparison ✅
Original model has 133 species.

Observables Test Case: NC Comparison
✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions!
nitrogen Passed Observable Testing ✅

Regression test oxidation:
Reference: Execution time (DD:HH:MM:SS): 00:00:01:49
oxidation Passed Core Comparison ✅
Original model has 59 species.
oxidation Passed Edge Comparison ✅
Original model has 230 species.

Observables Test Case: Oxidation Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!
oxidation Passed Observable Testing ✅

Errors occurred during observable testing
WARNING:root:Initial mole fractions do not sum to one; normalizing.
|
For testing, I made a new conda environment, following default instructions, and was pleased to see it chose Python 3.11.13. I ran the
....
I guess this shouldn't be happening? |
Yeah, these result from the new changes. They're not errors per se, but right now the implementation from this PR can lead to a lot of these logging messages. Basically, we need

Molecules with resonance hybrids can have bond orders "averaged" out, leading to values like 1.4 or 1.6 that don't map onto a recognized type of RMG bond order, which leads to the errors you're getting. The fallback I've implemented in this PR is to have any unrecognized bond order map onto an

We never got these logging warnings before because we never had to make an RDKit molecule (which probes the bond orders and triggers these warnings), but now, by default, an RDKit molecule is created to assess ring connectivity.

Not sure about the best path forward. One idea I just had: maybe we can find a way to do the ring perception before any bond averaging is done, so this doesn't pop up. That might be easy or might require a rewrite of the logic; we'd have to check. |
@jonwzheng IMO it seems like the cause of these warnings is well understood, so it would be fair to simply catch and hide them. |
Okay, I figured out the direct cause of the bond order errors. I think this warrants some discussion before we proceed. In Later in that function, The question is, how should ring perception work for these "hybrid" structures? |
I think you're right: since RDKit is just being used for ring perception, it doesn't matter really what we set the bond type to. So the current implementation in this PR (to treat weird bond orders as Also, even though it seems OK for ring perception, we should be aware that there's a risk of error if we want to do more complicated RDKit operations later on down the road. |
Some compounds with resonance form resonance hybrids, which have non-integer bond orders, and these hybrids then go through ring perception (via get_symmetry_number). Because non-integer bond orders are not recognized, we handle them as `unspecified`. Separately, the kekulization rules for RMG may sometimes differ from those of RDKit, which also logged a warning. For ring perception, these 'warnings' do not impact performance, and for nearly all users should not raise any concerns. So this demotes the logging level from `warning` to `debug`.
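The fallback described in this commit message (unrecognized bond orders become `unspecified`, reported at `debug` rather than `warning` level) might look roughly like the sketch below. It uses only the standard library; the lookup table, the string labels, the logger name, and the function `bond_label` are hypothetical and do not reflect the actual RMG or RDKit bond types.

```python
import logging

logger = logging.getLogger("bond_order_sketch")  # hypothetical logger name

# Hypothetical lookup table; the labels are plain strings, not RDKit objects.
KNOWN_BOND_ORDERS = {1.0: "SINGLE", 1.5: "AROMATIC", 2.0: "DOUBLE", 3.0: "TRIPLE"}


def bond_label(order, tol=1e-6):
    """Return a label for a bond order, falling back to UNSPECIFIED."""
    for known, label in KNOWN_BOND_ORDERS.items():
        if abs(order - known) <= tol:
            return label
    # Resonance hybrids can carry averaged orders such as 1.4 or 1.6.
    # Ring perception only needs connectivity, so this is logged quietly.
    logger.debug("Bond order %.2f not recognized; using UNSPECIFIED", order)
    return "UNSPECIFIED"
```

Because the message is emitted at `debug` level, it stays hidden under a default logging configuration but is still available when debugging a conversion.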
Regression Testing Results
Detailed regression test results.

Regression test aromatics:
Reference: Execution time (DD:HH:MM:SS): 00:00:00:49
aromatics Passed Core Comparison ✅
Original model has 15 species.
aromatics Failed Edge Comparison ❌
Original model has 106 species.

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(124cyclohexatriene)

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)CsHH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(1,4-Cyclohexadiene) + ring(1,3-Cyclohexadiene)

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsHH) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(1,4-Cyclohexadiene) + ring(1,3-Cyclohexadiene) + radical(Cds_P)

Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsHH) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(1,4-Cyclohexadiene) + ring(1,3-Cyclohexadiene)

Non-identical kinetics! ❌ (reported 27 times; each `kinetics:` entry is empty)
Observables Test Case: Aromatics Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!
aromatics Passed Observable Testing ✅

Regression test liquid_oxidation:
Reference: Execution time (DD:HH:MM:SS): 00:00:01:55
liquid_oxidation Passed Core Comparison ✅
Original model has 37 species.
liquid_oxidation Failed Edge Comparison ❌
Original model has 214 species.

Non-identical kinetics! ❌ (reported once; the `kinetics:` entry is empty)

Observables Test Case: liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!
liquid_oxidation Passed Observable Testing ✅

Regression test nitrogen:
Reference: Execution time (DD:HH:MM:SS): 00:00:01:02
nitrogen Passed Core Comparison ✅
Original model has 41 species.
nitrogen Passed Edge Comparison ✅
Original model has 133 species.

Observables Test Case: NC Comparison
✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions!
nitrogen Passed Observable Testing ✅

Regression test oxidation:
Reference: Execution time (DD:HH:MM:SS): 00:00:01:47
oxidation Passed Core Comparison ✅
Original model has 59 species.
oxidation Passed Edge Comparison ✅
Original model has 230 species.

Observables Test Case: Oxidation Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!
oxidation Passed Observable Testing ✅

Errors occurred during observable testing
WARNING:root:Initial mole fractions do not sum to one; normalizing.
|
@jonwzheng nice, this looks good! @rwest if this is agreeable I think we should merge |
Currently we use RingDecomposerLib for finding the Smallest Set of Smallest Rings and getting the Relevant Cycles. This package does not support Python 3.10+ and is thus blocking further upgrades to RMG. @KnathanM in particular is looking to get RMG to Python 3.11 so as to add support for ChemProp v2.
I believe we can just use RDKit to do these operations instead. The original paper mentions that the functionality was being moved upstream to RDKit. With the help of AI I've taken a first pass at reimplementing, with the special note that:
`get_deterministic_sssr` is not really deterministic (#2562)
This PR will be a draft for now, as it is predicated on Python 3.9 already being available (which it nearly is in #2741).
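One property that any replacement for the ring-finding backend has to preserve is the size of the SSSR, which is fixed by the molecular graph alone: bonds minus atoms plus connected components. The sketch below is a library-independent sanity check under that assumption, not the RDKit or RingDecomposerLib implementation; `sssr_size` and the example skeletons are hypothetical names introduced here, and the bond list is assumed to contain no duplicates or self-loops.

```python
def sssr_size(n_atoms, bonds):
    """Number of rings in any SSSR: bonds - atoms + connected components."""
    parent = list(range(n_atoms))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - n_atoms + components


# Heavy-atom skeletons only; atom indices are arbitrary.
benzene = [(i, (i + 1) % 6) for i in range(6)]
# A 10-membered perimeter plus one fusion bond gives two six-membered rings.
naphthalene = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5)]
# Cube vertices 0..7; two vertices are bonded when they differ in one bit.
cubane = [(i, i ^ bit) for i in range(8) for bit in (1, 2, 4) if i < i ^ bit]
```

For cubane this count is 5 even though the cube has 6 equivalent four-membered faces, so an SSSR has to leave one face out. That arbitrary choice is one reason an SSSR is not unique, whereas the set of Relevant Cycles does not depend on such a choice.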