Extract data on harmonic bonds #73

Merged
merged 3 commits into from
May 16, 2025
timholy
Collaborator

@timholy timholy commented May 16, 2025

This extracts more data from ff14SB.

I confess I don't understand the atom notations: what are, e.g., CX, CT, 2C, C*, etc? I'm guessing some of these are particular carbons, which accounts for why there doesn't seem to be a N-Cα bond listed. Is there special significance to the protein- part of the label? EDIT: oh, wait, the type-tag appears in the residue data...I still wouldn't mind understanding the designation scheme, but that resolves the immediate concern. Are there special rules at the N- and C-termini?

Also, is there a rule for key-ordering? In most cases it seems alphabetical, but then you get down to ("protein-H1", "protein-2C") and that goes out the window. It would be nice to know if there is a rule so one doesn't have to try both orders.

More data from ff14SB

codecov bot commented May 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.32%. Comparing base (a507527) to head (35c1ac5).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #73   +/-   ##
=======================================
  Coverage   95.32%   95.32%           
=======================================
  Files          14       14           
  Lines        2033     2033           
=======================================
  Hits         1938     1938           
  Misses         95       95           

@jgreener64
Member

> I confess I don't understand the atom notations: what are, e.g., CX, CT, 2C, C*, etc?

This is the rather complex topic of atom types and classes; see http://docs.openmm.org/latest/userguide/application/06_creating_ffs.html#atomtypes for example. Carbons in similar chemical environments, such as generic tetrahedral carbons (CT), are given the same atom type and consequently the same Lennard-Jones parameters.

> Is there special significance to the protein- part of the label?

Not sure.

> It would be nice to know if there is a rule so one doesn't have to try both orders.

As I recall OpenMM allows both orders in its parsing.

What is the intended use case of having the bond data in BioStructures? I ask because the values here are associated specifically with this force field, compared to the bonding topology in #66 which is a property of the amino acids themselves.

Maybe we could have the parsing machinery but not check the values into source for data specific to a force field.
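On the key-ordering question, a minimal sketch of an order-insensitive lookup, in case the extracted table keeps whatever order the source file used. The table name `toy_bonds`, the helper `lookup_bond`, and all numeric values are hypothetical placeholders, not ff14SB data; only the `"protein-…"` label pattern comes from the discussion above.

```julia
# Placeholder table keyed by type-tag pairs. The numbers are made up for
# illustration and are NOT taken from ff14SB.
toy_bonds = Dict(
    ("protein-CT", "protein-CX") => (length = 0.15f0, k = 2.0f5),
    ("protein-H1", "protein-2C") => (length = 0.11f0, k = 3.0f5),
)

# Try the key as given, then reversed; return `nothing` if neither is present.
function lookup_bond(table, a, b)
    haskey(table, (a, b)) && return table[(a, b)]
    haskey(table, (b, a)) && return table[(b, a)]
    return nothing
end

# Finds the entry stored as ("protein-H1", "protein-2C").
entry = lookup_bond(toy_bonds, "protein-2C", "protein-H1")
```

An alternative would be to normalize every key once at load time (for example, by sorting the pair), so callers never need the fallback.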

@timholy
Collaborator Author

timholy commented May 16, 2025

Thanks for the explanation!

I agree with your reservations about whether we want all this. I'd like the bond lengths and angles to support structure generation, but I don't have a use for the spring constants. Do you regard the lengths and angles as reasonably "intrinsic," or do they too make sense only in the context of a specific force field? If they are reasonably intrinsic, would it make sense to drop the spring constants but keep the rest?
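For concreteness, dropping the spring constants while keeping the lengths could look roughly like this. It assumes each entry is stored as a `(length, k)` named tuple, which is a guess at the layout rather than the format in this PR, and the values are made-up placeholders.

```julia
# Placeholder entries in an assumed (length, k) layout; values are made up.
full_table = Dict(
    ("protein-CT", "protein-CX") => (length = 0.15f0, k = 2.0f5),
    ("protein-H1", "protein-2C") => (length = 0.11f0, k = 3.0f5),
)

# Keep only the equilibrium length for each key, discarding the spring constant.
lengths_only = Dict(key => entry.length for (key, entry) in full_table)
```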

@jgreener64
Member

Ah I see, yes the bond lengths and angles would be useful for structure building. These are probably standard enough to use from Amber and have in source.

I always liked the https://github.com/clauswilke/PeptideBuilder library for this task; it might be worth seeing where they source their bond lengths and angles.

@timholy
Collaborator Author

timholy commented May 16, 2025

There isn't an immediately obvious attribution: https://github.com/clauswilke/PeptideBuilder/blob/6d38a167b9992c27adc86f64370f7083303ce877/PeptideBuilder/Geometry.py#L68-L91

Since you mentioned an interest in structure generation, here's the docstring of the function I'm writing now (warning: in progress, subject to change):

"""
    iterate_bonds(f, seq)

Given a peptide/protein sequence using the amino acid names in
`BioStructures.residuedata`, iterate over all bonds between atom-pairs,
executing the function `f` on each. `f` should accept inputs of the form

function f(atomfrom::AtomInfo, atomto::AtomInfo, bondlength::Float32, atomfrom_nbrs::Vector{Pair{AtomInfo,Float32}})
    # do something
end

where `atomto` is the "new" atom, `atomfrom` is an "old"/already-encountered
atom (except the initial `N` at the start of the sequence), `bondlength` is the
length (in Å) of the bond between `atomfrom` and `atomto`, and `atomfrom_nbrs`
is a vector of pairs of "old" atoms and the bond angle between
`nbr`-`atomfrom`-`atomto`. Because all entries in `atomfrom_nbrs` are "old", it
may be an incomplete list of all the bonding partners of `atomfrom`.


# Examples

julia> iterate_bonds(["MET", "ALA"]) do atomfrom, atomto, bondlength, atomfrom_nbrs
           println("Bond from $(atomfrom.atomname) to $(atomto.atomname) with length $bondlength")
           println("Neighbors of $(atomfrom.atomname):")
           for (nbr, angle) in atomfrom_nbrs
               println("  - $(nbr.atomname) at angle $angle")
           end
       end
"""

Currently that's intended to go in a lab package, but if you think it's of general interest I could put it here instead. It's a different interface than PeptideBuilder (I don't want to hand-specify the dihedral angles), so maybe it's not of widespread interest.
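To make the callback contract in the docstring concrete, here is a toy stand-in, not the in-progress implementation. `ToyAtom`, `toy_iterate_bonds`, and the hand-written bond and angle lists are all hypothetical, and the numbers are rough illustrative values rather than ff14SB data. It only shows how `atomfrom_nbrs` ends up containing "old" atoms and can therefore be incomplete.

```julia
# Hypothetical stand-in for the package's atom record; the real AtomInfo may differ.
struct ToyAtom
    atomname::String
end

# Visit `bonds` (from, to, length) in order. For each bond, `f` receives only the
# neighbors of `from` that were bonded to it earlier in the walk.
# Atoms are tracked by name, which is only adequate for a single-residue toy.
function toy_iterate_bonds(f, bonds, angles)
    seen = Dict{String,Vector{ToyAtom}}()   # atom name => atoms already bonded to it
    for (from, to, len) in bonds
        old = get(seen, from.atomname, ToyAtom[])
        nbrs = Pair{ToyAtom,Float32}[nbr => angles[(nbr.atomname, from.atomname, to.atomname)] for nbr in old]
        f(from, to, len, nbrs)
        push!(get!(seen, from.atomname, ToyAtom[]), to)
        push!(get!(seen, to.atomname, ToyAtom[]), from)
    end
end

N, CA, C, O = ToyAtom.(["N", "CA", "C", "O"])
bonds = [(N, CA, 1.46f0), (CA, C, 1.52f0), (C, O, 1.23f0)]   # Å, rough values
angles = Dict(("N", "CA", "C") => 1.94f0,                    # radians, rough values
              ("CA", "C", "O") => 2.10f0)

# Record how many "old" neighbors each call sees.
nbr_counts = Int[]
toy_iterate_bonds(bonds, angles) do from, to, len, nbrs
    push!(nbr_counts, length(nbrs))
end
```

The first call sees no neighbors because the initial `N` has no already-visited partners, which matches the exception noted in the docstring.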

@timholy
Collaborator Author

timholy commented May 16, 2025

Are you happy having lengths in nanometers or would you prefer Angstroms? Angles in radians or degrees?

@jgreener64
Member

My preference is Angstroms, for compatibility with our coords function, and radians, for compatibility with our angle functions.
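A small sketch of the conversion this implies, assuming the extracted lengths are currently in nanometers as the question above suggests. `nm_to_angstrom` and the table contents are hypothetical placeholders; `deg2rad` is Julia's built-in and would only be needed if a source reported angles in degrees.

```julia
# 1 nm = 10 Å
nm_to_angstrom(x) = 10 * x

# Placeholder length table in nm; the value is made up for illustration.
lengths_nm = Dict(("protein-CT", "protein-CX") => 0.15)

# Same keys, lengths converted to Å.
lengths_angstrom = Dict(key => nm_to_angstrom(len) for (key, len) in lengths_nm)

# Degrees to radians, if an angle source happens to use degrees.
angle_rad = deg2rad(109.5)
```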

@timholy timholy merged commit 49dac0c into master May 16, 2025
10 checks passed
@timholy timholy deleted the teh/harmbonds branch May 16, 2025 14:46
2 participants