Skip to content

Improve read perf, store coords as SVector #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

timholy
Copy link
Collaborator

@timholy timholy commented May 8, 2025

This substantially reduces the amount of memory used to store
coordinates and parse files. While parsing a large mmCIF file, the
memory usage dropped by approximately 50%, and read time by 15%.
number of allocations was cut by a factor of 3 and the read time dropped 40%.

The most potentially-disruptive change is that the coordinates are now
stored as SVectors instead of Vectors. This means that the coordinates
are now immutable, and you cannot change them in place by manual
indexing. The x!, y!, and z! functions still work, as do in-place
transformations, by making the coords field itself mutable.

Closes #55

This substantially reduces the amount of memory used to store
coordinates and parse files. While parsing a large mmCIF file, the
memory usage dropped by approximately 50%, and read time by ~15%.

The most potentially-disruptive change is that the coordinates are now
stored as SVectors instead of Vectors. This means that the coordinates
are now immutable, and you cannot change them in place by manual
indexing. The `x!`, `y!`, and `z!` functions still work, as do in-place
transformations, by making the `coords` field itself mutable.
@timholy
Copy link
Collaborator Author

timholy commented May 8, 2025

Inspired in part by seeing the activity in #45

Copy link

codecov bot commented May 8, 2025

Codecov Report

Attention: Patch coverage is 97.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.34%. Comparing base (27e02c7) to head (0928da7).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
src/mmcif.jl 96.87% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
+ Coverage   95.32%   95.34%   +0.01%     
==========================================
  Files          14       14              
  Lines        2033     2039       +6     
==========================================
+ Hits         1938     1944       +6     
  Misses         95       95              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jgreener64
Copy link
Member

Seems like a sensible change to me. I'm surprised how small the diff required to do this is.

Can we get away with this being a non-breaking change? The return type of coords has changed, but is still array-like.

If it has to be breaking, I would be tempted to change from Float64 to Float32 coordinates in the same release.

@timholy
Copy link
Collaborator Author

timholy commented May 8, 2025

I agree this is a tricky issue. Fundamentally I'm not sure. Plain type-changes often aren't, but mutable to immutable is a behavior change.

If we decide it's breaking, then we might also evaluate whether there are any savings to be had by reordering some of the struct fields. Julia requires 8-byte alignment for certain field-types, so sizeof(T) can depend on the order of T's fields. That would definitely be a breaking change, since it affects constructors, and we'd only want to do that if the savings seemed worth the chaos.

@timholy
Copy link
Collaborator Author

timholy commented May 8, 2025

If we think it's non-breaking, should we get out a 4.5.0 release before merging it? If we go ahead with the notion that this is non-breaking, it feels like a release that would at least raise the risk that we might later decide to yank it.

@jgreener64
Copy link
Member

It's probably safer to consider it breaking, and think about reordering struct fields and moving to Float32 in the same release. I can look at the Float32 switch in the next few weeks unless someone else gets there first.

Either way I will release v4.5.0 shortly.

@timholy
Copy link
Collaborator Author

timholy commented May 10, 2025

unless someone else gets there first.

Beat ya 🙂

Not pushed yet, but here's the current state:

julia> c
Chain DA with 401 residues, 0 other molecules, 2947 atoms

julia> Base.summarysize(c; exclude=Model) / 2947
98.67662029182219

That's less than half where we are now, #59 (comment)

It's probably best done as a series of changes, though, just in case we regret some of them.

JuliaCollections/OrderedCollections.jl#150 is preparatory work for these additional changes.

@jgreener64
Copy link
Member

Great!

v4.5.0 is at JuliaRegistries/General#130705 🎉

@jgreener64 jgreener64 mentioned this pull request May 12, 2025
5 tasks
@timholy
Copy link
Collaborator Author

timholy commented May 14, 2025

As the sequence of changes I have planned will ultimately generate merge conflicts for #71, let's wait and get that merged before getting started down this road.

This has a surprisingly large benefit for performance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3-vector for coordinates?
2 participants