Skip to content

Conversation

EVAST9919
Copy link
Contributor

@EVAST9919 EVAST9919 commented Aug 30, 2025

Split from #6613 which serves as base implementation of BBH inside a Path. By it's own this pr only improves Path.ReceivePositionalInputAt performance.

Structure

When passing path vertices to the PathBBH we will be building a binary tree (from bottom to top), in which most-bottom row will contain path segments and their bounding boxes. Then we will update all the parent nodes with combined bounds of left and right children.

Of course not all paths have 2^n segments, so we need to create such a tree which will be able to scale to any amount of segments without having empty nodes to reduce memory needed to store them. All the nodes are stored in an array of size roughly ~2n segment count in the following order: [nodes from the 0 depth from left to right][nodes from the 1 depth from left to right]...[nodes from the n-1 depth from left to right][leafs(segments)]. Array will be populated from end to start starting with leafs and then their parents and their parents and so on. And in the end the whole tree is built in just 1 cycle.

Path.SetVertices benchmark

master:

Method Mean Error StdDev Allocated
Compute100Segments 592.5 ns 1.81 ns 1.69 ns -
Compute1KSegments 5,887.3 ns 110.78 ns 98.20 ns -
Compute10KSegments 59,171.8 ns 127.50 ns 113.03 ns -
Compute100KSegments 569,454.3 ns 2,341.02 ns 2,075.25 ns -
Compute1MSegments 5,895,614.8 ns 22,614.86 ns 20,047.48 ns -

pr:

Method Mean Error StdDev Allocated
Compute100Segments 2.234 us 0.0027 us 0.0021 us 56 B
Compute1KSegments 22.013 us 0.0453 us 0.0424 us 56 B
Compute10KSegments 365.738 us 2.3118 us 2.0493 us 56 B
Compute100KSegments 4,013.083 us 7.9266 us 7.4145 us -
Compute1MSegments 41,526.195 us 65.9597 us 61.6988 us -

pr (after 06d895d):

Method Mean Error StdDev Gen0 Allocated
Compute100Segments 1.508 us 0.0053 us 0.0049 us 0.0019 56 B
Compute1KSegments 15.007 us 0.0308 us 0.0288 us - 56 B
Compute10KSegments 152.082 us 0.5901 us 0.5231 us - 56 B
Compute100KSegments 1,535.200 us 2.7969 us 2.6163 us - 57 B
Compute1MSegments 21,892.779 us 80.7825 us 63.0697 us - -

While pr values are about ~2.5x slower (with more micro-optimisations can be improved further, target would be 2x given amount of nodes processed is ~2x segment count), it's worth noting that in case of master (with snaking sliders) we will see these timings each frame and with #6613 - only once, and then timings from Path.SetStartProgress table (which are basically nothing)

Path.ReceivePositionalInputAt benchmark

master:

Method Mean Error StdDev Allocated
Contains100 302.0 ns 0.63 ns 0.56 ns -
Contains1K 2,906.9 ns 12.67 ns 11.85 ns -
Contains10K 76,607.3 ns 267.95 ns 250.64 ns -
Contains100K 868,608.0 ns 2,824.05 ns 2,641.61 ns -
Contains1M 8,683,885.7 ns 106,430.14 ns 99,554.82 ns -

pr:

Method Mean Error StdDev Allocated
Contains100 195.1 ns 0.20 ns 0.19 ns -
Contains1K 289.8 ns 0.21 ns 0.20 ns -
Contains10K 444.2 ns 0.41 ns 0.37 ns -
Contains100K 594.2 ns 1.17 ns 1.04 ns -
Contains1M 945.1 ns 2.87 ns 2.54 ns -

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant