
Conversation

insumity
Contributor

@insumity commented Aug 28, 2025

Closes: #1164

This PR is an extension of #1171, which I no longer have access to update. See the commits after "Re-request range from any peer, not necessarily the same one" for the changes.

Testing
To confirm that max_response_size is actually enforced, we ran the response_size_limit_exceeded test against the main branch with the following params:

    test.build()
        .run_with_params(
            Duration::from_secs(60),
            TestParams {
                enable_value_sync: true,
                // Values are around ~900 bytes, so we canNOT have 2 values in one batch.
                rpc_max_size: ByteSize::b(1000),
                batch_size: 2,
                parallel_requests: 1,
                ..Default::default()
            },
        )
        .await

in which case the test fails: it tries to send 2 values at once, which is not possible when rpc_max_size is set to 1000 bytes.
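
As a quick sanity check on those numbers (the ~900-byte figure is approximate and taken from the comment in the test params above):

    fn main() {
        // Approximate figures from the test setup above.
        let value_size: u64 = 900; // each value encodes to roughly 900 bytes
        let rpc_max_size: u64 = 1_000; // ByteSize::b(1000)

        // A batch of 2 values does not fit in a single response...
        assert!(2 * value_size > rpc_max_size);
        // ...while a single value still does.
        assert!(value_size <= rpc_max_size);
    }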



@insumity force-pushed the insumity/add_response_size_check branch from d8e6fe8 to e5be583 on August 28, 2025, 09:25
@insumity marked this pull request as ready for review on August 28, 2025, 09:54
@insumity force-pushed the insumity/add_response_size_check branch from 7313c82 to 68a172e on August 28, 2025, 14:12
@romac requested a review from hvanz on September 5, 2025, 10:14
@insumity force-pushed the insumity/add_response_size_check branch 2 times, most recently from 60e5046 to 23d6957, on September 8, 2025, 11:38
Contributor

@hvanz left a comment

Amazing work, thanks @insumity!

As noted by @jmalicevic, there will be a performance hit from the extra encoding. Have you seen any performance decrease in your tests? Maybe a small benchmark test would be useful. As a solution, the encoded value could be cached somewhere, but this will take some refactoring (for another PR). I'm not worried about the extra communication (GetResponseSize) between actors; this is just a couple of function calls.
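
A minimal sketch of that caching idea, just to illustrate its shape (the type, key, and field names are hypothetical, not the actual sync state):

    use std::collections::BTreeMap;

    /// Hypothetical cache of already-encoded values, keyed by height, so that the size
    /// check and the eventual send can reuse the same bytes instead of encoding twice.
    struct EncodedValueCache {
        by_height: BTreeMap<u64, Vec<u8>>,
    }

    impl EncodedValueCache {
        fn get_or_encode(&mut self, height: u64, encode: impl FnOnce() -> Vec<u8>) -> &[u8] {
            self.by_height.entry(height).or_insert_with(encode).as_slice()
        }
    }

A cache like this would also need an eviction policy tied to response completion, which is one reason it is probably better left for a follow-up PR, as suggested above.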

{
// NOTE: We do not perform a `max_parallel_requests` check here in contrast to what is done, for
// example in `request_values`. This is because `request_values_range` is only called for retrieving
// partial responses, which means the original request is not on the wire anymore.
Contributor

I understand the original request is no longer on the wire (that is, its response was received), but it's still in state.pending_requests, at least for the received range that is being processed. Is this correct? Then, after the new request is issued here, there will be max_parallel_requests + 1 active requests in state.pending_requests. I'm not sure if this is a problem, but two requests will need to be removed from state.pending_requests before a new one can be issued.

This is a good point, although I think in practice the behavior stays the same.
For example, suppose state.pending_requests = {req1[1-10], req2[11-20]}. You split 1-10 because the range is too big and get state.pending_requests = {req1[1-3], req2[11-20], req3[4-10]}.
The problem is that when req2 is done, you cannot issue another request because you are not done processing req1 and req3. Compared to how things were before batching, we are making them slightly worse by not allowing req4 until everything from the original req1 has been processed.

However, in practice this delay is probably not very big compared to the initial use case. In any case, I would not block this PR on this optimization so that people can integrate with the API changes and we can then optimize this under the hood.

WDYT @hvanz @insumity ?
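
To make the bookkeeping concrete, here is a toy model of the scenario above (request names, the range representation, and the limit are illustrative only, not the actual sync state):

    use std::collections::BTreeMap;
    use std::ops::RangeInclusive;

    fn main() {
        let max_parallel_requests: usize = 2;
        let mut pending: BTreeMap<&str, RangeInclusive<u64>> = BTreeMap::new();

        // Initial state: two in-flight requests, at the parallelism limit.
        pending.insert("req1", 1..=10);
        pending.insert("req2", 11..=20);

        // The response to req1 only carried heights 1..=3 (the rest would have exceeded
        // the size limit), so the remainder is re-requested as req3 from any peer.
        pending.insert("req1", 1..=3);
        pending.insert("req3", 4..=10);

        // The parallelism limit is now temporarily exceeded: no new request can be
        // issued until both req1 and req3 are fully processed.
        assert!(pending.len() > max_parallel_requests);
        println!("pending = {}, limit = {}", pending.len(), max_parallel_requests);
    }

In this model, req2 completing brings the count back down to the limit, but a new request still has to wait until one of req1 or req3 also completes, which matches the delay described above.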

Contributor Author

Thanks for this @jmalicevic. Yes, what you describe is my view as well.

Contributor

Yes, I think we can live with that. I'm thinking of the extreme scenario where blocks are so big that each one fills a whole response message. In the example, req1[1-10] would be split into 10 requests. I don't think this is something we should try to fix here: that would either be a problem with the configuration, or the whole setup would not be well thought out. Maybe we could add a warning when state.pending_requests.len() > state.max_parallel_requests, to see in the logs when this case happens very often.

Contributor Author

Maybe we could add a warning when state.pending_requests.len() > state.max_parallel_requests, to see in the logs when this case happens very often.

Added some logging here based on your comment.
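
A sketch of what such a warning could look like (the function boundary, field names, and message text are assumptions, not the exact code added in the PR):

    use tracing::warn;

    /// Sketch: warn when the number of in-flight sync requests exceeds the configured
    /// parallelism, which can happen temporarily after a partial response is re-requested.
    fn warn_if_over_parallel_limit(pending_requests: usize, max_parallel_requests: usize) {
        if pending_requests > max_parallel_requests {
            warn!(
                pending_requests,
                max_parallel_requests,
                "Pending sync requests exceed max_parallel_requests"
            );
        }
    }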

@insumity force-pushed the insumity/add_response_size_check branch from 23d6957 to cb40f73 on September 11, 2025, 13:21
@insumity
Contributor Author

insumity commented Sep 11, 2025

Thanks @hvanz for your comments!

As noted by @jmalicevic, there will be a performance hit from the extra encoding. Have you seen any performance decrease in your tests? Maybe a small benchmark test would be useful. As a solution, the encoded value could be cached somewhere, but this will take some refactoring (for another PR). I'm not worried about the extra communication (GetResponseSize) between actors; this is just a couple of function calls.

I did not see any performance decrease in the tests, but that's because the block_size is pretty small (~900 bytes per value). I have a PR on the informalsystems repo where I can tune block_size in tests, and I observed the following rough numbers:

  • for a single value of 1 MB of random data, the ProtobufCodec encoding takes around ~50 µs;
  • for a value of 10 MB, encoding takes ~700 µs;
  • for a value of 25 MB, encoding takes ~1.5 ms;
  • for a value of 50 MB, encoding takes ~3 ms.

In any case, I changed the code so we can retrieve the encoded length without actually having to encode the data. So now, computing the encoded length takes a few tens of microseconds irrespective of the value size. Note that this only holds for the ProtobufCodec and not for the JsonCodec.
Update: I performed some slightly more serious benchmarking and the results can be found here. Encoding a 100 MB value and then taking the length of the result takes ~5.5 ms, while getting the length of the encoded value directly via encoded_len takes just ~85 ns. I also verified that encoded(sth).len() == encoded_len(sth).
Let me know if you want me to add the benchmark code in this PR.
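
For illustration, a minimal version of that comparison, assuming the ProtobufCodec is backed by prost and using criterion for timing (the RawValue message is a hypothetical stand-in, not the real value type):

    use criterion::{criterion_group, criterion_main, Criterion};
    use prost::Message;

    /// Hypothetical stand-in for a sync value; the real type comes from the codec crate.
    #[derive(Clone, PartialEq, Message)]
    struct RawValue {
        #[prost(bytes = "vec", tag = "1")]
        data: Vec<u8>,
    }

    fn bench_encoded_length(c: &mut Criterion) {
        let value = RawValue { data: vec![0u8; 1_000_000] }; // ~1 MB payload

        // Sanity check: the length-only computation agrees with a full encode.
        assert_eq!(value.encode_to_vec().len(), value.encoded_len());

        // Full encode, then take the length of the resulting buffer.
        c.bench_function("encode_to_vec_then_len", |b| b.iter(|| value.encode_to_vec().len()));
        // Length-only computation: prost sums field sizes without serializing the payload.
        c.bench_function("encoded_len", |b| b.iter(|| value.encoded_len()));
    }

    criterion_group!(benches, bench_encoded_length);
    criterion_main!(benches);

Since encoded_len only does size arithmetic over the fields, its cost stays roughly constant as the payload grows, which is consistent with the ~85 ns measurement above; as noted, the JsonCodec has no equivalent shortcut.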

Please do let me know what you think!

@hvanz
Contributor

hvanz commented Sep 11, 2025

Thanks for the benchmarks! They look pretty good. 10 MB is already quite big for a value and would probably reach the typical size limit of a response, so I don't think we need to worry about 700 µs.

@insumity
Contributor Author

Thanks for the benchmarks! They look pretty good. 10 MB is already quite big for a value and would probably reach the typical size limit of a response, so I don't think we need to worry about 700 µs.

@hvanz So, we're okay with encoding the data and then getting the length, as was initially done? If that's the case, I can remove this commit.

@hvanz
Contributor

hvanz commented Sep 12, 2025

So, we're okay with encoding the data and then getting the length, as was initially done? If that's the case, I can remove this commit.

@romac WDYT? Do we need this optimization? Encoding a value of 1 MB takes about 50 µs, and for 10 MB it takes ~700 µs.

@jmalicevic

@romac @hvanz, are we OK to merge this? Should we unblock this in our fork if you don't have capacity?

@jmalicevic

@romac @hvanz we have merged this in our fork and also added a follow-up PR fixing a few other issues we found in the sync: informalsystems#15

The PR is long; happy to upstream it if you deem it useful.

@romac
Contributor

romac commented Oct 2, 2025

@romac @hvanz, are we OK to merge this? Should we unblock this in our fork if you don't have capacity?

Apologies, I completely missed your question…

@romac @hvanz we have merged this in our fork and also added a follow-up PR fixing a few other issues we found in the sync: informalsystems#15

Glad to hear you took the initiative of merging it in your fork and my apologies again for not getting to this sooner.

Unfortunately, we will likely not be able to merge this before a week or so.

The PR is long; happy to upstream it if you deem it useful.

That would be awesome if it's not too much to ask; otherwise I am happy to upstream it myself from your fork. We can also wait until this PR is merged, though, to avoid stacking them.

@jmalicevic

No worries, I understand :) Yes, let's merge this PR and then we will upstream the other one; happy to do a sync review if you find it useful, as it's quite big.


Successfully merging this pull request may close these issues.

feat(code): Limit the number of values in sync response to cfg.sync.max_response_size