feat(code): limit the number of values in a sync response based on upper limit #1184
base: main
Conversation
Amazing work, thanks @insumity!
As noted by @jmalicevic, there will be a performance hit for the extra encoding. Have you seen any performance decrease in your tests? Maybe a small benchmark test would be useful. As a solution, the encoded value could be cached somewhere, but this will take some refactoring (for another PR). I'm not worried about the extra communication (`GetResponseSize`) between actors; this is just a couple of function calls.
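A rough sketch of the caching idea, purely illustrative (`CachedValue` and its fields are hypothetical, not types from the sync crate): the encoded bytes are computed once and memoized next to the value, so both the size check and the actual send reuse them.

```rust
// Illustrative sketch of the caching idea only; not the actual crate code.
use std::sync::OnceLock;

struct CachedValue {
    // Stand-in for the actual value/raw bytes type.
    raw: Vec<u8>,
    // Encoded form, computed at most once and then reused.
    encoded: OnceLock<Vec<u8>>,
}

impl CachedValue {
    /// Returns the encoded bytes, encoding only on the first call so that
    /// the size check and the wire transfer share the same buffer.
    fn encoded(&self) -> &[u8] {
        self.encoded
            .get_or_init(|| {
                // Placeholder for the real protobuf encoding of `self.raw`.
                self.raw.clone()
            })
            .as_slice()
    }

    fn encoded_len(&self) -> usize {
        self.encoded().len()
    }
}
```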
code/crates/sync/src/handle.rs
{
    // NOTE: We do not perform a `max_parallel_requests` check here in contrast to what is done, for
    // example in `request_values`. This is because `request_values_range` is only called for retrieving
    // partial responses, which means the original request is not on the wire anymore.
I understand the original request is not on the wire (that is, its response was received), but it's still in `state.pending_requests`, at least for the received range that is being processed. Is this correct? Then, after the new request issued here, there will be `max_parallel_requests + 1` active requests in `state.pending_requests`. I'm not sure if this is a problem, but two requests will need to be removed from `state.pending_requests` before a new one can be issued.
This is a good point, although I think in practice the behavior stays the same.
For example, say you have `state.pending_requests = {req1[1-10], req2[11-20]}`. You split 1-10 because the range is too big and get `state.pending_requests = {req1[1-3], req2[11-20], req3[4-10]}`.
The problem is that when `req2` is done, you cannot issue another request because you are not done processing `req1` and `req3`. If we compare this to how it was before batching, we are making things slightly worse by not allowing `req4` until we process everything from the original `req1`.
However, in practice this delay is probably not very big compared to the initial use case. In any case, I would not block this PR on this optimization so that people can integrate with the API changes and we can then optimize this under the hood.
Thanks for this @jmalicevic. Yes, what you describe is my view as well.
Yes, I think we can live with that. I'm thinking of the extreme scenario where blocks are so big that they fill a whole response message. In the example, `req1[1-10]` would be split into 10 requests. I think this is not something that we should try to fix here; that would either be a problem in the configuration, or the whole setup is not well thought out. Maybe we could add a warning when `state.pending_requests.len() > state.max_parallel_requests`, to see in the logs when this case is happening very often.
> Maybe we could add a warning when state.pending_requests.len() > state.max_parallel_requests, to see in the logs when this case is happening very often.
Added some logging here based on your comment.
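For reference, the check could look roughly like this sketch (assuming the `tracing` crate; the function and parameter names are stand-ins, not the exact code added in the commit):

```rust
// Sketch only: `pending` and `max_parallel` stand in for
// state.pending_requests.len() and state.max_parallel_requests.
fn warn_if_over_limit(pending: usize, max_parallel: usize) {
    if pending > max_parallel {
        tracing::warn!(
            pending,
            max_parallel,
            "more pending sync requests than max_parallel_requests"
        );
    }
}
```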
Thanks @hvanz for your comments!
I did not see any performance decrease in the tests, but that's because the …
In any case, I changed the code so we can retrieve the encoded length without actually having to encode the data. So now, computing the encoded length takes a few tens of microseconds irrespective of the value size. Note that this only holds for the … Please do let me know what you think!
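For context, if the payloads are prost-generated protobuf messages, the serialized size can be obtained without producing the bytes; a minimal sketch under that assumption (not the PR's actual code):

```rust
// Minimal sketch, assuming the payload implements prost's `Message` trait.
use prost::Message;

/// Returns the number of bytes `msg` would occupy once encoded, without
/// allocating a buffer or performing the encoding itself.
fn encoded_size<M: Message>(msg: &M) -> usize {
    // prost computes this by summing the wire size of each field.
    msg.encoded_len()
}
```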
Thanks for the benchmarks! They look pretty good. 10 MB is already quite big for a value and would probably reach the typical size limit of a response, so I think we don't need to worry about 700 µs.
@hvanz So, we're okay with encoding the data and then getting the length, as was initially done? If that's the case, I can remove this commit.
@romac @hvanz We have merged this in our fork and also added a follow-up PR fixing a few other issues we found in the sync code: informalsystems#15. The PR is long; happy to upstream it if you deem it useful.
Apologies, I completely missed your question…
Glad to hear you took the initiative of merging it in your fork and my apologies again for not getting to this sooner. Unfortunately, we will likely not be able to merge this before a week or so.
That would be awesome if it's not too much to ask; otherwise I am happy to upstream it myself from your fork. We can also wait until this PR is merged, though, to avoid stacking them.
No worries, I understand :) Yes, let's merge this PR; then we will upstream the other one, and I'm happy to do a sync review if you find it useful, as it's quite big.
Closes: #1164
This PR is an extension of #1171, which I no longer have access to update. Look at all the commits after `Re-request range from any peer, not necessarily the same one` to see the changes.
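As a rough illustration of the limit this PR introduces (the function below is a hypothetical sketch, not the crate's actual API): values are appended to a response only while their combined encoded size stays under the configured upper bound.

```rust
// Hypothetical sketch of size-limited response building; the names and the
// "always include at least one value" choice are assumptions, not the PR's code.
fn limit_values(encoded_values: Vec<Vec<u8>>, max_response_size: usize) -> Vec<Vec<u8>> {
    let mut total = 0;
    let mut limited = Vec::new();
    for value in encoded_values {
        // Stop before the response would exceed the limit, but keep at least
        // one value so the requester can still make progress.
        if total + value.len() > max_response_size && !limited.is_empty() {
            break;
        }
        total += value.len();
        limited.push(value);
    }
    limited
}
```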
Testing
To see that `max_response_size` is indeed working, we ran the `response_size_limit_exceeded` test against the `main` branch with the following params:

in which case the test fails because it tries to send 2 values at once, something that cannot be done because `rpc_max_size` is set to 1000 bytes.
PR author checklist
For all contributors
- Updated RELEASE_NOTES.md if the change warrants it
- Updated BREAKING_CHANGES.md if the change warrants it
For external contributors