Skip to content

Conversation

@sieniven
Copy link

📝 Summary

  • Context of the full issue: Cache miss on engine_newPayload to builder causes slow payload validation #333
  • When flashblocks stateroots calculation flag is switched to false, because state roots are never calculated in payloads, on payload resolving, engine api request to insert locally executed blocks will insert blocks without state roots
  • This eventually leads to cache misses when engine_newPayload is called on the op-rbuilder from rollup-boost, which leads to the full validation of this payload (even though they were locally built by the builder), including the re-execution of the payload transactions and subsequent state root calculation
  • This PR adds the additional stateroot calculation triggered on payload resolve, ensuring that we do not take up sequencing time re-executing the payload
  • Note that state root calculation regardless will block the sequencing time - however, rollup boost will obtain the cached incrementally built builder payload anyways and trigger the payload resolve asynchronously. This should be optimal when integrated with flashblocks

✅ I have completed the following steps:

  • [✅ ] Run make lint
  • [✅ ] Run make test
  • Added tests (if applicable)

@sieniven sieniven force-pushed the niven/fix-state-root branch from 5d86917 to ba5ada0 Compare November 27, 2025 05:00
ctx = ctx.with_cancel(fb_cancel).with_extra_ctx(next_flashblocks_ctx);
},
_ = block_cancel.cancelled() => {
self.resolve_best_payload(&mut state, &ctx, &mut info, &best_payload).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not guaranteed that rollup-boost will cancel building (send engine_getPayload)
I think correct way would be to calculate state root asynchronously after we sent last flashblock

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why it is placed here is because on the event where rollup-boost does not send the payload resolve, on a subsequent FCU, the payload will be resolved regardless.

Also, I placed it on resolve because we are keen in exploring using the flashblock payload builder directly on a custom reth node without the need to use rollup-boost. Maybe this implementation is better in terms of de-coupling with rollup boost service?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it is hard to deterministically say for sure which is the last flashblock. As observed during our deployment, it seems like on the builder, on most cases, the payload building early cancels which results in the payload not ever reaching the target flashblock. This seems to me to be the most deterministic way of ensuring locally built payloads contain a state root.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, you need to use --flashblock.leeway-time
And for external state root it should be considerable as usually it used in cases when it's slow to calculate state root, so you need some extra time to do this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for decoupling
I think some assumptions behind state root calculation weren't designed for this case, so it's better to give it a good thought on how to change it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/flashbots/op-rbuilder/pull/334/files#diff-20125ce86c70cfd520b09bc61e2032873f80249afe0027ee76dfbfcb503e7fb7R535-R542 - This issue should be resolved in the attached code block. We start the state root calculation immediately once the last target flashblock is built.

@SozinM
Copy link
Collaborator

SozinM commented Nov 27, 2025

Hi!
Have you tested this?
Because it seems to me it shouldn't work, as you still don't save executed block

@SozinM
Copy link
Collaborator

SozinM commented Nov 27, 2025

@sieniven could you please attach newPayload latency before and after fix?
I think this would be a good starting point of looking into the deeper

@sieniven sieniven marked this pull request as draft November 27, 2025 10:45
@sieniven
Copy link
Author

@SozinM Hey, thanks and appreciate your time in helping review this :) Converting this PR to draft first because there are still some issues when testing - will re-open it in abit once everything is working on our end.

@sieniven sieniven marked this pull request as ready for review November 27, 2025 13:47
@sieniven
Copy link
Author

@SozinM I have re-opened the PR. From the above discussion thread, I understand there is the leeway time configuration which allows us to early package flashblocks. However on the assumption that we are trying to maximize throughput of the chain, setting too big a leeway time may tradeoff chain max throughput (gas executed per block).

I am trying to think of the most optimal way of doing this - it still seems to me that unless the intent of the builder for flashbots is to always couple it with the rollup boost proxy service, perhaps we could explore the default behavior of the builder to calculate state root on payload resolve.

Giving some context here - we found the issue with the builder sync-ing with tip to have severe effects on the chain's maximum throughput (gas per second) during our stress testing on the flashblocks architecture. It was found that during our stress tests which benchmarks sending ERC20 transactions, the tps of the chain drops significantly and a huge part of the bottleneck was coming from cache misses due to the mismatch in block hash. This PR has been tested on our end to work and resolves the cache miss issue, and the current design is such that the builder will be default calculate state root on payload resolution, which will also nicely help our use case as we intend to use op-rbuilder as a dependency library crate to integrate the flashblock payload builder directly into our node.

Thanks again and appreciate the prompt review for this PR :)

@sieniven
Copy link
Author

@SozinM another consideration to why this PR shouldnt have no impact with rollup boost is because engine_getPayload on rollup boost uses the underlying accumulated flashblocks' cache.

Since the payload resolution on the builder is called async, together with the expensive state root calculation of the unsafe payload with the default EL on FCU, this fix on cache misses to locally built blocks should have little impact on the intended design of rollup boost?

@SozinM
Copy link
Collaborator

SozinM commented Nov 27, 2025

@sieniven Can you please show newPayload graph with and without this fix

@sieniven
Copy link
Author

sieniven commented Dec 1, 2025

@sieniven Can you please show newPayload graph with and without this fix

@SozinM We are onto this. Will provide you with the API response time from the CL soon.

@SozinM
Copy link
Collaborator

SozinM commented Dec 1, 2025

@sieniven reth already provides these metrics, you could set up their op-reth dashboard

@sieniven
Copy link
Author

sieniven commented Dec 1, 2025

@sieniven reth already provides these metrics, you could set up their op-reth dashboard

We used a different approach where we used the metrics on op-node instead. Since that will be a better metric in terms of understanding the impact of this PR? Comparing engine_newPayload without state root calculations on the builder will definitely be significantly slower since we now add state root calculations at the end of the payload resolution.

@SozinM
Copy link
Collaborator

SozinM commented Dec 1, 2025

@sieniven engine_newPayload metric would suffice.
If you solution (in regard to caching) works, than we would see that new_payload latency without fix was high (prob 20+ms), and after fix became low (1-2 ms)

@sieniven
Copy link
Author

sieniven commented Dec 1, 2025

@SozinM Sure, we will generate both on the EL response times and on the CL response times for you, during our stress test environment (which is an ERC20 stress test).

On a separate note, I think there are a few more issues which I have opened on rollup-boost which explains why cache misses will still happen in your current flashblock design:

  1. Rollup-boost occasionally fails to accept new flashblock payload due to failure to handle ordering rollup-boost#449
  2. Rollup-boost occasionally fails to accept flashblock payloads rollup-boost#448

Because the source of truth of the flashblock architecture is on the rollup-boost accumulated flashblock sequence, the consensus layer will always accept this as the unsafe payload (and subsequently calculate the external state root of that payload there). However, we cannot guarantee the state of the accumulated sequence of rollup boost and the builder to be consistent, thus resulting in cache misses on locally built payloads on engine_newPayload to the builder. We have tested and this PR is confirmed to have fixed cache misses on engine_newPayload if the payload state matches (on rollup-boost and on op-rbuilder during payload resolution). However, if the states are mismatched, payloads are still re-validated on the builder which affects the overall chain throughput.

Also, it was noticed on my end that when stress testing, payload re-validations could take more than 2 seconds on the builder. There might be a mutex issue on the payload, but I have yet to really debug this fully. I cant replicate that anymore with the fixes on the state root calculations, but might be worth adding to your todo to investigate that.

@SozinM
Copy link
Collaborator

SozinM commented Dec 1, 2025

Wdym by

locally built payloads

Is it payload built be fallback EL, or empty payload built by CL?

@sieniven
Copy link
Author

sieniven commented Dec 2, 2025

Wdym by

locally built payloads

Is it payload built be fallback EL, or empty payload built by CL?

@SozinM Sorry for the confusion. What i meant is in this specific case, payload is locally built by the op-rbuilder. However, due to the accumulated payload sequence on rollup-boost state being different from the builder, the eventual accepted unsafe payload returned to the op-node sequencer is different from what was actually built on the op-rbuilder (when payload is resolved on rollup-boost). This results in the subsequent engine_newPayload to still cache miss - which is what I meant by this PR not being able to fully resolve the cache miss issue.

@sieniven
Copy link
Author

sieniven commented Dec 4, 2025

@SozinM We have generated the metrics you were asking for. Note that these were generated under stress test situation with full blocks filled with ERC20 transactions from multiple addresses to simulate high chain traffic situation. The metrics captures a 15minute duration, which sufficiently captures the results. Hope this helps.

  • Before state root PR fixes:
image
  • After state root PR fixes:
image Note that there are still occasional cache misses which was the default flashblocks design issue that I was mentioning above in this comment: https://github.com//pull/334#issuecomment-3595341837. Thus this PR still doesnt fully resolve the issue of cache misses (resulting in full validation of locally built payloads), but it significantly reduces this and should help with optimizing throughput of the default flashblocks architecture.

@sieniven sieniven changed the title Fix Add state root calculation on payload resolve fix: Add state root calculation on payload resolve Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants