fix: Add state root calculation on payload resolve #334

sieniven · 2025-11-27T04:49:28Z

📝 Summary

Context of the full issue: Cache miss on engine_newPayload to builder causes slow payload validation #333
When flashblocks stateroots calculation flag is switched to false, because state roots are never calculated in payloads, on payload resolving, engine api request to insert locally executed blocks will insert blocks without state roots
This eventually leads to cache misses when engine_newPayload is called on the op-rbuilder from rollup-boost, which leads to the full validation of this payload (even though they were locally built by the builder), including the re-execution of the payload transactions and subsequent state root calculation
This PR adds the additional stateroot calculation triggered on payload resolve, ensuring that we do not take up sequencing time re-executing the payload
Note that state root calculation regardless will block the sequencing time - however, rollup boost will obtain the cached incrementally built builder payload anyways and trigger the payload resolve asynchronously. This should be optimal when integrated with flashblocks

✅ I have completed the following steps:

[✅ ] Run make lint
[✅ ] Run make test
Added tests (if applicable)

crates/op-rbuilder/src/builders/flashblocks/payload.rs

SozinM · 2025-11-27T05:39:23Z

crates/op-rbuilder/src/builders/flashblocks/payload.rs

                    ctx = ctx.with_cancel(fb_cancel).with_extra_ctx(next_flashblocks_ctx);
                },
                _ = block_cancel.cancelled() => {
+                    self.resolve_best_payload(&mut state, &ctx, &mut info, &best_payload).await;


It's not guaranteed that rollup-boost will cancel building (send engine_getPayload)
I think correct way would be to calculate state root asynchronously after we sent last flashblock

The reason why it is placed here is because on the event where rollup-boost does not send the payload resolve, on a subsequent FCU, the payload will be resolved regardless.

Also, I placed it on resolve because we are keen in exploring using the flashblock payload builder directly on a custom reth node without the need to use rollup-boost. Maybe this implementation is better in terms of de-coupling with rollup boost service?

Also, it is hard to deterministically say for sure which is the last flashblock. As observed during our deployment, it seems like on the builder, on most cases, the payload building early cancels which results in the payload not ever reaching the target flashblock. This seems to me to be the most deterministic way of ensuring locally built payloads contain a state root.

yeah, you need to use --flashblock.leeway-time
And for external state root it should be considerable as usually it used in cases when it's slow to calculate state root, so you need some extra time to do this

As for decoupling
I think some assumptions behind state root calculation weren't designed for this case, so it's better to give it a good thought on how to change it

https://github.com/flashbots/op-rbuilder/pull/334/files#diff-20125ce86c70cfd520b09bc61e2032873f80249afe0027ee76dfbfcb503e7fb7R535-R542 - This issue should be resolved in the attached code block. We start the state root calculation immediately once the last target flashblock is built.

crates/op-rbuilder/src/builders/flashblocks/payload.rs

SozinM · 2025-11-27T05:45:43Z

Hi!
Have you tested this?
Because it seems to me it shouldn't work, as you still don't save executed block

SozinM · 2025-11-27T08:35:18Z

@sieniven could you please attach newPayload latency before and after fix?
I think this would be a good starting point of looking into the deeper

sieniven · 2025-11-27T10:46:40Z

@SozinM Hey, thanks and appreciate your time in helping review this :) Converting this PR to draft first because there are still some issues when testing - will re-open it in abit once everything is working on our end.

sieniven · 2025-11-27T13:53:19Z

@SozinM I have re-opened the PR. From the above discussion thread, I understand there is the leeway time configuration which allows us to early package flashblocks. However on the assumption that we are trying to maximize throughput of the chain, setting too big a leeway time may tradeoff chain max throughput (gas executed per block).

I am trying to think of the most optimal way of doing this - it still seems to me that unless the intent of the builder for flashbots is to always couple it with the rollup boost proxy service, perhaps we could explore the default behavior of the builder to calculate state root on payload resolve.

Giving some context here - we found the issue with the builder sync-ing with tip to have severe effects on the chain's maximum throughput (gas per second) during our stress testing on the flashblocks architecture. It was found that during our stress tests which benchmarks sending ERC20 transactions, the tps of the chain drops significantly and a huge part of the bottleneck was coming from cache misses due to the mismatch in block hash. This PR has been tested on our end to work and resolves the cache miss issue, and the current design is such that the builder will be default calculate state root on payload resolution, which will also nicely help our use case as we intend to use op-rbuilder as a dependency library crate to integrate the flashblock payload builder directly into our node.

Thanks again and appreciate the prompt review for this PR :)

sieniven · 2025-11-27T14:32:14Z

@SozinM another consideration to why this PR shouldnt have no impact with rollup boost is because engine_getPayload on rollup boost uses the underlying accumulated flashblocks' cache.

Since the payload resolution on the builder is called async, together with the expensive state root calculation of the unsafe payload with the default EL on FCU, this fix on cache misses to locally built blocks should have little impact on the intended design of rollup boost?

SozinM · 2025-11-27T14:40:27Z

@sieniven Can you please show newPayload graph with and without this fix

sieniven · 2025-12-01T04:09:52Z

@sieniven Can you please show newPayload graph with and without this fix

@SozinM We are onto this. Will provide you with the API response time from the CL soon.

SozinM · 2025-12-01T08:27:22Z

@sieniven reth already provides these metrics, you could set up their op-reth dashboard

sieniven · 2025-12-01T08:41:22Z

@sieniven reth already provides these metrics, you could set up their op-reth dashboard

We used a different approach where we used the metrics on op-node instead. Since that will be a better metric in terms of understanding the impact of this PR? Comparing engine_newPayload without state root calculations on the builder will definitely be significantly slower since we now add state root calculations at the end of the payload resolution.

SozinM · 2025-12-01T08:47:13Z

@sieniven engine_newPayload metric would suffice.
If you solution (in regard to caching) works, than we would see that new_payload latency without fix was high (prob 20+ms), and after fix became low (1-2 ms)

sieniven · 2025-12-01T08:54:17Z

@SozinM Sure, we will generate both on the EL response times and on the CL response times for you, during our stress test environment (which is an ERC20 stress test).

On a separate note, I think there are a few more issues which I have opened on rollup-boost which explains why cache misses will still happen in your current flashblock design:

Rollup-boost occasionally fails to accept new flashblock payload due to failure to handle ordering rollup-boost#449
Rollup-boost occasionally fails to accept flashblock payloads rollup-boost#448

Because the source of truth of the flashblock architecture is on the rollup-boost accumulated flashblock sequence, the consensus layer will always accept this as the unsafe payload (and subsequently calculate the external state root of that payload there). However, we cannot guarantee the state of the accumulated sequence of rollup boost and the builder to be consistent, thus resulting in cache misses on locally built payloads on engine_newPayload to the builder. We have tested and this PR is confirmed to have fixed cache misses on engine_newPayload if the payload state matches (on rollup-boost and on op-rbuilder during payload resolution). However, if the states are mismatched, payloads are still re-validated on the builder which affects the overall chain throughput.

Also, it was noticed on my end that when stress testing, payload re-validations could take more than 2 seconds on the builder. There might be a mutex issue on the payload, but I have yet to really debug this fully. I cant replicate that anymore with the fixes on the state root calculations, but might be worth adding to your todo to investigate that.

SozinM · 2025-12-01T09:24:26Z

Wdym by

locally built payloads

Is it payload built be fallback EL, or empty payload built by CL?

sieniven · 2025-12-02T06:38:18Z

Wdym by

locally built payloads

Is it payload built be fallback EL, or empty payload built by CL?

@SozinM Sorry for the confusion. What i meant is in this specific case, payload is locally built by the op-rbuilder. However, due to the accumulated payload sequence on rollup-boost state being different from the builder, the eventual accepted unsafe payload returned to the op-node sequencer is different from what was actually built on the op-rbuilder (when payload is resolved on rollup-boost). This results in the subsequent engine_newPayload to still cache miss - which is what I meant by this PR not being able to fully resolve the cache miss issue.

sieniven · 2025-12-04T03:46:54Z

@SozinM We have generated the metrics you were asking for. Note that these were generated under stress test situation with full blocks filled with ERC20 transactions from multiple addresses to simulate high chain traffic situation. The metrics captures a 15minute duration, which sufficiently captures the results. Hope this helps.

Before state root PR fixes:

After state root PR fixes:

Note that there are still occasional cache misses which was the default flashblocks design issue that I was mentioning above in this comment: https://github.com//pull/334#issuecomment-3595341837. Thus this PR still doesnt fully resolve the issue of cache misses (resulting in full validation of locally built payloads), but it significantly reduces this and should help with optimizing throughput of the default flashblocks architecture.

sieniven requested review from 0x416e746f6e, SozinM, avalonche and karim-agha as code owners November 27, 2025 04:49

Add state root calculation on payload resolve

ba5ada0

sieniven force-pushed the niven/fix-state-root branch from 5d86917 to ba5ada0 Compare November 27, 2025 05:00

SozinM requested changes Nov 27, 2025

View reviewed changes

sieniven added 5 commits November 27, 2025 13:51

Fix

f7a3933

fix: update executed block to upstream version

b670c23

Revert to previous logic on building fallback block

0ea5e3f

Remove merging transitions in statedb

c3e65b2

Fix triggering resolving payload

4e58bdb

sieniven marked this pull request as draft November 27, 2025 10:45

sieniven mentioned this pull request Nov 27, 2025

Add state root calculation on payload resolve okx/op-rbuilder#22

Merged

Fixes

18111e9

sieniven marked this pull request as ready for review November 27, 2025 13:47

sieniven added 5 commits November 27, 2025 23:13

Fix

51b161f

Fix

042b2f9

Fix state mismatch

edbc113

Fix

5f82b81

Fix

2d53279

sieniven changed the title ~~Fix Add state root calculation on payload resolve~~ fix: Add state root calculation on payload resolve Dec 4, 2025

fix: Add state root calculation on payload resolve #334

Are you sure you want to change the base?

fix: Add state root calculation on payload resolve #334

Conversation

sieniven commented Nov 27, 2025

📝 Summary

✅ I have completed the following steps:

Uh oh!

Uh oh!

SozinM Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

sieniven Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

sieniven Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

SozinM Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

SozinM Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

sieniven Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SozinM commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SozinM commented Nov 27, 2025

Uh oh!

sieniven commented Nov 27, 2025

Uh oh!

sieniven commented Nov 27, 2025

Uh oh!

sieniven commented Nov 27, 2025

Uh oh!

SozinM commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sieniven commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SozinM commented Dec 1, 2025

Uh oh!

sieniven commented Dec 1, 2025

Uh oh!

SozinM commented Dec 1, 2025

Uh oh!

sieniven commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SozinM commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sieniven commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sieniven commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SozinM commented Nov 27, 2025 •

edited

Loading

SozinM commented Nov 27, 2025 •

edited

Loading

sieniven commented Dec 1, 2025 •

edited

Loading

sieniven commented Dec 1, 2025 •

edited

Loading

SozinM commented Dec 1, 2025 •

edited

Loading

sieniven commented Dec 2, 2025 •

edited

Loading