Replace Dart-flute-complex with Dart-flute-todomvc #97

Open · eqrion wants to merge 1 commit into main

Conversation

@eqrion eqrion commented Jul 23, 2025

This is the WIP commit I've been using to compare the behavior of Dart-flute-todomvc and Dart-flute-complex. I also built a local version of https://wonderous.app/web/ using Wasm-GC to compare as well.

From profiling I observed the following relative rates of allocation in bytes per frame:

  • Dart-flute-todomvc = 1x (baseline)
  • Dart-flute-complex = 39x
  • wonderous.app = 0.34x

This is from a profiler that samples 1% of allocations and reports the stack, bytes, etc. for each sample. So I have good relative numbers, but not precise absolute numbers.

In summary, complex allocates about 39x more bytes than todomvc, which in turn allocates about 3x more bytes than wonderous.app.
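
To make the relative numbers concrete, the ratios above can be read as sampled bytes per frame normalized to the todomvc baseline. A tiny sketch of that normalization (the byte counts below are placeholders, not the measured data):

const sampledBytesPerFrame = {
  // Placeholder values; only the ratios matter, since the profiler samples
  // roughly 1% of allocations.
  "Dart-flute-todomvc": 1.0e6,
  "Dart-flute-complex": 39e6,
  "wonderous.app": 0.34e6,
};

const baseline = sampledBytesPerFrame["Dart-flute-todomvc"];
for (const [name, bytes] of Object.entries(sampledBytesPerFrame)) {
  console.log(`${name} = ${(bytes / baseline).toFixed(2)}x of baseline`);
}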

In addition, over half of the allocated bytes in complex are coming from stacks with CupertinoTimePicker in their name. That seems excessive to me.

I would prefer that todomvc replace complex, as it seems more representative of realistic workloads. This PR includes both in case anyone else wants to run comparisons on them, but I would remove the complex benchmark if we go ahead with this.

Here are DartPad links for complex/todomvc if anyone wants to look at the rendered output of either.

cc @danleh @camillobruni @kmiller68 @mkustermann

Update to the latest wasm_gc_benchmarks commit, and pull in the todomvc
benchmark too.

I had to update the benchmark runner polyfill. I just copy-pasted things
from tools/run_wasm.js until it worked for me. This part likely needs
some filtering and cleanup.
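
For context, the shims involved are roughly of this shape; this is a minimal sketch for a bare JS shell, not the actual contents of tools/run_wasm.js (drainTimerQueue is a hypothetical helper name):

// setTimeout/clearTimeout shim for shells without an event-loop timer API.
// Callbacks are queued and drained explicitly by the harness.
var timerIdCounter = 0;
const timerQueue = new Map();

globalThis.setTimeout = function (f, _ms) {
  var id = timerIdCounter++;
  timerQueue.set(id, f);
  return id;
};

globalThis.clearTimeout = function (id) {
  timerQueue.delete(id);
};

// Hypothetical helper: run whatever callbacks the Dart runtime has scheduled.
function drainTimerQueue() {
  for (const [id, f] of [...timerQueue]) {
    timerQueue.delete(id);
    f();
  }
}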

netlify bot commented Jul 23, 2025

Deploy Preview for webkit-jetstream-preview ready!

  • 🔨 Latest commit: e16893c
  • 🔍 Latest deploy log: https://app.netlify.com/projects/webkit-jetstream-preview/deploys/68814d1b47b2f40007fa30c2
  • 😎 Deploy Preview: https://deploy-preview-97--webkit-jetstream-preview.netlify.app

@danleh danleh left a comment

Thanks a lot! Changing to todomvc as the Dart workload generally sounds fine to me (but it would be great if @mkustermann could chime in as the Dart expert).

In terms of the overall CPU profile, we see the following on d8 (complex vs. todomvc, where I already increased the iterations for todomvc to 30; see the comment in the driver):

  • Roughly the same number of samples is spent compiling code (2.2B vs. 2.1B), so given the shorter overall runtime, compilation accounts for a higher proportion of the samples in todomvc (5% vs. 11%).
  • todomvc is certainly much less GC-heavy, both in absolute (8.8B vs. 2.1B samples) and relative (20% vs. 11% of all samples) terms.
  • In terms of the bottom-up profile, they seem roughly similar, i.e., both contain similar high-self-time functions, e.g., _getMasqueradedRuntimeType, _HashSet.add, MultiChildRenderObjectElement.update, ComponentElement.performRebuild, _CompressedNode.get, SingleChildRenderObjectElement.update, Element.dependOnInheritedElement, _checkInstance (polymorphic dispatcher). The heaviest functions that were present in complex but are missing from the todomvc profile are BoxConstraints.==, RenderBox.layout, and RenderShiftedBox.paint. @mkustermann can you comment on whether that's realistic/fine/intended?

Also some smaller nits and fixes to make it run in d8; see the detailed comments.

try {
  await action();
} catch (e) {
  // JSC doesn't report/print uncaught async exceptions for some reason.

@kmiller68 Is this something to fix on the JSC side?
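
If the goal is just to make such failures visible regardless of engine behavior, a minimal pattern is to catch and print explicitly before rethrowing (a sketch, not the harness's actual code):

// Hypothetical wrapper name; illustrates the catch-and-print workaround only.
async function runAction(action) {
  try {
    await action();
  } catch (e) {
    // Print explicitly, since the shell may not report unhandled
    // async exceptions on its own.
    console.log("Uncaught async exception: " + e);
    throw e;
  }
}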

var id = timerIdCounter++;
// A callback can be scheduled at most once.
// (console.assert is only available on D8)
if (isD8) console.assert(f.$timerId === undefined);

While console.assert is available in d8, we overwrite/polyfill the console object in the harness (the console = { ... } block), so uncommenting this line causes an error. Let's just remove it.
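
For illustration, a console replacement of the kind referenced above typically defines only the methods the harness needs, which is why an uncommented console.assert call would throw (a minimal sketch assuming a shell-provided print function, not the actual JetStream polyfill):

// Sketch of a minimal console replacement: only log/error are provided,
// so console.assert(...) fails with "console.assert is not a function".
console = {
  log: (...args) => print(args.join(" ")),
  error: (...args) => print(args.join(" ")),
};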

self.dartUseDateNowForTicks = true;

function dartPrint(...args) { console.log(args); }
// globalThis.window ??= globalThis;

This line needs to be uncommented for d8.


// Signals `Stopwatch._initTicker` to use `Date.now` to get ticks instead of
// `performance.now`, as it's not available in d8.
self.dartUseDateNowForTicks = true;

I believe this is not required; d8 has had performance.now for a number of years: https://crsrc.org/c/v8/src/d8/d8.cc;drc=cb2b1a2ff084490f114e27a84b2a9a439af40ded;l=2180
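
If a fallback were still wanted for shells that genuinely lack performance.now, a feature check would avoid hard-coding the Date.now path (a sketch, not part of this PR):

// Only switch Stopwatch to Date.now-based ticks when the shell really
// lacks performance.now; current d8 and JSC both provide it.
self.dartUseDateNowForTicks =
  typeof performance === "undefined" ||
  typeof performance.now !== "function";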

@@ -19,7 +19,10 @@ mkdir -p build/ | tee -a "$BUILD_LOG"
# Generic Dart2wasm runner.
cp wasm_gc_benchmarks/tools/run_wasm.js build/ | tee -a "$BUILD_LOG"
# "Flute Complex" benchmark application.
cp wasm_gc_benchmarks/benchmarks-out/flute.dart2wasm.{mjs,wasm} build/ | tee -a "$BUILD_LOG"
# cp wasm_gc_benchmarks/benchmarks-out/flute.complex.dart2wasm.mjs build/flute.dart2wasm.mjs | tee -a "$BUILD_LOG"

nit: remove commented-out lines

@@ -1272,7 +1272,7 @@ class WSLBenchmark extends Benchmark {
benchmark.buildStdlib();
results.push(performance.now() - start);

performance.measure(markLabel, markLabel);
// performance.measure(markLabel, markLabel);

same

@@ -1283,7 +1283,7 @@ class WSLBenchmark extends Benchmark {
benchmark.run();
results.push(performance.now() - start);

performance.measure(markLabel, markLabel);
// performance.measure(markLabel, markLabel);

same

// `performance.now`, as it's not available in d8.
self.dartUseDateNowForTicks = true;

function dartPrint(...args) { console.log(args); }

Using print instead of console.log here hides the Dart console output, which I think makes sense.

],
preload: {
jsModule: "./Dart/build/flute.todomvc.dart2wasm.mjs",
wasmBinary: "./Dart/build/flute.todomvc.dart2wasm.wasm",
},
iterations: 15,

Given the shorter runtime of todomvc, I'd be happy to increase iterations to 30 (or so) to get a slightly less noisy measurement (still <5s wall time in d8).

},
iterations: 15,
worstCaseCount: 2,
tags: ["Wasm"],

Given #103, it probably makes sense to add the tag "disabled" for complex and "default" for todomvc.
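
For concreteness, the two benchmark entries could then look roughly like this, following the config shape in the snippets above (a sketch of the suggestion, not the final diff; fields other than those shown earlier are omitted):

// Dart-flute-todomvc: in the default set, with more iterations to reduce
// noise given its shorter runtime.
{
  preload: {
    jsModule: "./Dart/build/flute.todomvc.dart2wasm.mjs",
    wasmBinary: "./Dart/build/flute.todomvc.dart2wasm.wasm",
  },
  iterations: 30,
  worstCaseCount: 2,
  tags: ["Wasm", "default"],
},
// Dart-flute-complex: kept in the tree for comparisons, but excluded from
// the main score via the "disabled" tag.
{
  iterations: 15,
  worstCaseCount: 2,
  tags: ["Wasm", "disabled"],
},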

@danleh danleh requested a review from kmiller68 July 30, 2025 13:28
@mkustermann

In general I'm definitely in favor of including the TodoMVC benchmark.

These two Flutter-based benchmarks are just very different:

  • TodoMVC is mostly static: a new frame will add a new Todo item to a list, but the rest of the page stays the same. Complex instead rebuilds the entire page on each frame with modified widget positions, which causes a rebuild, re-layout, and re-draw of everything (which is why more time is spent in e.g. RenderBox.layout).
    => For real apps, a rebuild, re-layout & re-draw of the entire screen would happen e.g. on the first frame, when the window is resized, on navigation to a new screen, or when animations change the sizing of everything on screen.
    => The complex benchmark does this on every frame, whereas in real apps it happens only occasionally.
  • TodoMVC is a simpler page: it has fewer, and simpler, widgets being displayed. The Complex benchmark uses many more widgets. Especially the CupertinoTimePicker widget causes a lot of CPU time to be spent, lots of objects to be allocated, etc. (arguably its implementation is sub-optimal, but it's code that real Flutter apps out there use).

In some sense one could view

  • TodoMVC as exercising an "average frame" workload (i.e. updating a small part of the screen)
  • the Complex benchmark as exercising a "worst frame" workload (i.e. rebuilding the entire screen from scratch)

So if one optimizes for Complex, then in real apps one will be optimizing the worst-case/longest frames (which arguably is important, as the longest frames can cause UI jank).

What are the policies around benchmark inclusion into JetStream? Is there a concept of "experimental" workload where the bar for inclusion is lower (like in WebKit/Speedometer/experimental)? Is it a problem to keep both?

(If it's helpful, I'd actually be happy to supply more workloads. Recently I got dart2wasm compiling itself to work, which may also be quite interesting; there are big differences in CPU time and RAM consumption between D8, JSC, and JSShell. We also have a more real-world, larger Flutter UI app with an automated benchmark of a few screens that could be interesting, but I'd need to see if it can be made to run with the flute framework instead of real Flutter.)

@eqrion eqrion commented Jul 30, 2025

In some sense one could view

  • TodoMVC as exercising an "average frame" workload (i.e. updating a small part of the screen)
  • the Complex benchmark as exercising a "worst frame" workload (i.e. rebuilding the entire screen from scratch)

So if one optimizes for Complex, then in real apps one will be optimizing the worst-case/longest frames (which arguably is important, as the longest frames can cause UI jank).

I'm not opposed to trying to measure 'full frame redraw' workloads, but I'm concerned that 'complex' is pathological in a way that's much worse than we could realistically expect. From the first comment, I was seeing ~40x more bytes allocated per frame than TodoMVC, which itself allocated 3x more bytes than wonderous.app (which was the easiest thing I could get running by myself). And of all the bytes in 'complex', over half were coming from that time picker.

Is there a small modification you could make to TodoMVC to have it do some more full-frame redraws? Maybe resize something for a couple of frames?

What are the policies around benchmark inclusion into JetStream? Is there a concept of "experimental" workload where the bar for inclusion is lower (like in WebKit/Speedometer/experimental)? Is it a problem to keep both?

I don't think it's a problem to keep both if we apply the 'disabled' tag. That will keep it tested, but it won't be in the main final score.

(If it's helpful, I'd actually be happy to supply more workloads. Recently I got dart2wasm compiling itself to work, which may also be quite interesting; there are big differences in CPU time and RAM consumption between D8, JSC, and JSShell. We also have a more real-world, larger Flutter UI app with an automated benchmark of a few screens that could be interesting, but I'd need to see if it can be made to run with the flute framework instead of real Flutter.)

I'm happy to see more workloads, but I think for the main final score of the benchmark we need to balance things to not overly weight one particular language or workload. DotNet got two subtests, so I could see an argument for having two Dart subtests. But I don't want the other subtest to be 'complex' right now for the reasons stated above.

@mkustermann

but I'm concerned that 'complex' is pathological in a way that's much worse than we could realistically expect.

That's a very fair concern, and it is due to this inefficient CupertinoTimePicker implementation. I can try using a different widget instead of that one in the Complex benchmark and see how that influences the profile & allocation rate. (**)

Is there a small modification you could make to TodoMVC to have it do some more full frame redraws? Maybe resize something for a couple frames?

I'll try to make the (quickly made) TodoMVC example do more work. Amongst other things, I'd like it to scroll down the list when adding new items, as scrolling is an important thing for apps. (**)

I'm happy to see more workloads, but I think for the main final score of the benchmark we need to balance things to not overly weight one particular language or workload.

That makes sense. Happy to land some of those in a disabled state or make an aggregate number out of sub-benchmarks.

If all the vendors prefer TodoMVC over Complex atm, then how about we leave both in and make Complex disabled for now?

(**) I'll be OOO for a few more weeks over the summer, so it may only happen afterwards.

(Sidenote: Consider waiting until #95 is finished, or updating this PR to include the latest https://github.com/mkustermann/wasm_gc_benchmarks, to avoid updating the binaries twice.)
