-
Notifications
You must be signed in to change notification settings - Fork 15
Replace Dart-flute-complex with Dart-flute-todomvc #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Update to the latest wasm_gc_benchmarks commit, and pull in the todomvc benchmark too. I had to update the benchmark runner polyfill. I just copy pasted things from tools/run_wasm.js until it worked for me. This part likely needs some filtering and clean up.
✅ Deploy Preview for webkit-jetstream-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, changing to todomvc as the Dart workload generally sounds fine to me (but it would be great if @mkustermann can chime in as the Dart expert).
In terms of overall CPU profile, we see on d8 (complex vs. todomvc, where I already increased the iterations for todomvc to 30, see comment in the driver):
- Roughly the same amount of samples spent in compiling code (2.2B vs. 2.1B), so given the shorter overall runtime, compilation accounts for a higher proportion of the samples in todomvc (5% vs. 11%).
- todomvc is certainly much less GC heavy, both in absolute (8.8B vs. 2.1B samples) and relative (20% vs. 11% of all samples) terms.
- In terms of the bottom up profile, they seem to be roughly similar, i.e., both contain similar high-self time functions, e.g.,
_getMasqueradedRuntimeType
,_HashSet.add
,MultiChildRenderObjectElement.update
,ComponentElement.performRebuild
,_CompressedNode.get
,SingleChildRenderObjectElement.update
,Element.dependOnInheritedElement
,_checkInstance (polymorphic dispatcher)
. The heaviest functions that were in complex, but are missing from the todomvc profile areBoxConstraints.==
,RenderBox.layout
,RenderShiftedBox.paint
@mkustermann can you comment whether that's realistic/fine/intended?
Also some smaller nits and fixes to make it run in d8, see detailed comments.
try { | ||
await action(); | ||
} catch (e) { | ||
// JSC doesn't report/print uncaught async exceptions for some reason. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kmiller68 Is this something to fix on JSC side?
var id = timerIdCounter++; | ||
// A callback can be scheduled at most once. | ||
// (console.assert is only available on D8) | ||
if (isD8) console.assert(f.$timerId === undefined); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While console.assert
is available in d8, we overwrite/polyfill the console
object in
Line 27 in 36e8a98
console = { |
self.dartUseDateNowForTicks = true; | ||
|
||
function dartPrint(...args) { console.log(args); } | ||
// globalThis.window ??= globalThis; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line needs to be uncommented for d8.
|
||
// Signals `Stopwatch._initTicker` to use `Date.now` to get ticks instead of | ||
// `performance.now`, as it's not available in d8. | ||
self.dartUseDateNowForTicks = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is not required, d8 has performance.now
since a number of years: https://crsrc.org/c/v8/src/d8/d8.cc;drc=cb2b1a2ff084490f114e27a84b2a9a439af40ded;l=2180
@@ -19,7 +19,10 @@ mkdir -p build/ | tee -a "$BUILD_LOG" | |||
# Generic Dart2wasm runner. | |||
cp wasm_gc_benchmarks/tools/run_wasm.js build/ | tee -a "$BUILD_LOG" | |||
# "Flute Complex" benchmark application. | |||
cp wasm_gc_benchmarks/benchmarks-out/flute.dart2wasm.{mjs,wasm} build/ | tee -a "$BUILD_LOG" | |||
# cp wasm_gc_benchmarks/benchmarks-out/flute.complex.dart2wasm.mjs build/flute.dart2wasm.mjs | tee -a "$BUILD_LOG" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove commented out lines
@@ -1272,7 +1272,7 @@ class WSLBenchmark extends Benchmark { | |||
benchmark.buildStdlib(); | |||
results.push(performance.now() - start); | |||
|
|||
performance.measure(markLabel, markLabel); | |||
// performance.measure(markLabel, markLabel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@@ -1283,7 +1283,7 @@ class WSLBenchmark extends Benchmark { | |||
benchmark.run(); | |||
results.push(performance.now() - start); | |||
|
|||
performance.measure(markLabel, markLabel); | |||
// performance.measure(markLabel, markLabel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
// `performance.now`, as it's not available in d8. | ||
self.dartUseDateNowForTicks = true; | ||
|
||
function dartPrint(...args) { console.log(args); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using print
instead of console.log
here hides the Dart console output, which I think makes sense.
], | ||
preload: { | ||
jsModule: "./Dart/build/flute.todomvc.dart2wasm.mjs", | ||
wasmBinary: "./Dart/build/flute.todomvc.dart2wasm.wasm", | ||
}, | ||
iterations: 15, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the shorter runtime of todomvc, I'd be happy to increase iterations to 30 (or so), to have a bit less noisy measurement (still <5s Wall time in d8).
}, | ||
iterations: 15, | ||
worstCaseCount: 2, | ||
tags: ["Wasm"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given #103, probably makes sense to add tag "disabled"
for complex, and "default"
for todomvc.
In general I'm definitely in favor of including TodoMVC benchmark. These two flutter-based benchmark are just very different:
In some sense one could view
So if one optimizes for Complex then one will in real apps optimize the worst-case/longest frames (which arguably is important - as the longest frames can cause UI jank). What are the policies around benchmark inclusion into JetStream? Is there a concept of "experimental" workload where the bar for inclusion is lower (like in WebKit/Speedometer/experimental)? Is it a problem to keep both? (If it's helpful, I'd actually be happy to supply more workloads. Recently I got dart2wasm to be able to compile itself working - which may also be quite interesting (and there's big differences in CPU time and RAM consumption of D8 vs JSC vs JSShell. We also have a more real world larger flutter UI app that has an automated benchmark of a few screens that could be interesting - but I'd need to see if it can be made to run with flute framework instead of real flutter) |
I'm not opposed to trying to measure 'full frame redraw' workloads, but I'm concerned that 'complex' is pathological in a way that's much worse than we could realistically expect. From the first comment, I was seeing ~40x more bytes allocated per-frame than TodoMVC, which itself allocated 3x more bytes than the wondrous.app (which was the easiest thing I could get running by myself). And of all the bytes in 'complex', over half were coming from that time picker. Is there a small modification you could make to TodoMVC to have it do some more full frame redraws? Maybe resize something for a couple frames?
I don't think it's a problem to keep both if we apply the 'disabled' tag. That will keep it tested, but it won't be in the main final score.
I'm happy to see more workloads, but I think for the main final score of the benchmark we need to balance things to not overly weight one particular language or workload. DotNet got two subtests, so I could see an argument for having two Dart subtests. But I don't want the other subtest to be 'complex' right now for the reasons stated above. |
That's a very fair concern and is due to this inefficient
I'll try to make the (quickly made) TodoMVC example do more work. Amongst other things, I'd like it to scroll down the list when adding new items - as scrolling is an important thing for apps (**)
That makes sense. Happy to land some of those in If all the vendors prefer TodoMVC over Complex atm, then how about we leave both in and make Complex (**) I'll be OOO for a few more weeks over the summer, so it may only happen afterwards. (Sidenote: Consider waiting until #95 is finished or update this PR to include latest https://github.com/mkustermann/wasm_gc_benchmarks - to avoid updating the binaries twice) |
This is the WIP commit I've been using to compare the behavior of Dart-flute-todomvc and Dart-flute-complex. I also built a local version of https://wonderous.app/web/ using Wasm-GC to compare as well.
From profiling I observed the following relative rates of allocation in bytes per frame:
This is from a profiler that samples 1% of allocations and reports the stack, bytes, etc for them. So I have good relative numbers, but not precise absolute numbers.
In summary, complex allocates about 39x more bytes than todomvc, which allocates about 3x more bytes than wondrous.app.
In addition, over half of the allocated bytes in complex are coming from stacks with
CupertinoTimePicker
in their name. That seems excessive to me.I would prefer if todomvc replaces complex, as it seems more representative of realistic workloads. This PR includes both for if anyone else wants to do comparisons on them, but I would remove the complex benchmark if we go ahead with this.
Here are dartpad links for complex/todomvc if anyone wants to look at the rendered output of either.
cc' @danleh @camillobruni @kmiller68 @mkustermann