-
Notifications
You must be signed in to change notification settings - Fork 311
pkg/list: Sort is too slow #745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Original reply by @mpvl in cuelang/cue#745 (comment) Is this something you found in practice or just when playing around? We are aware of slowness in various places, but we tend to prioritize actual use cases. |
Original reply by @vikstrous2 in cuelang/cue#745 (comment) The use case is outputting a predictable yaml output when generating kubernetes configs so that we can diff the produced changes on PRs. It turned out that a few simple sorts are actually not enough for this anyway, so I'm using this yq command to clean the output of cue and make it git diffable: |
Original reply by @mpvl in cuelang/cue#745 (comment) Would it be useful to have a semantic diff? There is an internal diff package in the CUE, but we could expose it as a package and or command. If the latter, you could comment on Issue #8 on how you think this would best look like. |
Original reply by @vikstrous2 in cuelang/cue#745 (comment) I'll comment on #8 because that's closer to my use case. |
Update: We've been running the above yq command on the output of CUE for a while now, but there are ~500 files to sort and now it takes longer to run yq so many times than it takes to generate the output in the first place. If we had associative lists and those were output in a sorted order, that would solve this issue. I tried to make my own associative lists with a for comprehension. I see that maps are output in a sorted order, but for comprehensions ( To be fair, I haven't tried applying the associative list + sort pattern everywhere because I'm worried that I'll run into the same performance issue after refactoring my config. |
@vikstrous2 do you have an example where the output is not stable? (https://github.com/cue-lang/cue/wiki/Creating-test-or-performance-reproducers) CUE uses a topological sort due to conjunctions and such. In my experience it is stable with both JSON and Yaml output of kubernetes manifests, though not lexicographically sorted. You might also try v0.4.0 but this has been the behavior in the 0.3 series as well (iirc) |
Sorry for the confusion. The issue is not that the output is not stable between runs with the same config. The issue is that refactors cause fields to be reordered even though it's rendering equivalent configs. I realized now that actually maps are output in the same order the are defined. The following two configs should produce the same output because the are equivalent:
The fact that they produce different output makes it infeasible to review the diff in the generated output when refactoring. (when working with large configs where there might be 1000 changes that are just noise) |
@vikstrous2 thanks for the additional detail. As noted in #474 we are looking to tighten up this behaviour in the spec, so that should help.
Indeed, that should be the case above, despite the fact this is not guaranteed in the spec. So given this is the case, are you running into a case where this is not happening? i.e. why are you finding that sort is required? Without a sort the output might well differ from the sorted result, but if you accept that diff then from that point on the diff should be "true", no? Note that I'm not pushing back on the fact that we should fix the speed of sort; that we should fix it is a given (and thank you for the reproducer above). Rather, I'm trying to see if there is a practical workaround for now. It sounds like you have CUE as the source of truth - do you even need to review the diff of the yaml output? I ask that question, again not to undermine the need for a semantic diff package, but to understand more about your use case. |
For ease of reproduction, here is a
which with 0f53054 gives:
|
Reasonable questions. Thanks for asking them.
No, I'm not running into cases where the output order is unexpected. The order just changes between refactors because fields are often moved around. Sometimes there's no way to do a refactor without moving fields around and it's hard / tedious to split it up into two refactors where one of them keeps the order.
Yes, we can accept the diff, of course. But we'd have to do it blindly unless we go through it and manually compare every changed line to make sure nothing unexpected sneaks through the review. This is hard if the diff is 1000 lines, but only 1 meaningful change is expected.
I actually think that if the output of maps was sorted and that also led to for comprehensions being sorted, that might be a better solution than having to manually sort the output. I don't know if this is feasible or if it's a complete solution (there might be a lot of for comprehensions that interact with each other and might break sorting anyway).
Definitely. It acts as a test that can be verified easily. A lot of engineers are not proficient at writing CUE and it's easy to accidentally make mistakes that can be caught when reviewing the generated output. We don't render every possible config (we generate different things for different environments), but we render one example config and check that in so it's part of the PR review. I commented on the semantic diff issue, but I'm thinking that it doesn't provide significantly more value than diffing well sorted and normalized yaml files. |
This is definitely an option, but I don't think it's necessarily the right default.
A very good point.
Generally speaking I think diff will be more useful, because it is a semantic diff across different encodings. But I can see how in your case a more specific diff would be more useful. Thanks very much for the additional colour. |
There is some relation here to #1180, in so far as we probably need to provide more import/export options. |
The reimplemenation replaces uses the high-level API usage with using the low-level, making use of the fact that each Vertex used for comparison uses the same outline and the same conjuncts for "less". This significantly reduces the number of allocations and other redundant work. Before: Benchmark/sort.txtar-12 5 202090158 ns/op After: Benchmark/sort.txtar-12 28 43326277 ns/op Which is about an 80% reduction in running time. It is possible to reduce runtime further if the type indicated for x and y are idempotent. There are several strategies to analyze this. A TODO has been added for this. This change achieves the lion share of possible performance improvements for Sort. We leave it open, though, to verify the result with the various stakeholders. Issue #745 Signed-off-by: Marcel van Lohuizen <[email protected]> Change-Id: Idd6ac1925f5dd786313ea450218d4d17eb590581 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/549087 Reviewed-by: Daniel Martí <[email protected]> Reviewed-by: Roger Peppe <[email protected]> Unity-Result: CUEcueckoo <[email protected]> TryBot-Result: CUEcueckoo <[email protected]>
list.Sort should now be significantly faster. There are still significant possible optimizations if the values for T (or x and y) are idempotent. But I hope for now this will be of help. |
Issue #745 Signed-off-by: Marcel van Lohuizen <[email protected]> Change-Id: I6ad8ed5f59c8709f276da8a46858ab4fb9423c2e Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/549086 Reviewed-by: Daniel Martí <[email protected]> Reviewed-by: Roger Peppe <[email protected]> Unity-Result: CUEcueckoo <[email protected]> TryBot-Result: CUEcueckoo <[email protected]>
Dropping the now-defunct v0.4.x milestone from this issue, but leaving the |
Here is a tweaked testscript which avoids using
So evalv3 is about twice as slow as evalv2. And both evaluators are about three times as slow when adding the sorting. Both of these should be resolved, even though the slow-downs aren't incredibly worrying at this point. |
Originally opened by @vikstrous2 in cuelang/cue#745
What version of CUE are you using (
cue version
)?Does this issue reproduce with the latest release?
yes
What did you do?
Baseline:
root.cue
Run:
Replace the last line with this list.Sort command to see the performance impact of using list.Sort in this config:
Run:
What did you expect to see?
negligible performance difference
What did you see instead?
10x slow down
The text was updated successfully, but these errors were encountered: