-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Add LLVM range attributes to slice length parameters #148350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
rustbot has assigned @JonathanBrouwer. Use |
| pub fn len_plus_ten_b(s: &[u32]) -> usize { | ||
| // CHECK: start: | ||
| // CHECK-NOT: add | ||
| // CHECK: %[[R:.+]] = add nuw nsw i{{.+}} %s.1, 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is the simplest demonstration of LLVM taking advantage of it. See https://rust.godbolt.org/z/7jxqsaaTx showing that it didn't know nuw/nsw for these things previously.
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Add LLVM range attributes to slice length parameters
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (398e28a): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (secondary -1.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 0.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.1%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 475.716s -> 474.193s (-0.32%) |
|
The size numbers are really interesting here. 112KB off I think instruction-wide it's close enough to neutral to be tolerable. Especially since the bootstrap time is green. |
|
Note that the compiler toolchain size sometimes changes e.g. by tens of KiB even on no-op/README changes :( Probably due to BOLT/PGO. |
|
I've reviewed the code and it is correct, the tests also look good. I'm however struggling to interpret the performance results. From my experience the bootstrap time & toolchain size are so noisy that I don't even look at them. The benchmarks are a bit more red than I like (and expect from this change) tho, I don't really understand why. Especially (To other reviewers, feel free to r=me if you can explain the perf) |
This comment has been minimized.
This comment has been minimized.
|
Do we need to make a new build? |
Add LLVM range attributes to slice length parameters
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (2d5ecaf): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 7.6%, secondary -3.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.8%, secondary 0.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.2%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 476.797s -> 476.639s (-0.03%) |
|
Ok so it is reproducible, good to know :) Still very confusing perf tho. I think that both improvements in the primary benchmarks may actually be noise, looking at the history graph. I think the perf is probably good enough to merge regardless? I'm not sure though, re-assigning to someone else to make the final decision r? compiler r=me,you if you agree performance is acceptable, I already reviewed the code and that looks good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the perf results are okay, the primary benchmarks basically just see small improvements and its only really secondary benchmarks that have small regressions. More information is typically good and could be used more by LLVM in future, so I'd prefer to output this.
|
@bors r+ |
|
@bors r=JonathanBrouwer,davidtwco rollup=never |
|
💡 This pull request was already approved, no need to approve it again.
|
|
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 1f880d9 (parent) -> 20383c9 (this PR) Test differencesShow 4 test diffsStage 1
Stage 2
Additionally, 2 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 20383c9f1d84eb9b9c6668a1668ef68a81eae274 --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (20383c9): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowOur benchmarks found a performance regression caused by this PR. Next Steps:
@rustbot label: +perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary -0.1%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 474.478s -> 472.257s (-0.47%) |
It'll take a bunch more work to do this in layout -- because of cycles in
struct Foo<'a>(&'a Foo<'a>);-- so until we figure out how to do that well, just look for slices specifically and add the proper range for the length.