
Conversation

@z1-cciauto
Collaborator

No description provided.

joker-eph and others added 30 commits October 29, 2025 23:48
…lvm#162819)

This PR enables the AMDGPUUniformIntrinsicCombine pass in the llc pipeline. It also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag to enable/disable the pass.

See the PR: llvm#116953
This patch addresses two use-after-move issues:

1. `Timing.cpp`: A variable was `std::move`d and then immediately passed to an `assert()` check. Since the moved-from state made the assertion condition trivially true, the check was effectively useless. The `assert()` is removed.

2. `Query.cpp`: The `matcher` object was moved from and then subsequently used as if it still retained valid state. The fix ensures there is no subsequent use of the moved-from variable.

Testing:
`ninja check-mlir`
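For illustration, a minimal generic sketch of the two patterns (my own example, not the actual MLIR code):

```cpp
#include <string>
#include <utility>

void consume(std::string S) {}

void useAfterMove(std::string Name) {
  consume(std::move(Name));
  // (1) Moved-from: this check says nothing about the original value.
  // assert(!Name.empty());
  // (2) Reusing the moved-from variable as if it still held its old state is a bug.
  // consume(Name);
}
```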
In `TextDiagnostic.cpp`, we use column and byte indices everywhere, but we were using plain integers for them, which made it hard to know what to pass where and what was produced. To make matters worse, what `SourceManager` considers a "column" is actually a byte offset in `TextDiagnostic`.

Add `Bytes` and `Columns` structs, which are unrelated types, so APIs using them can differentiate between values interpreted as columns and values interpreted as bytes.
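A minimal sketch of the idea (the real structs in `TextDiagnostic.cpp` differ in detail): two unrelated wrapper types, so a byte offset cannot silently be passed where a column count is expected.

```cpp
struct Bytes {
  explicit Bytes(unsigned V) : Value(V) {}
  unsigned Value;
};

struct Columns {
  explicit Columns(unsigned V) : Value(V) {}
  unsigned Value;
};

// A function taking Columns can no longer be called with a raw byte count:
// printCaret(Columns(4)) compiles, printCaret(Bytes(4)) does not.
void printCaret(Columns Col) { (void)Col; }
```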
….py module (llvm#165535)

This commit extracts some MIR-related code from `common.py` and `update_mir_test_checks.py` into a dedicated `mir.py` module to improve code organization. This is a preparation step for llvm#164965 and also moves some pieces previously moved by llvm#140296.

All code was intentionally moved verbatim, with minimal necessary adaptations:
* `log()` calls were converted to `print(..., file=sys.stderr)` at `mir.py` lines 62 and 64 because of `log` locality.
Follow-on from llvm#164372.

This changes the DW_AT_name for `_BitInt(N)` from `_BitInt` to `_BitInt(N)`.
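For example (my illustration; Clang accepts `_BitInt` in C++ as an extension), the debug-info type name for the variable below now reads `_BitInt(37)` rather than just `_BitInt`:

```cpp
// Compiled with -g; the DWARF type name for Counter now carries the bit width.
_BitInt(37) Counter = 0;
```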
…vm#165527)

Allow the stack move optimization (which merges two allocas) when the
address of only one alloca is captured (and the provenance is not
captured). Both addresses need to be captured to observe that the
allocas were merged.

Fixes llvm#165484.
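A rough source-level analogue, under my reading of the capture distinction (not code from the patch): only `Buf`'s address escapes, and only as an integer, so no provenance is captured and a program cannot observe whether the two stack slots were merged.

```cpp
#include <cstdint>
#include <cstring>

void observeAddress(uintptr_t); // hypothetical sink that only sees the address value

void copyAndLeakAddress(const char *In) {
  char Tmp[64];
  std::memcpy(Tmp, In, sizeof(Tmp));
  char Buf[64];
  std::memcpy(Buf, Tmp, sizeof(Buf));               // full-size copy: stack-move candidate
  observeAddress(reinterpret_cast<uintptr_t>(Buf)); // address escapes, provenance does not
}
```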
)

This documents two things:

 * The recommended way to go about adding a new pass.
 * The criteria for enabling a pass.

RFC: https://discourse.llvm.org/t/rfc-guidelines-for-adding-enabling-new-passes/88290
We've upgraded to LLVM 22 now, so we can remove a bunch of TODOs.
MemoryAccess base class was included from Core.h when it was a subclass
of ExecutorProcessControl, but this changed in
0faa181
Also rename map to Map, remove the m_ prefix from member variables and
fix the naming of the existing color variables.
Initial parsing/sema/codegen support for the threadset clause in task and taskloop directives [Section 14.8 in the OpenMP 6.0 spec]

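A minimal usage sketch, assuming the `omp_team`/`omp_pool` thread-set names from the 6.0 spec (my example, not code from this commit):

```cpp
#include <omp.h>

void work(int I);

void run(int N) {
  #pragma omp parallel
  #pragma omp single
  for (int I = 0; I < N; ++I) {
    // threadset(omp_pool) allows the task to run on a thread outside the
    // encountering team; threadset(omp_team) keeps the default behaviour.
    #pragma omp task threadset(omp_pool)
    work(I);
  }
}
```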
…ency." (llvm#165688)

Reverts llvm#165496

Reverted due to flaky failures on Arm 32-bit since this change landed. Detailed in llvm#165496 (comment).
Currently all `runInTerminal` tests are skipped in debug builds because, when attaching, they time out parsing the debug symbols of lldb-dap.

Add this test to that set, since it also runs in a terminal.
….reduce intrinsics. (llvm#165400)

This is the first step in removing some NEON reduction intrinsics that duplicate the behaviour of their llvm.vector.reduce counterparts.

NOTE: The i8/i16 variants differ in that the NEON versions return an i32 result. However, this looks to be more about making their code generation convenient, with SelectionDAG discarding the extra bits. This is only relevant for the next phase, because the Clang usage always truncates the result, making llvm.vector.reduce a drop-in replacement.
…lvm#164246)

This patch adds test cases that demonstrate missing dependencies in DA
caused by the lack of overflow handling. These issues will be addressed
by properly inserting overflow checks and bailing out when one is
detected.

It covers the following dependence test functions:

- Strong SIV
- Weak-Crossing SIV
- Weak-Zero SIV
- Symbolic RDIV
- GCD MIV

It does NOT cover:

- Exact SIV
- Exact RDIV
- Banerjee MIV
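For reference, a textbook Strong SIV pair looks like the following (my example, not one of the new tests); the added tests exercise such subscript forms in situations where the index arithmetic can overflow:

```cpp
void strongSIV(int *A, int N) {
  for (int I = 0; I < N; ++I)
    A[2 * I + 3] = A[2 * I] + 1; // both subscripts share the coefficient 2 on I
}
```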
Pulled out of the abandoned patch llvm#69710 to act as a baseline for llvm#165694
It looks like the documentation for `llvm-cxxfilt`'s `--[no-]strip-underscore` options wasn't updated when llvm#106233 was made.

CC @Michael137 (I don't have merge rights myself).
As noticed on llvm#165676: if we're increasing the number of uses of an operand, we should freeze it.
…gers (llvm#165540)

This patch allows us to narrow single-bit test/twiddle operations on larger-than-legal scalar integers so that they efficiently operate just on the i32 sub-integer block actually affected.

The BITOP(X,SHL(1,IDX)) patterns are split, with the IDX used to access the specific i32 block as well as the specific bit within that block.
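As a source-level sketch of the block/bit split (my illustration, not from the patch): for a wide integer stored as i32 words, `Idx / 32` selects the block and `Idx % 32` the bit, so only one word ever needs to be touched.

```cpp
#include <cstdint>

bool testBit(const uint32_t *Words, unsigned Idx) { // BT-style read
  return (Words[Idx / 32] >> (Idx % 32)) & 1u;
}

void setBit(uint32_t *Words, unsigned Idx) {        // BTS-style single-word RMW
  Words[Idx / 32] |= 1u << (Idx % 32);
}
```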

BT comparisons are relatively simple and build on the truncated shifted-loads fold from llvm#165266.

BTC/BTR/BTS bit-twiddling patterns need to match the entire RMW pattern to safely confirm that only one block is affected, but a similar approach is taken, creating codegen that should allow us to further merge with matching BT opcodes in a future patch (see llvm#165291).

The resulting codegen is notably more efficient than the heavily
micro-coded memory folded variants of BT/BTC/BTR/BTS.

There is still some work to do on the bit-insert 'init' patterns included in bittest-big-integer.ll, but I'm expecting this to be a straightforward future extension.

Fixes llvm#164225
…4217)

This reverts commit 78bf682.

Original PR: llvm#157463
Revert PR: llvm#158566

The relevant buildbots have been updated to a ROCm version that no longer uses the macros, so the failures are avoided.

Implements SWDEV-522062.
Not sure if this warrants a PR, but I realized there was a typo in a
test filename from my previous PR llvm#164387.
… AVX targets (llvm#165676)

If the PTEST is just using the ZF result and one of the operands is an i32/i64 sign mask, we can use the TESTPD/TESTPS instructions instead and avoid the use of an extra constant.

Fixes some codegen identified in llvm#156233
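My illustration of the pattern (compiled with AVX enabled; not code from the patch): checking whether any float lane is negative needs a sign-mask constant in the PTEST form, while VTESTPS reads the sign bits directly.

```cpp
#include <immintrin.h>

bool anyNegativePTEST(__m128 V) { // PTEST + constant mask
  const __m128i SignMask = _mm_set1_epi32(static_cast<int>(0x80000000u));
  return !_mm_testz_si128(_mm_castps_si128(V), SignMask);
}

bool anyNegativeTESTPS(__m128 V) { // VTESTPS, no mask constant needed
  return !_mm_testz_ps(V, V);
}
```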
…sm (llvm#149308)

First batch of changes to add support for inline-asm callbr for the
AMDGPU backend.
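For context, `callbr` is what Clang emits for GNU `asm goto`; a generic, non-AMDGPU example of the source construct (illustration only, not from the patch):

```cpp
int classify(int X) {
  // Empty template: the inline assembly may transfer control to OnError.
  asm goto("" : /* no outputs */ : "r"(X) : /* no clobbers */ : OnError);
  return 0;
OnError:
  return 1;
}
```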
)

A collection of small changes to get a number of lit tests working on
z/OS.
DavidSpickett and others added 11 commits October 30, 2025 13:35
To improve debuggability, macro arguments should be resolved to their original location rather than the macro expansion location.

[PR in cation](https://github.com/user-attachments/assets/994fb89f-83be-4c21-a79c-f8e51d818f7b)

fixes llvm#160667
…llvm#164978)

Use LocalAliasAnalysis to improve handling of side effects in nested scf.parallel. If the memory written outside the nested scf.parallel does not alias the memory accessed inside the nested loop, we can convert it to gpu.launch.
…lvm#159573)

When a load/store is conditionally executed in a loop it isn't a
candidate for pre/post-index addressing, as the increment of the address
would only happen on those loop iterations where the load/store is
executed.

Detect this and only discount the AddRec cost when the load/store is
unconditional.
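A sketch of the case being costed (my example, not one of the tests): the store below is guarded, so post-indexed addressing would advance the pointer even on iterations where the store does not execute.

```cpp
void zeroSelected(int *P, const int *Cond, int N) {
  for (int I = 0; I < N; ++I)
    if (Cond[I])
      P[I] = 0; // conditionally executed store: not a pre/post-index candidate
}
```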
Identified with misc-unused-using-decls.
*getInstrInfo() is already of type const HexagonInstrInfo &.
Unused loop-invariant loads were not sunk from the preheader to the exit block, increasing their live ranges.

This commit moves the sinkUnusedInvariant logic from IndVarSimplify to LICM and also adds functionality to sink unused loads that are not clobbered by the loop body.
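An illustration of the sinking opportunity (my example, assuming the pointers do not alias): the load is loop-invariant, is not clobbered by the loop body, and is only used after the loop, so sinking it to the exit block shrinks its live range.

```cpp
int sumAndRead(int *__restrict A, const int *__restrict Q, int N) {
  int V = *Q;                 // previously kept live across the whole loop
  for (int I = 0; I < N; ++I)
    A[I] += I;                // does not clobber *Q
  return V;                   // sole use is after the loop
}
```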
This reverts commit 57722dd.

This caused some macOS test failures because setting RLIMIT_STACK there has issues: the underlying syscall fails with EINVAL despite the values being correct. For now, move this to the non-Darwin test.
This was supposed to be in 6ccd1e8 but got left out because I forgot to save the file in VSCode. This was causing test failures on macOS due to the previously mentioned failures setting ulimit, which caused the patch to be reverted in the first place.

https://lab.llvm.org/buildbot/#/builders/190/builds/29990
…m#165272)

Followup metadata for remainder loops is handled by two implementations,
both added by 7244852:

1. `tryToUnrollLoop` in `LoopUnrollPass.cpp`.
2. `CloneLoopBlocks` in `LoopUnrollRuntime.cpp`.

As far as I can tell, 2 is useless: I added `assert(!NewLoopID)` for the
`NewLoopID` returned by the `makeFollowupLoopID` call, and it never
fails throughout check-all for my build.

Moreover, if 2 were useful, it appears it would have a bug caused by
7cd826a. That commit skips adding loop metadata to a new remainder
loop if the remainder loop itself is to be completely unrolled because
it will then no longer be a loop. However, that commit incorrectly
assumes that `UnrollRemainder` dictates complete unrolling of a
remainder loop, and thus it skips adding loop metadata even if the
remainder loop will be only partially unrolled.

To avoid further confusion here, this patch removes 2. check-all
continues to pass for my build. If 2 actually is useful, please advise
so we can create a test that covers that usage.

Near 2, this patch retains the `UnrollRemainder` guard on the
`setLoopAlreadyUnrolled` call, which adds `llvm.loop.unroll.disable` to
the remainder loop. That behavior exists both before and after
7cd826a. The logic appears to be that remainder loop unrolling
(whether complete or partial) is opt-in. That is, unless
`UnrollRemainder` is true, `UnrollRuntimeLoopRemainder` skips running
remainder loop unrolling, and `llvm.loop.unroll.disable` suppresses any
later attempt at it.

This patch also extends testing of remainder loop followup metadata to
be sure remainder loop partial unrolling is handled correctly by 1.
@z1-cciauto z1-cciauto requested a review from a team October 30, 2025 15:00
@ronlieb ronlieb merged commit e4e8c79 into amd-staging Oct 30, 2025
13 checks passed
@ronlieb ronlieb deleted the upstream_merge_202510301100 branch October 30, 2025 20:47