
Conversation

@z1-cciauto (Collaborator)

No description provided.

bgergely0 and others added 30 commits October 28, 2025 12:43
The pass calls setIgnored() on functions in parallel, but setIgnored() is
not thread-safe. This patch adds a std::mutex to guard the setIgnored() calls.
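
A minimal sketch of the guarded pattern (the `Func` type and the
`markIgnored` helper are illustrative stand-ins, not the actual patch):

```cpp
#include <mutex>

struct Func { void setIgnored(); }; // stand-in for the real function class

std::mutex SetIgnoredMutex; // serializes the non-thread-safe call

void markIgnored(Func &F) {
  std::lock_guard<std::mutex> Lock(SetIgnoredMutex);
  F.setIgnored(); // safe to call from parallel workers while the lock is held
}
```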

Fixes: llvm#165362
Added RecursiveMemoryEffects to ExecuteRegionOp, aligning it with other
region-holding ops and giving it the appropriate support in all passes
that require RecursiveMemoryEffects.
Without the trait, the test added in dealloc-memoryeffect-interface.mlir
fails with the error 'ops with unknown memory side effects are not
supported'.
The test updated in one-shot-module-bufferize.mlir gets cleaned up by DCE
once the interface is added, so a func.call @foo() : () -> (), which has
a side effect, was added to keep the execute_region from being removed.

---------

Co-authored-by: Mehdi Amini <[email protected]>
b6bbc4b fixed
IRBuilder::CreatePtrToAddr to produce the correct instruction.
Update the test for ptr_diff lowering accordingly.
…tializers (llvm#163401)

Currently we receive a warning when initializing a ThreadEventCallbacks
struct when compiling with this flag:
```
llvm-project/compiler-rt/lib/tsan/rtl/tsan_platform_mac.cpp:252:3: warning: missing field 'start' initializer [-Wmissing-designated-field-initializers]
  252 |   };
      |   ^
```

This patch explicitly initializes the missing fields to null, fixing the
warning.
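
A minimal sketch of the fix pattern, with an illustrative struct (the
real field set lives in tsan_platform_mac.cpp):

```cpp
// Explicitly initializing every field keeps
// -Wmissing-designated-field-initializers quiet (field names illustrative):
struct ThreadEventCallbacks {
  void (*create)(void *);
  void (*start)(void *);
  void (*terminate)(void *);
};

ThreadEventCallbacks Callbacks = {
    .create = nullptr,
    .start = nullptr,     // previously omitted, which triggered the warning
    .terminate = nullptr,
};
```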

rdar://162074310
This fold can be directly reused for ptrtoaddr. One caveat is
that for an inttoptr base, it currently won't work for pointers
with non-address bits. It's possible to support this case.
…re help content (llvm#162029)

Fix clangd/clangd#2513

This regression was introduced with llvm#140498.

The issue is that with llvm#140498 the extraction of the documentation
comment changed from line-based to paragraph-based.
This also removed some required line breaks inside paragraphs, which
used to be added before the change.

This PR adds the missing line breaks again.
…65266)

As reported in llvm#164853, we only attempt to reduce shifted loads for constant shift amounts, but we could do more with non-constant values if value tracking can confirm basic alignments.

This patch determines whether a truncated, shifted load of a scalar integer shifts by a byte-aligned amount, and if so replaces the non-constant shift amount with a pointer offset instead.

I had hoped to make this a generic DAG fold, but reduceLoadWidth isn't ready to be converted to a KnownBits value tracking mechanism, and other targets don't have complex address math like X86.
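
An illustrative source shape for the targeted pattern (not taken from the patch):

```cpp
#include <cstdint>

// The shift amount is non-constant but provably a multiple of 8, so on
// little-endian x86 the truncated shifted load can become a plain byte
// load at p + i instead (assuming i < 8):
uint8_t extract_byte(const uint64_t *p, unsigned i) {
  return (uint8_t)(*p >> (i * 8));
}
```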

Fixes llvm#164853
There was a bug in llvm-lit related to setting PATH using env in the
internal shell.

The new PATH wasn't used when looking up the command to be executed. So
when doing things like this in a test case
  RUN: mkdir %t
  RUN: env PATH=%t program ...
the internal shell would search for "program" using the original PATH and
not the PATH set by env preceding the command.

It seems like this was a simple mistake in commit 57782ef: the logic to
pick a PATH from the cmd_shenv instead of shenv was actually added in
that patch, but the resulting path wasn't used.
…system stdlib (llvm#164462)

On Linux, if you specify an external libc++, clang will still link
to the system's libc++. This patch fixes that.

Fixes llvm#116040
…65391)

This reverts commit 2f869c4.

Breaks build on some configurations
AbstractCallSite handles three types of calls (direct, indirect, and
callback).

This patch fixes the handling of indirect calls in some methods, which
incorrectly assumed that non-direct calls are always callback calls.

Moreover, this PR adds two unit tests, one for the direct call type and
one for the indirect call type.

The aforementioned mistaken assumption leads to the following problem:

---
## Problem

When the underlying call is **indirect**, some APIs of
`AbstractCallSite` behave unexpectedly.
E.g., `AbstractCallSite::getCalledFunction()` currently triggers an
**assertion failure**, instead of returning `nullptr` as documented:

```cpp
/// Return the function being called if this is a direct call, otherwise
/// return null (if it's an indirect call).
Function *getCalledFunction() const;
```

Actual unexpected assertion failure:
```
AbstractCallSite.h:197: int llvm::AbstractCallSite::getCallArgOperandNoForCallee() const: Assertion `isCallbackCall()' failed.
```

This is because `AbstractCallSite` mistakenly entered the branch that
handles callback calls, as its guard condition (`!isDirectCall()`) does
not take into account the case of indirect calls.
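
A minimal sketch of handling all three call kinds correctly; the guard
order matters, since "not direct" does not imply "callback" (the visitor
function is illustrative):

```cpp
#include "llvm/IR/AbstractCallSite.h"

void visit(llvm::AbstractCallSite ACS) {
  if (ACS.isDirectCall()) {
    llvm::Function *Callee = ACS.getCalledFunction(); // the direct callee
    (void)Callee;
  } else if (ACS.isCallbackCall()) {
    llvm::Function *Callee = ACS.getCalledFunction(); // the callback callee
    (void)Callee;
  } else {
    // Indirect call: per the documented contract, getCalledFunction()
    // returns nullptr here rather than asserting.
  }
}
```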
The test added in llvm#161067 writes artifacts to the current dir, i.e.
`test.o` / `test.dwo` / `test.dwp`, which might not be writeable. Tests
should use `%t` for the test artifact location, i.e. `%t.o` / `%t.dwo` /
`%t.dwp`. However, since `"test.dwo"` is part of the assembly source file
used as a test input, and that's not something lit will substitute, that
typical approach doesn't work. We can instead ensure the output is in a
good location by running `cd %t` (after setting it up).
LLVM prints switch cases indented by 2 additional spaces, as follows:
```LLVM
  switch i32 %x, label %default [
    i32 0, label %phi
    i32 1, label %phi
  ]
```

Since this only changes the output IR of update_test_checks.py and does
not change the FileCheck pattern logic, there seems to be no need to
update the existing test cases.
This ports the openxla/stablehlo#2682 implementation by @pearu.

Three tests were added to
`Integration/Dialect/Complex/CPU/correctness.mlir`. I also verified
accuracy using XLA's complex_unary_op_test and its MLIR emitters.
1. createHvxPrefixPred was computing an invalid byte count for small
predicate types, leading to a crash during instruction selection.
2. HexagonTargetLowering::SplitHvxMemOp assumed the memory vector type
is always simple. This patch adds a guard to avoid processing non-simple
vector types, which can lead to failure.

Patch By:
Fateme Hosseini

Co-authored-by: pavani karveti <[email protected]>
Co-authored-by: Sergei Larin <[email protected]>
Co-authored-by: Pavani Karveti <[email protected]>
)

Part of llvm#102817.

This patch optimizes `rng::generate_n` for segmented iterators by
forwarding the implementation directly to `std::generate_n`.

- before

```
rng::generate_n(deque<int>)/32          21.7 ns         22.0 ns     32000000
rng::generate_n(deque<int>)/50          30.8 ns         30.7 ns     22400000
rng::generate_n(deque<int>)/1024         492 ns          488 ns      1120000
rng::generate_n(deque<int>)/8192        3938 ns         3924 ns       179200
```

- after

```
rng::generate_n(deque<int>)/32          11.0 ns         11.0 ns     64000000
rng::generate_n(deque<int>)/50          16.2 ns         16.1 ns     40727273
rng::generate_n(deque<int>)/1024         292 ns          286 ns      2240000
rng::generate_n(deque<int>)/8192        2291 ns         2302 ns       298667
```
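
A minimal usage sketch of the optimized path, with `std::deque` as the
segmented container (the generator lambda is illustrative):

```cpp
#include <algorithm>
#include <deque>
#include <random>

int main() {
  std::deque<int> d(1024);
  std::mt19937 gen(42);
  // For segmented iterators this now forwards to std::generate_n:
  std::ranges::generate_n(d.begin(), 1024, [&] { return (int)gen(); });
}
```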
…r switch lowering (llvm#155910)

Currently it is considered suitable to lower a set of switch case
clusters to a bit test when the number of unique destinations
(`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:
`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6)`

However, for some cases on PowerPC, for example when NumDests is 3 and
each destination has exactly 2 comparisons, it is not profitable to lower
the switch to a bit test. This patch adds an option to set the minimum
for the largest number of comparisons per destination required to use a
bit test for switch lowering.
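
An illustrative shape of the borderline case (three destinations, two
comparisons each, six in total, which the old heuristic accepted):

```cpp
int classify(int x) {
  switch (x) {
  case 0: case 9:  return 1; // destination A: 2 comparisons
  case 2: case 11: return 2; // destination B: 2 comparisons
  case 4: case 13: return 3; // destination C: 2 comparisons
  default:         return 0;
  }
}
```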

---------

Co-authored-by: Shimin Cui <[email protected]>
…llvm#165371)

We may need to load ZT0 after the call, so we can't perform a tail call.
This test fails on some arm64 macOS runs currently.

This patch bumps up the number of runs by 10x to hopefully get it
passing consistently.

rdar://162122184
This test is now XPASSing due to a linker update on the platform.

This patch removes the XFAIL from the test.

rdar://163149345
…token() (llvm#156842)

Implement code generation for `__builtin_infer_alloc_token()`. The
`AllocToken` pass is now registered to run unconditionally in the
optimization pipeline.  This ensures that all instances of the
`llvm.alloc.token.id` intrinsic are lowered to constant token IDs,
regardless of whether `-fsanitize=alloc-token` is enabled. This
guarantees that the builtin always resolves to a token value, providing
a consistent and reliable mechanism for compile-time token querying.

This completes `__builtin_infer_alloc_token(<malloc-args>, ...)` to
allow compile-time querying of the token ID, where the builtin arguments
mirror those normally passed to any allocation function. The argument
expressions are unevaluated operands. For type-based token modes, the
same type inference logic is used as for untyped allocation calls.

For example, the ID that is passed (with `-fsanitize=alloc-token`) to:

    some_malloc(sizeof(Type), ...)

is equivalent to the token ID returned by

    __builtin_infer_alloc_token(sizeof(Type), ...)

The builtin provides a mechanism to pass or compare token IDs in code
that needs to be explicitly allocation token-aware (such as inside an
allocator, or through wrapper macros).
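
A hypothetical token-aware wrapper in that spirit (`my_alloc` and
`my_typed_alloc` are invented names; the builtin usage mirrors the
example above):

```cpp
#include <stddef.h>

// Hypothetical allocator that accepts an explicit token:
void *my_alloc(size_t size, unsigned long token);

// Pass the same argument expression to the allocation and to the token
// inference, so the inferred token matches the real allocation site:
#define my_typed_alloc(type) \
  my_alloc(sizeof(type), __builtin_infer_alloc_token(sizeof(type)))

struct Widget { int x; };

void *make_widget(void) {
  return my_typed_alloc(struct Widget); // token inferred from the sizeof
}
```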

A more concrete demonstration of __builtin_infer_alloc_token's use is
enabling type-aware Slab allocations in the Linux kernel:

  https://lore.kernel.org/all/[email protected]/

Notably, any kind of allocation-call rewriting is a poor fit for the
Linux kernel's kmalloc-family functions, which are macros that wrap
(multiple) layers of inline and non-inline wrapper functions. Given the
Linux kernel defines its own allocation APIs, the more explicit builtin
gives the right level of control over where the type inference happens
and the resulting token is passed.
…valuation (llvm#164026)

Enables constexpr evaluation for the following AVX512 Integer Comparison Intrinsics:
```
_mm_cmp_epi8_mask _mm_cmp_epu8_mask
_mm_cmp_epi16_mask _mm_cmp_epu16_mask
_mm_cmp_epi32_mask _mm_cmp_epu32_mask
_mm_cmp_epi64_mask _mm_cmp_epu64_mask

_mm256_cmp_epi8_mask _mm256_cmp_epu8_mask
_mm256_cmp_epi16_mask _mm256_cmp_epu16_mask
_mm256_cmp_epi32_mask _mm256_cmp_epu32_mask
_mm256_cmp_epi64_mask _mm256_cmp_epu64_mask

_mm512_cmp_epi8_mask _mm512_cmp_epu8_mask
_mm512_cmp_epi16_mask _mm512_cmp_epu16_mask
_mm512_cmp_epi32_mask _mm512_cmp_epu32_mask
_mm512_cmp_epi64_mask _mm512_cmp_epu64_mask
```
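
A minimal sketch of what constexpr evaluation enables (assumes a build
with AVX-512 F/VL enabled; the lane values are illustrative):

```cpp
#include <immintrin.h>

// Element-wise signed 64-bit comparison now folds at compile time:
constexpr __m128i A{1, 5};
constexpr __m128i B{2, 4};
constexpr __mmask8 M = _mm_cmp_epi64_mask(A, B, _MM_CMPINT_LT);
static_assert(M == 0b01, "only lane 0 satisfies A < B");
```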
Part 1 of llvm#162054
Upstream support for a try block with only noexcept calls inside, which
doesn't need to be converted to TryCallOp.

Issue llvm#154992
Update `amdgpu.wmma` op definition and implement amdgpu to rocdl
conversion for new variants.
joker-eph and others added 15 commits October 28, 2025 09:53
This is still somewhat of a WIP; we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definitions:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested regions or the parent
`RegionBranchOpInterface` op

Some new methods with reasonable default implementations are added to
help resolve the flow of values across the RegionBranchOpInterface.

It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example, when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires knowing about the hidden loop-iterator block argument and where
the iter_args start. Unfortunately, the API is designed around
forward-tracking of the chain.

Try to reland llvm#161575; I suspect a buildbot incremental build issue.
A reduction (including partial reductions) with a multiply of a constant
value can be bundled by first converting it from `reduce.add(mul(ext,
const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to
extend the constant.

This PR adds such bundling by first truncating the constant to the
source type of the other extend, then extending it to the destination
type of the extend. The first truncate is necessary so that the types of
each extend's operand are then the same, and the call to
canConstantBeExtended proves that the extend following a truncate is
safe to do. The truncate is removed by optimisations.
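
An illustrative C++ source shape that vectorizes to
`reduce.add(mul(ext, const))` (not taken from the patch):

```cpp
#include <cstdint>

int32_t dot_const(const int8_t *a, int n) {
  int32_t sum = 0;
  for (int i = 0; i < n; ++i)
    sum += (int32_t)a[i] * 42; // ext(a[i]) * const; 42 also fits in i8, so
                               // the constant can safely be truncated to the
                               // source type and then re-extended
  return sum;
}
```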

This is a stacked PR, 1a and 1b can be merged in any order:
1a. llvm#147302
1b. llvm#163175
2. -> llvm#162503
…), amt)) -> (load p + amt/8) fold (llvm#165436)

The pointer adjustment no longer guarantees any alignment.

Missed in llvm#165266 and only noticed in some follow-up work.
With Xqcili, `c.li` may be relaxed to `qc.e.li` (this is because
`qc.e.li` is compressed into `c.li`, which needs to be undone).
`qc.e.li` is relaxable, so we need to mark `c.li` as linker relaxable
when it is emitted.

This fixup cannot be emitted as a relocation, but we still mark it as
requiring no R_RISCV_RELAX in case this changes in the future.
Consider OpenMP stylized expression to be a template to be instantiated
with a series of types listed on the containing directive (currently
DECLARE_REDUCTION). Create a series of instantiations in the parser,
allowing OpenMP special variables to be declared separately for each
type.

---------

Co-authored-by: Tom Eccles <[email protected]>
Adds the WaveActiveMin intrinsic from llvm#99169. I think I did all of the
required things on the checklist:
- [x]  Implement `WaveActiveMin` clang builtin,
- [x]  Link `WaveActiveMin` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveMin` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveMin` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveMin.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveMin-errors.hlsl`
- [x] Create the `int_dx_WaveActiveMin` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveMin` to `119` in
`DXIL.td`
- [x] Create the `WaveActiveMin.ll` and `WaveActiveMin_errors.ll` tests
in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveMin` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveMin`
lowering and map it to `int_spv_WaveActiveMin` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveMin.ll`

But as some of the code has changed and was moved around (e.g.
`CGBuiltin.cpp` -> `CGHLSLBuiltins.cpp`), I mostly followed how
`WaveActiveMax()` is implemented.

I have not been able to run the tests myself as I am unsure which
project runs the correct test. Any guidance on how I can test myself
would be helpful.

Also added some tests to the offload-test-suite
llvm/offload-test-suite#478
Need to re-check the instruction with the non-schedulable parent only if
this parent has a phi node user (i.e., it is used only outside the block)
and the user instruction has a unique parent instruction.

Fixes issue reported in llvm@20675ee#commitcomment-168863594
…#165441)

Fix building ClangIR after RegionBranchOpInterface revamp (llvm#165429)
In 9865171, a file named aarch64-mlr-for-calls-only.c was added to
clang/include/clang/Driver. This file contains only llvm-lit directives.
The file has been moved to clang/test/Driver where it ought to reside.
@ronlieb ronlieb merged commit 021b7c9 into amd-staging Oct 28, 2025
15 checks passed
@ronlieb ronlieb deleted the upstream_merge_202510281507 branch October 28, 2025 22:11