diff --git a/clang/docs/OpenMPSupport.rst b/clang/docs/OpenMPSupport.rst
index 61b5babbd18a8..10a8d095fede3 100644
--- a/clang/docs/OpenMPSupport.rst
+++ b/clang/docs/OpenMPSupport.rst
@@ -1,662 +1,662 @@
-.. raw:: html
-
-  <style type="text/css">
-    .none { background-color: #FFCCCC }
-    .part { background-color: #FFFF99 }
-    .good { background-color: #CCFF99 }
-  </style>
-
-.. role:: none
-.. role:: part
-.. role:: good
-
-.. contents::
-   :local:
-
-==============
-OpenMP Support
-==============
-
-Clang fully supports OpenMP 4.5, almost all of 5.0 and most of 5.1/2.
-Clang supports offloading to X86_64, AArch64, PPC64[LE], NVIDIA GPUs (all models) and AMD GPUs (all models).
-
-In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
-Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
-OMPT is also supported for NVIDIA and AMD GPUs.
-
-For the list of supported features from OpenMP 5.0 and 5.1
-see `OpenMP implementation details`_ and `OpenMP 51 implementation details`_.
-
-General improvements
-====================
-- New collapse clause scheme to avoid expensive remainder operations.
-  Compute loop index variables after collapsing a loop nest via the
-  collapse clause by replacing the expensive remainder operation with
-  multiplications and additions.
-
-- When using the collapse clause on a loop nest the default behavior
-  is to automatically extend the representation of the loop counter to
-  64 bits for the cases where the sizes of the collapsed loops are not
-  known at compile time. To prevent this conservative choice and use
-  at most 32 bits, compile your program with the
-  `-fopenmp-optimistic-collapse`.
-
-
-GPU devices support
-===================
-
-Data-sharing modes
-------------------
-
-Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
-modes. The default mode is `Generic`. `Cuda` mode can give an additional
-performance and can be activated using the `-fopenmp-cuda-mode` flag. In
-`Generic` mode all local variables that can be shared in the parallel regions
-are stored in the global memory. In `Cuda` mode local variables are not shared
-between the threads and it is user responsibility to share the required data
-between the threads in the parallel regions. Often, the optimizer is able to
-reduce the cost of `Generic` mode to the level of `Cuda` mode, but the flag,
-as well as other assumption flags, can be used for tuning.
-
-Features not supported or with limited support for Cuda devices
----------------------------------------------------------------
-
-- Cancellation constructs are not supported.
-
-- Doacross loop nest is not supported.
-
-- User-defined reductions are supported only for trivial types.
-
-- Nested parallelism: inner parallel regions are executed sequentially.
-
-- Debug information for OpenMP target regions is supported, but sometimes it may
-  be required to manually specify the address class of the inspected variables.
-  In some cases the local variables are actually allocated in the global memory,
-  but the debug info may be not aware of it.
-
-
-.. _OpenMP implementation details:
-
-OpenMP 5.0 Implementation Details
-=================================
-
-The following table provides a quick overview over various OpenMP 5.0 features
-and their implementation status. Please post on the
-`Discourse forums (Runtimes - OpenMP category)`_ for more
-information or if you want to help with the
-implementation.
-
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-|Category                      | Feature                                                      | Status                   | Reviews                                                               |
-+==============================+==============================================================+==========================+=======================================================================+
-| loop                         | support != in the canonical loop form                        | :good:`done`             | D54441                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | #pragma omp loop (directive)                                 | :part:`partial`          | D145823 (combined forms)                                              |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | #pragma omp loop bind                                        | :part:`worked on`        | D144634 (needs review)                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | collapse imperfectly nested loop                             | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | collapse non-rectangular nested loop                         | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | C++ range-base for loop                                      | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | clause: if for SIMD directives                               | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | inclusive scan (matching C++17 PSTL)                         | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory management            | memory allocators                                            | :good:`done`             | r341687,r357929                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory management            | allocate directive and allocate clause                       | :good:`done`             | r355614,r335952                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| OMPD                         | OMPD interfaces                                              | :good:`done`             | https://reviews.llvm.org/D99914   (Supports only HOST(CPU) and Linux  |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| OMPT                         | OMPT interfaces (callback support)                           | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| thread affinity              | thread affinity                                              | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | taskloop reduction                                           | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | task affinity                                                | :part:`not upstream`     | https://github.com/jklinkenberg/openmp/tree/task-affinity             |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | clause: depend on the taskwait construct                     | :good:`done`             | D113540 (regular codegen only)                                        |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | depend objects and detachable tasks                          | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | mutexinoutset dependence-type for tasks                      | :good:`done`             | D53380,D57576                                                         |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | combined taskloop constructs                                 | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | master taskloop                                              | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | parallel master taskloop                                     | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | master taskloop simd                                         | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | parallel master taskloop simd                                | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| SIMD                         | atomic and simd constructs inside SIMD code                  | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| SIMD                         | SIMD nontemporal                                             | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | infer target functions from initializers                     | :part:`worked on`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | infer target variables from initializers                     | :good:`done`             | D146418                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | OMP_TARGET_OFFLOAD environment variable                      | :good:`done`             | D50522                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | support full 'defaultmap' functionality                      | :good:`done`             | D69204                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | device specific functions                                    | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: device_type                                          | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: extended device                                      | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: uses_allocators clause                               | :good:`done`             | https://github.com/llvm/llvm-project/pull/157025                      |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: in_reduction                                         | :part:`worked on`        | r308768                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | omp_get_device_num()                                         | :good:`done`             | D54342,D128347                                                        |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | structure mapping of references                              | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | nested target declare                                        | :good:`done`             | D51378                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | implicitly map 'this' (this[:1])                             | :good:`done`             | D55982                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | allow access to the reference count (omp_target_is_present)  | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | requires directive                                           | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: unified_shared_memory                                | :good:`done`             | D52625,D52359                                                         |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: unified_address                                      | :part:`partial`          |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: reverse_offload                                      | :part:`partial`          | D52780,D155003                                                        |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: atomic_default_mem_order                             | :good:`done`             | D53513                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: dynamic_allocators                                   | :part:`unclaimed parts`  | D53079                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | user-defined mappers                                         | :good:`done`             | D56326,D58638,D58523,D58074,D60972,D59474                             |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | map array-section with implicit mapper                       | :good:`done`             |  https://github.com/llvm/llvm-project/pull/101101                     |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | mapping lambda expression                                    | :good:`done`             | D51107                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | clause: use_device_addr for target data                      | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | support close modifier on map clause                         | :good:`done`             | D55719,D55892                                                         |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | teams construct on the host device                           | :good:`done`             | r371553                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | support non-contiguous array sections for target update      | :good:`done`             | https://github.com/llvm/llvm-project/pull/144635                      |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | pointer attachment                                           | :part:`being repaired`   | @abhinavgaba (https://github.com/llvm/llvm-project/pull/153683)       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| atomic                       | hints for the atomic construct                               | :good:`done`             | D51233                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| base language                | C11 support                                                  | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| base language                | C++11/14/17 support                                          | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| base language                | lambda support                                               | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | array shaping                                                | :good:`done`             | D74144                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | library shutdown (omp_pause_resource[_all])                  | :good:`done`             | D55078                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | metadirectives                                               | :part:`mostly done`      | D91944, https://github.com/llvm/llvm-project/pull/128640              |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | conditional modifier for lastprivate clause                  | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | iterator and multidependences                                | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | depobj directive and depobj dependency kind                  | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | user-defined function variants                               | :good:`done`.            | D67294, D64095, D71847, D71830, D109635                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | pointer/reference to pointer based array reductions          | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | prevent new type definitions in clauses                      | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory model                 | memory model update (seq_cst, acq_rel, release, acquire,...) | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-
-
-.. _OpenMP 51 implementation details:
-
-OpenMP 5.1 Implementation Details
-=================================
-
-The following table provides a quick overview over various OpenMP 5.1 features
-and their implementation status.
-Please post on the
-`Discourse forums (Runtimes - OpenMP category)`_ for more
-information or if you want to help with the
-implementation.
-
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-|Category                      | Feature                                                      | Status                   | Reviews                                                               |
-+==============================+==============================================================+==========================+=======================================================================+
-| atomic                       | 'compare' clause on atomic construct                         | :good:`done`             | D120290, D120007, D118632, D120200, D116261, D118547, D116637         |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| atomic                       | 'fail' clause on atomic construct                            | :part:`worked on`        | D123235 (in progress)                                                 |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| base language                | C++ attribute specifier syntax                               | :good:`done`             | D105648                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | 'present' map type modifier                                  | :good:`done`             | D83061, D83062, D84422                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | 'present' motion modifier                                    | :good:`done`             | D84711, D84712                                                        |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | 'present' in defaultmap clause                               | :good:`done`             | D92427                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | map clause reordering based on 'present' modifier            | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | device-specific environment variables                        | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | omp_target_is_accessible routine                             | :good:`done`             | https://github.com/llvm/llvm-project/pull/138294                      |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | omp_get_mapped_ptr routine                                   | :good:`done`             | D141545                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | new async target memory copy routines                        | :good:`done`             | D136103                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | thread_limit clause on target construct                      | :part:`partial`          | D141540 (offload), D152054 (host, in progress)                        |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | has_device_addr clause on target construct                   | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | iterators in map clause or motion clauses                    | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | indirect clause on declare target directive                  | :part:`In Progress`      |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | allow virtual functions calls for mapped object on device    | :part:`partial`          |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | interop construct                                            | :part:`partial`          | parsing/sema done: D98558, D98834, D98815                             |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| device                       | assorted routines for querying interoperable properties      | :part:`partial`          | D106674                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | Loop tiling transformation                                   | :good:`done`             | D76342                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | Loop unrolling transformation                                | :good:`done`             | D99459                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| loop                         | 'reproducible'/'unconstrained' modifiers in 'order' clause   | :part:`partial`          | D127855                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory management            | alignment for allocate directive and clause                  | :good:`done`             | D115683                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory management            | 'allocator' modifier for allocate clause                     | :good:`done`             | https://github.com/llvm/llvm-project/pull/114883                      |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory management            | 'align' modifier for allocate clause                         | :good:`done`             | https://github.com/llvm/llvm-project/pull/121814                      |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory management            | new memory management routines                               | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory management            | changes to omp_alloctrait_key enum                           | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| memory model                 | seq_cst clause on flush construct                            | :good:`done`             | https://github.com/llvm/llvm-project/pull/114072                      |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | 'omp_all_memory' keyword and use in 'depend' clause          | :good:`done`             | D125828, D126321                                                      |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | error directive                                              | :good:`done`             | D139166                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | scope construct                                              | :good:`done`             | D157933, https://github.com/llvm/llvm-project/pull/109197             |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | routines for controlling and querying team regions           | :part:`partial`          | D95003 (libomp only)                                                  |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | changes to ompt_scope_endpoint_t enum                        | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | omp_display_env routine                                      | :good:`done`             | D74956                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | extended OMP_PLACES syntax                                   | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars            | :good:`done`             | D138769                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | 'target_device' selector in context specifier                | :none:`worked on`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | begin/end declare variant                                    | :good:`done`             | D71179                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | dispatch construct and function variant argument adjustment  | :part:`worked on`        | D99537, D99679                                                        |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | assumes directives                                           | :part:`worked on`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | assume directive                                             | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | nothing directive                                            | :good:`done`             | D123286                                                               |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | masked construct and related combined constructs             | :good:`done`             | D99995, D100514, PR-121741(parallel_masked_taskloop)                  |
-|                              |                                                              |                          | PR-121746(parallel_masked_task_loop_simd),PR-121914(masked_taskloop)  |
-|                              |                                                              |                          | PR-121916(masked_taskloop_simd)                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| misc                         | default(firstprivate) & default(private)                     | :good:`done`             | D75591 (firstprivate), D125912 (private)                              |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| other                        | deprecating master construct                                 | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| OMPT                         | new barrier types added to ompt_sync_region_t enum           | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| OMPT                         | async data transfers added to ompt_target_data_op_t enum     | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| OMPT                         | new barrier state values added to ompt_state_t enum          | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| OMPT                         | new 'emi' callbacks for external monitoring interfaces       | :good:`done`             |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| OMPT                         | device tracing interface                                     | :none:`in progress`      | jplehr                                                                |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | 'strict' modifier for taskloop construct                     | :none:`unclaimed`        |                                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | inoutset in depend clause                                    | :good:`done`             | D97085, D118383                                                       |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-| task                         | nowait clause on taskwait                                    | :part:`partial`          | parsing/sema done: D131830, D141531                                   |
-+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
-
-
-.. _OpenMP 5.2 implementation details:
-
-OpenMP 5.2 Implementation Details
-=================================
-
-The following table provides a quick overview of various OpenMP 5.2 features
-and their implementation status. Please post on the
-`Discourse forums (Runtimes - OpenMP category)`_ for more
-information or if you want to help with the
-implementation.
-
-
-
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-|Feature                                                      | C/C++ Status              |  Fortran Status           | Reviews                                                                  |
-+=============================================================+===========================+===========================+==========================================================================+
-| omp_in_explicit_task()                                      | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| semantics of explicit_task_var and implicit_task_var        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| ompx sentinel for C/C++ directive extensions                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| ompx prefix for clause extensions                           | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| if clause on teams construct                                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| step modifier added                                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| declare mapper: Add iterator modifier on map clause         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| declare mapper: Add iterator modifier on map clause         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| memspace and traits modifiers to uses allocator         i   | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Add otherwise clause to metadirectives                      | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| doacross clause with support for omp_cur_iteration          | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| position of interop_type in init clause on iterop           | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| implicit map type for target enter/exit data                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| work OMPT type for work-sharing loop constructs             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| allocate and firstprivate on scope directive                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Change loop consistency for order clause                    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Add memspace and traits modifiers to uses_allocators        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Keep original base pointer on map w/o matched candidate     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Pure procedure support for certain directives               | :none:`N/A`               | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| ALLOCATE statement support for allocators                   | :none:`N/A`               | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| dispatch construct extension to support end directive       | :none:`N/A`               | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-
-
-
-.. _OpenMP 5.2 Deprecations:
-
-OpenMP 5.2 Deprecations
-=======================
-
-
-
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-|                                                             | C/C++ Status              |  Fortran Status           | Reviews                                                                  |
-+=============================================================+===========================+===========================+==========================================================================+
-| Linear clause syntax                                        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| The minus operator                                          | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Map clause modifiers without commas                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| The use of allocate directives with ALLOCATE statement      | :good:`N/A`               | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| uses_allocators list syntax                                 | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| The default clause on metadirectives                        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| The delimited form of the declare target directive          | :none:`unclaimed`         | :good:`N/A`               |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| The use of the to clause on the declare target directive    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| The syntax of the destroy clause on the depobj construct    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| keyword source and sink as task-dependence modifiers        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| interop types in any position on init clause of interop     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| ompd prefix usage for some ICVs                             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-
-.. _OpenMP 6.0 implementation details:
-
-OpenMP 6.0 Implementation Details
-=================================
-
-The following table provides a quick overview of various OpenMP 6.0 features
-and their implementation status. Please post on the
-`Discourse forums (Runtimes - OpenMP category)`_ for more
-information or if you want to help with the
-implementation.
-
-
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-|Feature                                                      | C/C++ Status              |  Fortran Status           | Reviews                                                                  |
-+=============================================================+===========================+===========================+==========================================================================+
-| free-agent threads                                          | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| threadset clause                                            | :part:`in progress`       | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Recording of task graphs                                    | :part:`in progress`       | :part:`in progress`       | clang: jtb20, flang: kparzysz                                            |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Parallel inductions                                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| init_complete for scan directive                            | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop interchange transformation                             | :good:`done`              | :none:`unclaimed`         | Clang (interchange): https://github.com/llvm/llvm-project/pull/93022     |
-|                                                             |                           |                           | Clang (permutation): https://github.com/llvm/llvm-project/pull/92030     |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop reverse transformation                                 | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/92916                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop stripe transformation                                  | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/119891                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop fusion transformation                                  | :part:`in progress`       | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/139293                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop index set splitting transformation                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop transformation apply clause                            | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop fuse transformation                                    | :good:`done`              | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| workdistribute construct                                    |                           | :none:`in progress`       | @skc7, @mjklemm                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| task_iteration                                              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| memscope clause for atomic and flush                        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| transparent clause (hull tasks)                             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| rule-based compound directives                              | :part:`In Progress`       | :part:`In Progress`       | kparzysz                                                                 |
-|                                                             |                           |                           | Testing for Fortran missing                                              |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| C23, C++23                                                  | :none:`unclaimed`         |                           |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Fortran 2023                                                |                           | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| decl attribute for declarative directives                   | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| C attribute syntax                                          | :none:`unclaimed`         |                           |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| pure directives in DO CONCURRENT                            |                           | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Optional argument for all clauses                           | :none:`partial`           | :none:`In Progress`       | Parse/Sema (nowait): https://github.com/llvm/llvm-project/pull/159628    |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Function references for locator list items                  | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| All clauses accept directive name modifier                  | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Extensions to depobj construct                              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Extensions to atomic construct                              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Private reductions                                          | :good:`mostly`            | :none:`unclaimed`         | Parse/Sema:https://github.com/llvm/llvm-project/pull/129938              |
-|                                                             |                           |                           | Codegen: https://github.com/llvm/llvm-project/pull/134709                |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Self maps                                                   | :part:`partial`           | :none:`unclaimed`         | parsing/sema done: https://github.com/llvm/llvm-project/pull/129888      |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Release map type for declare mapper                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Extensions to interop construct                             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| no_openmp_constructs                                        | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125933                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| safe_sync and progress with identifier and API              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| OpenMP directives in concurrent loop regions                | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125621                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| atomics constructs on concurrent loop regions               | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125621                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Loop construct with DO CONCURRENT                           |                           | :part:`In Progress`       |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| device_type clause for target construct                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| nowait for ancestor target directives                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| New API for devices' num_teams/thread_limit                 | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Host and device environment variables                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| num_threads ICV and clause accepts list                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Numeric names for environment variables                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Increment between places for OMP_PLACES                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| OMP_AVAILABLE_DEVICES envirable                             | :none:`unclaimed`         | :none:`unclaimed`         | (should wait for "Traits for default device envirable" being done)       |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Traits for default device envirable                         | :part:`in progress`       | :none:`unclaimed`         | ro-i                                                                     |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Optionally omit array length expression                     | :good:`done`              | :none:`unclaimed`         | (Parse) https://github.com/llvm/llvm-project/pull/148048,                |
-|                                                             |                           |                           | (Sema) https://github.com/llvm/llvm-project/pull/152786                  |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Canonical loop sequences                                    | :part:`in progress`       | :part:`in progress`       | Clang: https://github.com/llvm/llvm-project/pull/139293                  |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Clarifications to Fortran map semantics                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| default clause at target construct                          | :part:`In Progress`       | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| ref count update use_device_{ptr, addr}                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Clarifications to implicit reductions                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| ref modifier for map clauses                                | :part:`In Progress`       | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| map-type modifiers in arbitrary position                    | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/90499                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Lift nesting restriction on concurrent loop                 | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125621                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| priority clause for target constructs                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| changes to target_data construct                            | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Non-const do_not_sync for nowait/nogroup                    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| need_device_addr modifier for adjust_args clause            | :part:`partial`           | :none:`unclaimed`         | Parsing/Sema: https://github.com/llvm/llvm-project/pull/143442           |
-|                                                             |                           |                           |               https://github.com/llvm/llvm-project/pull/149586           |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Prescriptive num_threads                                    | :good:`done`              | :none:`unclaimed`         |  https://github.com/llvm/llvm-project/pull/160659                        |
-|                                                             |                           |                           |  https://github.com/llvm/llvm-project/pull/146403                        |
-|                                                             |                           |                           |  https://github.com/llvm/llvm-project/pull/146404                        |
-|                                                             |                           |                           |  https://github.com/llvm/llvm-project/pull/146405                        |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Message and severity clauses                                | :good:`done`              | :none:`unclaimed`         |  https://github.com/llvm/llvm-project/pull/146093                        |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Local clause on declare target                              | :part:`In Progress`       | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| groupprivate directive                                      | :part:`In Progress`       | :part:`partial`           | Flang: kparzysz, mjklemm                                                 |
-|                                                             |                           |                           |                                                                          |
-|                                                             |                           |                           | Flang parser: https://github.com/llvm/llvm-project/pull/153807           |
-|                                                             |                           |                           | Flang sema: https://github.com/llvm/llvm-project/pull/154779             |
-|                                                             |                           |                           | Clang parse/sema: https://github.com/llvm/llvm-project/pull/158134       |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| variable-category on default clause                         | :good:`done`              | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| Changes to omp_target_is_accessible                         | :part:`In Progress`       | :part:`In Progress`       |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| defaultmap implicit-behavior 'storage'                      | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/158336                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| defaultmap implicit-behavior 'private'                      | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/158712                         |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-
-.. _OpenMP 6.1 implementation details:
-
-OpenMP 6.1 Implementation Details (Experimental)
-================================================
-
-The following table provides a quick overview over various OpenMP 6.1 features
-and their implementation status. Since OpenMP 6.1 has not yet been released, the
-following features are experimental and are subject to change at any time.
-Please post on the `Discourse forums (Runtimes - OpenMP category)`_ for more
-information or if you want to help with the
-implementation.
-
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-|Feature                                                      | C/C++ Status              | Fortran Status            | Reviews                                                                  |
-+=============================================================+===========================+===========================+==========================================================================+
-| dyn_groupprivate clause                                     | :part:`In Progress`       | :part:`In Progress`       | C/C++: kevinsala (https://github.com/llvm/llvm-project/pull/152651       |
-|                                                             |                           |                           | https://github.com/llvm/llvm-project/pull/152830                         |
-|                                                             |                           |                           | https://github.com/llvm/llvm-project/pull/152831)                        |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop flatten transformation                                 | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| loop grid/tile modifiers for sizes clause                   | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-| attach map-type modifier                                    | :part:`In Progress`       | :none:`unclaimed`         | C/C++: @abhinavgaba;                                                     |
-|                                                             |                           |                           | RT: @abhinavgaba (https://github.com/llvm/llvm-project/pull/149036,      |
-|                                                             |                           |                           | https://github.com/llvm/llvm-project/pull/158370)                        |
-+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
-
-
-OpenMP Extensions
-=================
-
-The following table provides a quick overview over various OpenMP
-extensions and their implementation status.  These extensions are not
-currently defined by any standard, so links to associated LLVM
-documentation are provided.  As these extensions mature, they will be
-considered for standardization. Please post on the
-`Discourse forums (Runtimes - OpenMP category)`_ to provide feedback.
-
-+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
-|Category                      | Feature                                                                           | Status                   | Reviews                                                |
-+==============================+===================================================================================+==========================+========================================================+
-| atomic extension             | `'atomic' strictly nested within 'teams'                                          | :good:`prototyped`       | D126323                                                |
-|                              | <https://openmp.llvm.org/docs/openacc/OpenMPExtensions.html#atomicWithinTeams>`_  |                          |                                                        |
-+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
-| device extension             | `'ompx_hold' map type modifier                                                    | :good:`prototyped`       | D106509, D106510                                       |
-|                              | <https://openmp.llvm.org/docs/openacc/OpenMPExtensions.html#ompx-hold>`_          |                          |                                                        |
-+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
-| device extension             | `'ompx_bare' clause on 'target teams' construct                                   | :good:`prototyped`       | #66844, #70612                                         |
-|                              | <https://www.osti.gov/servlets/purl/2205717>`_                                    |                          |                                                        |
-+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
-| device extension             | Multi-dim 'num_teams' and 'thread_limit' clause on 'target teams ompx_bare'       | :good:`partial`          | #99732, #101407, #102715                               |
-|                              | construct                                                                         |                          |                                                        |
-+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
-
-.. _Discourse forums (Runtimes - OpenMP category): https://discourse.llvm.org/c/runtimes/openmp/35
+.. raw:: html
+
+  <style type="text/css">
+    .none { background-color: #FFCCCC }
+    .part { background-color: #FFFF99 }
+    .good { background-color: #CCFF99 }
+  </style>
+
+.. role:: none
+.. role:: part
+.. role:: good
+
+.. contents::
+   :local:
+
+==============
+OpenMP Support
+==============
+
+Clang fully supports OpenMP 4.5, almost all of 5.0 and most of 5.1/2.
+Clang supports offloading to X86_64, AArch64, PPC64[LE], NVIDIA GPUs (all models) and AMD GPUs (all models).
+
+In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
+Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
+OMPT is also supported for NVIDIA and AMD GPUs.
+
+For the list of supported features from OpenMP 5.0 and 5.1
+see `OpenMP implementation details`_ and `OpenMP 51 implementation details`_.
+
+General improvements
+====================
+- New collapse clause scheme to avoid expensive remainder operations.
+  Compute loop index variables after collapsing a loop nest via the
+  collapse clause by replacing the expensive remainder operation with
+  multiplications and additions.
+
+- When using the collapse clause on a loop nest the default behavior
+  is to automatically extend the representation of the loop counter to
+  64 bits for the cases where the sizes of the collapsed loops are not
+  known at compile time. To prevent this conservative choice and use
+  at most 32 bits, compile your program with the
+  `-fopenmp-optimistic-collapse`.
+
+
+GPU devices support
+===================
+
+Data-sharing modes
+------------------
+
+Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
+modes. The default mode is `Generic`. `Cuda` mode can give an additional
+performance and can be activated using the `-fopenmp-cuda-mode` flag. In
+`Generic` mode all local variables that can be shared in the parallel regions
+are stored in the global memory. In `Cuda` mode local variables are not shared
+between the threads and it is user responsibility to share the required data
+between the threads in the parallel regions. Often, the optimizer is able to
+reduce the cost of `Generic` mode to the level of `Cuda` mode, but the flag,
+as well as other assumption flags, can be used for tuning.
+
+Features not supported or with limited support for Cuda devices
+---------------------------------------------------------------
+
+- Cancellation constructs are not supported.
+
+- Doacross loop nest is not supported.
+
+- User-defined reductions are supported only for trivial types.
+
+- Nested parallelism: inner parallel regions are executed sequentially.
+
+- Debug information for OpenMP target regions is supported, but sometimes it may
+  be required to manually specify the address class of the inspected variables.
+  In some cases the local variables are actually allocated in the global memory,
+  but the debug info may be not aware of it.
+
+
+.. _OpenMP implementation details:
+
+OpenMP 5.0 Implementation Details
+=================================
+
+The following table provides a quick overview over various OpenMP 5.0 features
+and their implementation status. Please post on the
+`Discourse forums (Runtimes - OpenMP category)`_ for more
+information or if you want to help with the
+implementation.
+
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+|Category                      | Feature                                                      | Status                   | Reviews                                                               |
++==============================+==============================================================+==========================+=======================================================================+
+| loop                         | support != in the canonical loop form                        | :good:`done`             | D54441                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | #pragma omp loop (directive)                                 | :part:`partial`          | D145823 (combined forms)                                              |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | #pragma omp loop bind                                        | :part:`worked on`        | D144634 (needs review)                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | collapse imperfectly nested loop                             | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | collapse non-rectangular nested loop                         | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | C++ range-base for loop                                      | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | clause: if for SIMD directives                               | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | inclusive scan (matching C++17 PSTL)                         | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory management            | memory allocators                                            | :good:`done`             | r341687,r357929                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory management            | allocate directive and allocate clause                       | :good:`done`             | r355614,r335952                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| OMPD                         | OMPD interfaces                                              | :good:`done`             | https://reviews.llvm.org/D99914   (Supports only HOST(CPU) and Linux  |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| OMPT                         | OMPT interfaces (callback support)                           | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| thread affinity              | thread affinity                                              | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | taskloop reduction                                           | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | task affinity                                                | :part:`not upstream`     | https://github.com/jklinkenberg/openmp/tree/task-affinity             |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | clause: depend on the taskwait construct                     | :good:`done`             | D113540 (regular codegen only)                                        |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | depend objects and detachable tasks                          | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | mutexinoutset dependence-type for tasks                      | :good:`done`             | D53380,D57576                                                         |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | combined taskloop constructs                                 | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | master taskloop                                              | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | parallel master taskloop                                     | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | master taskloop simd                                         | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | parallel master taskloop simd                                | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| SIMD                         | atomic and simd constructs inside SIMD code                  | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| SIMD                         | SIMD nontemporal                                             | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | infer target functions from initializers                     | :part:`worked on`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | infer target variables from initializers                     | :good:`done`             | D146418                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | OMP_TARGET_OFFLOAD environment variable                      | :good:`done`             | D50522                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | support full 'defaultmap' functionality                      | :good:`done`             | D69204                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | device specific functions                                    | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: device_type                                          | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: extended device                                      | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: uses_allocators clause                               | :good:`done`             | https://github.com/llvm/llvm-project/pull/157025                      |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: in_reduction                                         | :part:`worked on`        | r308768                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | omp_get_device_num()                                         | :good:`done`             | D54342,D128347                                                        |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | structure mapping of references                              | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | nested target declare                                        | :good:`done`             | D51378                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | implicitly map 'this' (this[:1])                             | :good:`done`             | D55982                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | allow access to the reference count (omp_target_is_present)  | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | requires directive                                           | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: unified_shared_memory                                | :good:`done`             | D52625,D52359                                                         |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: unified_address                                      | :part:`partial`          |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: reverse_offload                                      | :part:`partial`          | D52780,D155003                                                        |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: atomic_default_mem_order                             | :good:`done`             | D53513                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: dynamic_allocators                                   | :part:`unclaimed parts`  | D53079                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | user-defined mappers                                         | :good:`done`             | D56326,D58638,D58523,D58074,D60972,D59474                             |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | map array-section with implicit mapper                       | :good:`done`             |  https://github.com/llvm/llvm-project/pull/101101                     |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | mapping lambda expression                                    | :good:`done`             | D51107                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | clause: use_device_addr for target data                      | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | support close modifier on map clause                         | :good:`done`             | D55719,D55892                                                         |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | teams construct on the host device                           | :good:`done`             | r371553                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | support non-contiguous array sections for target update      | :good:`done`             | https://github.com/llvm/llvm-project/pull/144635                      |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | pointer attachment                                           | :part:`being repaired`   | @abhinavgaba (https://github.com/llvm/llvm-project/pull/153683)       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| atomic                       | hints for the atomic construct                               | :good:`done`             | D51233                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| base language                | C11 support                                                  | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| base language                | C++11/14/17 support                                          | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| base language                | lambda support                                               | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | array shaping                                                | :good:`done`             | D74144                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | library shutdown (omp_pause_resource[_all])                  | :good:`done`             | D55078                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | metadirectives                                               | :part:`mostly done`      | D91944, https://github.com/llvm/llvm-project/pull/128640              |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | conditional modifier for lastprivate clause                  | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | iterator and multidependences                                | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | depobj directive and depobj dependency kind                  | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | user-defined function variants                               | :good:`done`.            | D67294, D64095, D71847, D71830, D109635                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | pointer/reference to pointer based array reductions          | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | prevent new type definitions in clauses                      | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory model                 | memory model update (seq_cst, acq_rel, release, acquire,...) | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+
+
+.. _OpenMP 51 implementation details:
+
+OpenMP 5.1 Implementation Details
+=================================
+
+The following table provides a quick overview over various OpenMP 5.1 features
+and their implementation status.
+Please post on the
+`Discourse forums (Runtimes - OpenMP category)`_ for more
+information or if you want to help with the
+implementation.
+
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+|Category                      | Feature                                                      | Status                   | Reviews                                                               |
++==============================+==============================================================+==========================+=======================================================================+
+| atomic                       | 'compare' clause on atomic construct                         | :good:`done`             | D120290, D120007, D118632, D120200, D116261, D118547, D116637         |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| atomic                       | 'fail' clause on atomic construct                            | :part:`worked on`        | D123235 (in progress)                                                 |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| base language                | C++ attribute specifier syntax                               | :good:`done`             | D105648                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | 'present' map type modifier                                  | :good:`done`             | D83061, D83062, D84422                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | 'present' motion modifier                                    | :good:`done`             | D84711, D84712                                                        |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | 'present' in defaultmap clause                               | :good:`done`             | D92427                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | map clause reordering based on 'present' modifier            | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | device-specific environment variables                        | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | omp_target_is_accessible routine                             | :good:`done`             | https://github.com/llvm/llvm-project/pull/138294                      |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | omp_get_mapped_ptr routine                                   | :good:`done`             | D141545                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | new async target memory copy routines                        | :good:`done`             | D136103                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | thread_limit clause on target construct                      | :part:`partial`          | D141540 (offload), D152054 (host, in progress)                        |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | has_device_addr clause on target construct                   | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | iterators in map clause or motion clauses                    | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | indirect clause on declare target directive                  | :part:`In Progress`      |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | allow virtual functions calls for mapped object on device    | :part:`partial`          |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | interop construct                                            | :part:`partial`          | parsing/sema done: D98558, D98834, D98815                             |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| device                       | assorted routines for querying interoperable properties      | :part:`partial`          | D106674                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | Loop tiling transformation                                   | :good:`done`             | D76342                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | Loop unrolling transformation                                | :good:`done`             | D99459                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| loop                         | 'reproducible'/'unconstrained' modifiers in 'order' clause   | :part:`partial`          | D127855                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory management            | alignment for allocate directive and clause                  | :good:`done`             | D115683                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory management            | 'allocator' modifier for allocate clause                     | :good:`done`             | https://github.com/llvm/llvm-project/pull/114883                      |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory management            | 'align' modifier for allocate clause                         | :good:`done`             | https://github.com/llvm/llvm-project/pull/121814                      |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory management            | new memory management routines                               | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory management            | changes to omp_alloctrait_key enum                           | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| memory model                 | seq_cst clause on flush construct                            | :good:`done`             | https://github.com/llvm/llvm-project/pull/114072                      |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | 'omp_all_memory' keyword and use in 'depend' clause          | :good:`done`             | D125828, D126321                                                      |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | error directive                                              | :good:`done`             | D139166                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | scope construct                                              | :good:`done`             | D157933, https://github.com/llvm/llvm-project/pull/109197             |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | routines for controlling and querying team regions           | :part:`partial`          | D95003 (libomp only)                                                  |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | changes to ompt_scope_endpoint_t enum                        | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | omp_display_env routine                                      | :good:`done`             | D74956                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | extended OMP_PLACES syntax                                   | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars            | :good:`done`             | D138769                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | 'target_device' selector in context specifier                | :none:`worked on`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | begin/end declare variant                                    | :good:`done`             | D71179                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | dispatch construct and function variant argument adjustment  | :part:`worked on`        | D99537, D99679                                                        |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | assumes directives                                           | :part:`worked on`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | assume directive                                             | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | nothing directive                                            | :good:`done`             | D123286                                                               |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | masked construct and related combined constructs             | :good:`done`             | D99995, D100514, PR-121741(parallel_masked_taskloop)                  |
+|                              |                                                              |                          | PR-121746(parallel_masked_task_loop_simd),PR-121914(masked_taskloop)  |
+|                              |                                                              |                          | PR-121916(masked_taskloop_simd)                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| misc                         | default(firstprivate) & default(private)                     | :good:`done`             | D75591 (firstprivate), D125912 (private)                              |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| other                        | deprecating master construct                                 | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| OMPT                         | new barrier types added to ompt_sync_region_t enum           | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| OMPT                         | async data transfers added to ompt_target_data_op_t enum     | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| OMPT                         | new barrier state values added to ompt_state_t enum          | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| OMPT                         | new 'emi' callbacks for external monitoring interfaces       | :good:`done`             |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| OMPT                         | device tracing interface                                     | :none:`in progress`      | jplehr                                                                |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | 'strict' modifier for taskloop construct                     | :none:`unclaimed`        |                                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | inoutset in depend clause                                    | :good:`done`             | D97085, D118383                                                       |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+| task                         | nowait clause on taskwait                                    | :part:`partial`          | parsing/sema done: D131830, D141531                                   |
++------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
+
+
+.. _OpenMP 5.2 implementation details:
+
+OpenMP 5.2 Implementation Details
+=================================
+
+The following table provides a quick overview of various OpenMP 5.2 features
+and their implementation status. Please post on the
+`Discourse forums (Runtimes - OpenMP category)`_ for more
+information or if you want to help with the
+implementation.
+
+
+
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+|Feature                                                      | C/C++ Status              |  Fortran Status           | Reviews                                                                  |
++=============================================================+===========================+===========================+==========================================================================+
+| omp_in_explicit_task()                                      | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| semantics of explicit_task_var and implicit_task_var        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| ompx sentinel for C/C++ directive extensions                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| ompx prefix for clause extensions                           | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| if clause on teams construct                                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| step modifier added                                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| declare mapper: Add iterator modifier on map clause         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| declare mapper: Add iterator modifier on map clause         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| memspace and traits modifiers to uses allocator         i   | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Add otherwise clause to metadirectives                      | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| doacross clause with support for omp_cur_iteration          | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| position of interop_type in init clause on iterop           | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| implicit map type for target enter/exit data                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| work OMPT type for work-sharing loop constructs             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| allocate and firstprivate on scope directive                | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Change loop consistency for order clause                    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Add memspace and traits modifiers to uses_allocators        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Keep original base pointer on map w/o matched candidate     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Pure procedure support for certain directives               | :none:`N/A`               | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| ALLOCATE statement support for allocators                   | :none:`N/A`               | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| dispatch construct extension to support end directive       | :none:`N/A`               | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+
+
+
+.. _OpenMP 5.2 Deprecations:
+
+OpenMP 5.2 Deprecations
+=======================
+
+
+
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+|                                                             | C/C++ Status              |  Fortran Status           | Reviews                                                                  |
++=============================================================+===========================+===========================+==========================================================================+
+| Linear clause syntax                                        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| The minus operator                                          | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Map clause modifiers without commas                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| The use of allocate directives with ALLOCATE statement      | :good:`N/A`               | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| uses_allocators list syntax                                 | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| The default clause on metadirectives                        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| The delimited form of the declare target directive          | :none:`unclaimed`         | :good:`N/A`               |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| The use of the to clause on the declare target directive    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| The syntax of the destroy clause on the depobj construct    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| keyword source and sink as task-dependence modifiers        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| interop types in any position on init clause of interop     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| ompd prefix usage for some ICVs                             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+
+.. _OpenMP 6.0 implementation details:
+
+OpenMP 6.0 Implementation Details
+=================================
+
+The following table provides a quick overview of various OpenMP 6.0 features
+and their implementation status. Please post on the
+`Discourse forums (Runtimes - OpenMP category)`_ for more
+information or if you want to help with the
+implementation.
+
+
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+|Feature                                                      | C/C++ Status              |  Fortran Status           | Reviews                                                                  |
++=============================================================+===========================+===========================+==========================================================================+
+| free-agent threads                                          | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| threadset clause                                            | :part:`partial`           | :none:`unclaimed`         | Parse/Sema/Codegen : https://github.com/llvm/llvm-project/pull/13580     |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Recording of task graphs                                    | :part:`in progress`       | :part:`in progress`       | clang: jtb20, flang: kparzysz                                            |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Parallel inductions                                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| init_complete for scan directive                            | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop interchange transformation                             | :good:`done`              | :none:`unclaimed`         | Clang (interchange): https://github.com/llvm/llvm-project/pull/93022     |
+|                                                             |                           |                           | Clang (permutation): https://github.com/llvm/llvm-project/pull/92030     |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop reverse transformation                                 | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/92916                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop stripe transformation                                  | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/119891                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop fusion transformation                                  | :part:`in progress`       | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/139293                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop index set splitting transformation                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop transformation apply clause                            | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop fuse transformation                                    | :good:`done`              | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| workdistribute construct                                    |                           | :none:`in progress`       | @skc7, @mjklemm                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| task_iteration                                              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| memscope clause for atomic and flush                        | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| transparent clause (hull tasks)                             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| rule-based compound directives                              | :part:`In Progress`       | :part:`In Progress`       | kparzysz                                                                 |
+|                                                             |                           |                           | Testing for Fortran missing                                              |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| C23, C++23                                                  | :none:`unclaimed`         |                           |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Fortran 2023                                                |                           | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| decl attribute for declarative directives                   | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| C attribute syntax                                          | :none:`unclaimed`         |                           |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| pure directives in DO CONCURRENT                            |                           | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Optional argument for all clauses                           | :none:`partial`           | :none:`In Progress`       | Parse/Sema (nowait): https://github.com/llvm/llvm-project/pull/159628    |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Function references for locator list items                  | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| All clauses accept directive name modifier                  | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Extensions to depobj construct                              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Extensions to atomic construct                              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Private reductions                                          | :good:`mostly`            | :none:`unclaimed`         | Parse/Sema:https://github.com/llvm/llvm-project/pull/129938              |
+|                                                             |                           |                           | Codegen: https://github.com/llvm/llvm-project/pull/134709                |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Self maps                                                   | :part:`partial`           | :none:`unclaimed`         | parsing/sema done: https://github.com/llvm/llvm-project/pull/129888      |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Release map type for declare mapper                         | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Extensions to interop construct                             | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| no_openmp_constructs                                        | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125933                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| safe_sync and progress with identifier and API              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| OpenMP directives in concurrent loop regions                | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125621                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| atomics constructs on concurrent loop regions               | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125621                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Loop construct with DO CONCURRENT                           |                           | :part:`In Progress`       |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| device_type clause for target construct                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| nowait for ancestor target directives                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| New API for devices' num_teams/thread_limit                 | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Host and device environment variables                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| num_threads ICV and clause accepts list                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Numeric names for environment variables                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Increment between places for OMP_PLACES                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| OMP_AVAILABLE_DEVICES envirable                             | :none:`unclaimed`         | :none:`unclaimed`         | (should wait for "Traits for default device envirable" being done)       |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Traits for default device envirable                         | :part:`in progress`       | :none:`unclaimed`         | ro-i                                                                     |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Optionally omit array length expression                     | :good:`done`              | :none:`unclaimed`         | (Parse) https://github.com/llvm/llvm-project/pull/148048,                |
+|                                                             |                           |                           | (Sema) https://github.com/llvm/llvm-project/pull/152786                  |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Canonical loop sequences                                    | :part:`in progress`       | :part:`in progress`       | Clang: https://github.com/llvm/llvm-project/pull/139293                  |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Clarifications to Fortran map semantics                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| default clause at target construct                          | :part:`In Progress`       | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| ref count update use_device_{ptr, addr}                     | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Clarifications to implicit reductions                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| ref modifier for map clauses                                | :part:`In Progress`       | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| map-type modifiers in arbitrary position                    | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/90499                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Lift nesting restriction on concurrent loop                 | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/125621                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| priority clause for target constructs                       | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| changes to target_data construct                            | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Non-const do_not_sync for nowait/nogroup                    | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| need_device_addr modifier for adjust_args clause            | :part:`partial`           | :none:`unclaimed`         | Parsing/Sema: https://github.com/llvm/llvm-project/pull/143442           |
+|                                                             |                           |                           |               https://github.com/llvm/llvm-project/pull/149586           |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Prescriptive num_threads                                    | :good:`done`              | :none:`unclaimed`         |  https://github.com/llvm/llvm-project/pull/160659                        |
+|                                                             |                           |                           |  https://github.com/llvm/llvm-project/pull/146403                        |
+|                                                             |                           |                           |  https://github.com/llvm/llvm-project/pull/146404                        |
+|                                                             |                           |                           |  https://github.com/llvm/llvm-project/pull/146405                        |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Message and severity clauses                                | :good:`done`              | :none:`unclaimed`         |  https://github.com/llvm/llvm-project/pull/146093                        |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Local clause on declare target                              | :part:`In Progress`       | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| groupprivate directive                                      | :part:`In Progress`       | :part:`partial`           | Flang: kparzysz, mjklemm                                                 |
+|                                                             |                           |                           |                                                                          |
+|                                                             |                           |                           | Flang parser: https://github.com/llvm/llvm-project/pull/153807           |
+|                                                             |                           |                           | Flang sema: https://github.com/llvm/llvm-project/pull/154779             |
+|                                                             |                           |                           | Clang parse/sema: https://github.com/llvm/llvm-project/pull/158134       |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| variable-category on default clause                         | :good:`done`              | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| Changes to omp_target_is_accessible                         | :part:`In Progress`       | :part:`In Progress`       |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| defaultmap implicit-behavior 'storage'                      | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/158336                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| defaultmap implicit-behavior 'private'                      | :good:`done`              | :none:`unclaimed`         | https://github.com/llvm/llvm-project/pull/158712                         |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+
+.. _OpenMP 6.1 implementation details:
+
+OpenMP 6.1 Implementation Details (Experimental)
+================================================
+
+The following table provides a quick overview over various OpenMP 6.1 features
+and their implementation status. Since OpenMP 6.1 has not yet been released, the
+following features are experimental and are subject to change at any time.
+Please post on the `Discourse forums (Runtimes - OpenMP category)`_ for more
+information or if you want to help with the
+implementation.
+
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+|Feature                                                      | C/C++ Status              | Fortran Status            | Reviews                                                                  |
++=============================================================+===========================+===========================+==========================================================================+
+| dyn_groupprivate clause                                     | :part:`In Progress`       | :part:`In Progress`       | C/C++: kevinsala (https://github.com/llvm/llvm-project/pull/152651       |
+|                                                             |                           |                           | https://github.com/llvm/llvm-project/pull/152830                         |
+|                                                             |                           |                           | https://github.com/llvm/llvm-project/pull/152831)                        |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop flatten transformation                                 | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop grid/tile modifiers for sizes clause                   | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| attach map-type modifier                                    | :part:`In Progress`       | :none:`unclaimed`         | C/C++: @abhinavgaba;                                                     |
+|                                                             |                           |                           | RT: @abhinavgaba (https://github.com/llvm/llvm-project/pull/149036,      |
+|                                                             |                           |                           | https://github.com/llvm/llvm-project/pull/158370)                        |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+
+
+OpenMP Extensions
+=================
+
+The following table provides a quick overview over various OpenMP
+extensions and their implementation status.  These extensions are not
+currently defined by any standard, so links to associated LLVM
+documentation are provided.  As these extensions mature, they will be
+considered for standardization. Please post on the
+`Discourse forums (Runtimes - OpenMP category)`_ to provide feedback.
+
++------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
+|Category                      | Feature                                                                           | Status                   | Reviews                                                |
++==============================+===================================================================================+==========================+========================================================+
+| atomic extension             | `'atomic' strictly nested within 'teams'                                          | :good:`prototyped`       | D126323                                                |
+|                              | <https://openmp.llvm.org/docs/openacc/OpenMPExtensions.html#atomicWithinTeams>`_  |                          |                                                        |
++------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
+| device extension             | `'ompx_hold' map type modifier                                                    | :good:`prototyped`       | D106509, D106510                                       |
+|                              | <https://openmp.llvm.org/docs/openacc/OpenMPExtensions.html#ompx-hold>`_          |                          |                                                        |
++------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
+| device extension             | `'ompx_bare' clause on 'target teams' construct                                   | :good:`prototyped`       | #66844, #70612                                         |
+|                              | <https://www.osti.gov/servlets/purl/2205717>`_                                    |                          |                                                        |
++------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
+| device extension             | Multi-dim 'num_teams' and 'thread_limit' clause on 'target teams ompx_bare'       | :good:`partial`          | #99732, #101407, #102715                               |
+|                              | construct                                                                         |                          |                                                        |
++------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
+
+.. _Discourse forums (Runtimes - OpenMP category): https://discourse.llvm.org/c/runtimes/openmp/35
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index df84d14c90821..90821be4c53f0 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -915,6 +915,7 @@ OpenMP Support
   modifier in the ``adjust_args`` clause.
 - Allow array length to be omitted in array section subscript expression.
 - Fixed non-contiguous strided update in the ``omp target update`` directive with the ``from`` clause.
+- Added support for threadset clause in task and taskloop directives.
 - Properly handle array section/assumed-size array privatization in C/C++.
 - Added support to handle new syntax of the ``uses_allocators`` clause.
 - Added support for ``variable-category`` modifier in ``default clause``.
diff --git a/clang/include/clang/AST/OpenMPClause.h b/clang/include/clang/AST/OpenMPClause.h
index bc791e46e7c92..4f507485968cd 100644
--- a/clang/include/clang/AST/OpenMPClause.h
+++ b/clang/include/clang/AST/OpenMPClause.h
@@ -1424,6 +1424,86 @@ class OMPDefaultClause : public OMPClause {
   }
 };
 
+/// This represents 'threadset' clause in the '#pragma omp task ...' directive.
+///
+/// \code
+/// #pragma omp task threadset(omp_pool)
+/// \endcode
+/// In this example directive '#pragma omp task' has simple 'threadset'
+/// clause with kind 'omp_pool'.
+class OMPThreadsetClause final : public OMPClause {
+  friend class OMPClauseReader;
+
+  /// Location of '('.
+  SourceLocation LParenLoc;
+
+  /// A kind of the 'threadset' clause.
+  OpenMPThreadsetKind Kind = OMPC_THREADSET_unknown;
+
+  /// Start location of the kind in source code.
+  SourceLocation KindLoc;
+
+  /// Set kind of the clauses.
+  ///
+  /// \param K Argument of clause.
+  void setThreadsetKind(OpenMPThreadsetKind K) { Kind = K; }
+
+  /// Set argument location.
+  ///
+  /// \param KLoc Argument location.
+  void setThreadsetKindLoc(SourceLocation KLoc) { KindLoc = KLoc; }
+
+public:
+  /// Build 'threadset' clause with argument \a A ('omp_team' or 'omp_pool').
+  ///
+  /// \param A Argument of the clause ('omp_team' or 'omp_pool').
+  /// \param ALoc Starting location of the argument.
+  /// \param StartLoc Starting location of the clause.
+  /// \param LParenLoc Location of '('.
+  /// \param EndLoc Ending location of the clause.
+  OMPThreadsetClause(OpenMPThreadsetKind A, SourceLocation ALoc,
+                     SourceLocation StartLoc, SourceLocation LParenLoc,
+                     SourceLocation EndLoc)
+      : OMPClause(llvm::omp::OMPC_threadset, StartLoc, EndLoc),
+        LParenLoc(LParenLoc), Kind(A), KindLoc(ALoc) {}
+
+  /// Build an empty clause.
+  OMPThreadsetClause()
+      : OMPClause(llvm::omp::OMPC_threadset, SourceLocation(),
+                  SourceLocation()) {}
+
+  /// Sets the location of '('.
+  void setLParenLoc(SourceLocation Loc) { LParenLoc = Loc; }
+
+  /// Returns the location of '('.
+  SourceLocation getLParenLoc() const { return LParenLoc; }
+
+  /// Returns kind of the clause.
+  OpenMPThreadsetKind getThreadsetKind() const { return Kind; }
+
+  /// Returns location of clause kind.
+  SourceLocation getThreadsetKindLoc() const { return KindLoc; }
+
+  child_range children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+
+  const_child_range children() const {
+    return const_child_range(const_child_iterator(), const_child_iterator());
+  }
+
+  child_range used_children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+  const_child_range used_children() const {
+    return const_child_range(const_child_iterator(), const_child_iterator());
+  }
+
+  static bool classof(const OMPClause *T) {
+    return T->getClauseKind() == llvm::omp::OMPC_threadset;
+  }
+};
+
 /// This represents 'proc_bind' clause in the '#pragma omp ...'
 /// directive.
 ///
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index 32b2b6bdb989c..8cb0a657023b4 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -3523,6 +3523,12 @@ bool RecursiveASTVisitor<Derived>::VisitOMPDefaultClause(OMPDefaultClause *) {
   return true;
 }
 
+template <typename Derived>
+bool RecursiveASTVisitor<Derived>::VisitOMPThreadsetClause(
+    OMPThreadsetClause *) {
+  return true;
+}
+
 template <typename Derived>
 bool RecursiveASTVisitor<Derived>::VisitOMPProcBindClause(OMPProcBindClause *) {
   return true;
diff --git a/clang/include/clang/Basic/OpenMPKinds.def b/clang/include/clang/Basic/OpenMPKinds.def
index 202d06fa1fcaa..328a0747a82a8 100644
--- a/clang/include/clang/Basic/OpenMPKinds.def
+++ b/clang/include/clang/Basic/OpenMPKinds.def
@@ -98,6 +98,9 @@
 #ifndef OPENMP_ALLOCATE_MODIFIER
 #define OPENMP_ALLOCATE_MODIFIER(Name)
 #endif
+#ifndef OPENMP_THREADSET_KIND
+#define OPENMP_THREADSET_KIND(Name)
+#endif
 
 // Static attributes for 'schedule' clause.
 OPENMP_SCHEDULE_KIND(static)
@@ -255,6 +258,9 @@ OPENMP_DOACROSS_MODIFIER(sink)
 OPENMP_DOACROSS_MODIFIER(sink_omp_cur_iteration)
 OPENMP_DOACROSS_MODIFIER(source_omp_cur_iteration)
 
+OPENMP_THREADSET_KIND(omp_pool)
+OPENMP_THREADSET_KIND(omp_team)
+
 #undef OPENMP_NUMTASKS_MODIFIER
 #undef OPENMP_NUMTHREADS_MODIFIER
 #undef OPENMP_GRAINSIZE_MODIFIER
@@ -284,4 +290,4 @@ OPENMP_DOACROSS_MODIFIER(source_omp_cur_iteration)
 #undef OPENMP_DEFAULTMAP_MODIFIER
 #undef OPENMP_DOACROSS_MODIFIER
 #undef OPENMP_ALLOCATE_MODIFIER
-
+#undef OPENMP_THREADSET_KIND
diff --git a/clang/include/clang/Basic/OpenMPKinds.h b/clang/include/clang/Basic/OpenMPKinds.h
index ed89a31e2684b..c9ddbcd6d46c1 100644
--- a/clang/include/clang/Basic/OpenMPKinds.h
+++ b/clang/include/clang/Basic/OpenMPKinds.h
@@ -250,6 +250,13 @@ enum OpenMPAllocateClauseModifier {
   OMPC_ALLOCATE_unknown
 };
 
+/// OpenMP modifiers for 'threadset' clause.
+enum OpenMPThreadsetKind {
+#define OPENMP_THREADSET_KIND(Name) OMPC_THREADSET_##Name,
+#include "clang/Basic/OpenMPKinds.def"
+  OMPC_THREADSET_unknown
+};
+
 /// Number of allowed allocate-modifiers.
 static constexpr unsigned NumberOfOMPAllocateClauseModifiers =
     OMPC_ALLOCATE_unknown;
diff --git a/clang/include/clang/Sema/SemaOpenMP.h b/clang/include/clang/Sema/SemaOpenMP.h
index f9baeed03c347..ba12b403d9b9a 100644
--- a/clang/include/clang/Sema/SemaOpenMP.h
+++ b/clang/include/clang/Sema/SemaOpenMP.h
@@ -975,6 +975,12 @@ class SemaOpenMP : public SemaBase {
                            OpenMPDefaultClauseVariableCategory VCKind,
                            SourceLocation VCKindLoc, SourceLocation StartLoc,
                            SourceLocation LParenLoc, SourceLocation EndLoc);
+  /// Called on well-formed 'threadset' clause.
+  OMPClause *ActOnOpenMPThreadsetClause(OpenMPThreadsetKind Kind,
+                                        SourceLocation KindLoc,
+                                        SourceLocation StartLoc,
+                                        SourceLocation LParenLoc,
+                                        SourceLocation EndLoc);
   /// Called on well-formed 'proc_bind' clause.
   OMPClause *ActOnOpenMPProcBindClause(llvm::omp::ProcBindKind Kind,
                                        SourceLocation KindLoc,
diff --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp
index 791df7ee1c3d4..59d94590e04d1 100644
--- a/clang/lib/AST/OpenMPClause.cpp
+++ b/clang/lib/AST/OpenMPClause.cpp
@@ -124,6 +124,7 @@ const OMPClauseWithPreInit *OMPClauseWithPreInit::get(const OMPClause *C) {
   case OMPC_nowait:
   case OMPC_untied:
   case OMPC_mergeable:
+  case OMPC_threadset:
   case OMPC_threadprivate:
   case OMPC_groupprivate:
   case OMPC_flush:
@@ -2035,6 +2036,13 @@ void OMPClausePrinter::VisitOMPDefaultClause(OMPDefaultClause *Node) {
   OS << ")";
 }
 
+void OMPClausePrinter::VisitOMPThreadsetClause(OMPThreadsetClause *Node) {
+  OS << "threadset("
+     << getOpenMPSimpleClauseTypeName(OMPC_threadset,
+                                      unsigned(Node->getThreadsetKind()))
+     << ")";
+}
+
 void OMPClausePrinter::VisitOMPProcBindClause(OMPProcBindClause *Node) {
   OS << "proc_bind("
      << getOpenMPSimpleClauseTypeName(OMPC_proc_bind,
diff --git a/clang/lib/AST/StmtProfile.cpp b/clang/lib/AST/StmtProfile.cpp
index 05b64ccda0d01..c909e1bcecd38 100644
--- a/clang/lib/AST/StmtProfile.cpp
+++ b/clang/lib/AST/StmtProfile.cpp
@@ -546,6 +546,8 @@ void OMPClauseProfiler::VisitOMPNocontextClause(const OMPNocontextClause *C) {
 
 void OMPClauseProfiler::VisitOMPDefaultClause(const OMPDefaultClause *C) { }
 
+void OMPClauseProfiler::VisitOMPThreadsetClause(const OMPThreadsetClause *C) {}
+
 void OMPClauseProfiler::VisitOMPProcBindClause(const OMPProcBindClause *C) { }
 
 void OMPClauseProfiler::VisitOMPUnifiedAddressClause(
diff --git a/clang/lib/Basic/OpenMPKinds.cpp b/clang/lib/Basic/OpenMPKinds.cpp
index 769d92007d0c0..c36b6110980ae 100644
--- a/clang/lib/Basic/OpenMPKinds.cpp
+++ b/clang/lib/Basic/OpenMPKinds.cpp
@@ -210,6 +210,15 @@ unsigned clang::getOpenMPSimpleClauseType(OpenMPClauseKind Kind, StringRef Str,
 #define OPENMP_ALLOCATE_MODIFIER(Name) .Case(#Name, OMPC_ALLOCATE_##Name)
 #include "clang/Basic/OpenMPKinds.def"
         .Default(OMPC_ALLOCATE_unknown);
+  case OMPC_threadset: {
+    unsigned Type = llvm::StringSwitch<unsigned>(Str)
+#define OPENMP_THREADSET_KIND(Name) .Case(#Name, OMPC_THREADSET_##Name)
+#include "clang/Basic/OpenMPKinds.def"
+                        .Default(OMPC_THREADSET_unknown);
+    if (LangOpts.OpenMP < 60)
+      return OMPC_THREADSET_unknown;
+    return Type;
+  }
   case OMPC_num_threads: {
     unsigned Type = llvm::StringSwitch<unsigned>(Str)
 #define OPENMP_NUMTHREADS_MODIFIER(Name) .Case(#Name, OMPC_NUMTHREADS_##Name)
@@ -565,6 +574,16 @@ const char *clang::getOpenMPSimpleClauseTypeName(OpenMPClauseKind Kind,
 #include "clang/Basic/OpenMPKinds.def"
     }
     llvm_unreachable("Invalid OpenMP 'num_threads' clause modifier");
+  case OMPC_threadset:
+    switch (Type) {
+    case OMPC_THREADSET_unknown:
+      return "unknown";
+#define OPENMP_THREADSET_KIND(Name)                                            \
+  case OMPC_THREADSET_##Name:                                                  \
+    return #Name;
+#include "clang/Basic/OpenMPKinds.def"
+    }
+    llvm_unreachable("Invalid OpenMP 'threadset' clause modifier");
   case OMPC_unknown:
   case OMPC_threadprivate:
   case OMPC_groupprivate:
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index a95f6a783023f..7a6e807169e5a 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -1231,7 +1231,10 @@ llvm::DIType *CGDebugInfo::CreateType(const BuiltinType *BT) {
 }
 
 llvm::DIType *CGDebugInfo::CreateType(const BitIntType *Ty) {
-  StringRef Name = Ty->isUnsigned() ? "unsigned _BitInt" : "_BitInt";
+  SmallString<32> Name;
+  llvm::raw_svector_ostream OS(Name);
+  OS << (Ty->isUnsigned() ? "unsigned _BitInt(" : "_BitInt(")
+     << Ty->getNumBits() << ")";
   llvm::dwarf::TypeKind Encoding = Ty->isUnsigned()
                                        ? llvm::dwarf::DW_ATE_unsigned
                                        : llvm::dwarf::DW_ATE_signed;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a044168205d67..68a83561d7cb1 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -3778,6 +3778,7 @@ CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc,
     DestructorsFlag = 0x8,
     PriorityFlag = 0x20,
     DetachableFlag = 0x40,
+    FreeAgentFlag = 0x80,
   };
   unsigned Flags = Data.Tied ? TiedFlag : 0;
   bool NeedsCleanup = false;
@@ -3787,6 +3788,11 @@ CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc,
     if (NeedsCleanup)
       Flags = Flags | DestructorsFlag;
   }
+  if (const auto *Clause = D.getSingleClause<OMPThreadsetClause>()) {
+    OpenMPThreadsetKind Kind = Clause->getThreadsetKind();
+    if (Kind == OMPC_THREADSET_omp_pool)
+      Flags = Flags | FreeAgentFlag;
+  }
   if (Data.Priority.getInt())
     Flags = Flags | PriorityFlag;
   if (D.hasClausesOfKind<OMPDetachClause>())
diff --git a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
index 60f9b86333670..15fa78ddba715 100644
--- a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
@@ -1193,14 +1193,22 @@ static const ARMVectorIntrinsicInfo AArch64SISDIntrinsicMap[] = {
   NEONMAP1(vaddlvq_s32, aarch64_neon_saddlv, AddRetType | Add1ArgType),
   NEONMAP1(vaddlvq_u32, aarch64_neon_uaddlv, AddRetType | Add1ArgType),
   NEONMAP1(vaddv_f32, aarch64_neon_faddv, AddRetType | Add1ArgType),
-  NEONMAP1(vaddv_s32, aarch64_neon_saddv, AddRetType | Add1ArgType),
-  NEONMAP1(vaddv_u32, aarch64_neon_uaddv, AddRetType | Add1ArgType),
+  NEONMAP1(vaddv_s16, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddv_s32, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddv_s8, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddv_u16, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddv_u32, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddv_u8, vector_reduce_add, Add1ArgType),
   NEONMAP1(vaddvq_f32, aarch64_neon_faddv, AddRetType | Add1ArgType),
   NEONMAP1(vaddvq_f64, aarch64_neon_faddv, AddRetType | Add1ArgType),
-  NEONMAP1(vaddvq_s32, aarch64_neon_saddv, AddRetType | Add1ArgType),
-  NEONMAP1(vaddvq_s64, aarch64_neon_saddv, AddRetType | Add1ArgType),
-  NEONMAP1(vaddvq_u32, aarch64_neon_uaddv, AddRetType | Add1ArgType),
-  NEONMAP1(vaddvq_u64, aarch64_neon_uaddv, AddRetType | Add1ArgType),
+  NEONMAP1(vaddvq_s16, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddvq_s32, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddvq_s64, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddvq_s8, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddvq_u16, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddvq_u32, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddvq_u64, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vaddvq_u8, vector_reduce_add, Add1ArgType),
   NEONMAP1(vcaged_f64, aarch64_neon_facge, AddRetType | Add1ArgType),
   NEONMAP1(vcages_f32, aarch64_neon_facge, AddRetType | Add1ArgType),
   NEONMAP1(vcagtd_f64, aarch64_neon_facgt, AddRetType | Add1ArgType),
@@ -1243,27 +1251,43 @@ static const ARMVectorIntrinsicInfo AArch64SISDIntrinsicMap[] = {
   NEONMAP1(vmaxnmvq_f32, aarch64_neon_fmaxnmv, AddRetType | Add1ArgType),
   NEONMAP1(vmaxnmvq_f64, aarch64_neon_fmaxnmv, AddRetType | Add1ArgType),
   NEONMAP1(vmaxv_f32, aarch64_neon_fmaxv, AddRetType | Add1ArgType),
-  NEONMAP1(vmaxv_s32, aarch64_neon_smaxv, AddRetType | Add1ArgType),
-  NEONMAP1(vmaxv_u32, aarch64_neon_umaxv, AddRetType | Add1ArgType),
+  NEONMAP1(vmaxv_s16, vector_reduce_smax, Add1ArgType),
+  NEONMAP1(vmaxv_s32, vector_reduce_smax, Add1ArgType),
+  NEONMAP1(vmaxv_s8, vector_reduce_smax, Add1ArgType),
+  NEONMAP1(vmaxv_u16, vector_reduce_umax, Add1ArgType),
+  NEONMAP1(vmaxv_u32, vector_reduce_umax, Add1ArgType),
+  NEONMAP1(vmaxv_u8, vector_reduce_umax, Add1ArgType),
   NEONMAP1(vmaxvq_f32, aarch64_neon_fmaxv, AddRetType | Add1ArgType),
   NEONMAP1(vmaxvq_f64, aarch64_neon_fmaxv, AddRetType | Add1ArgType),
-  NEONMAP1(vmaxvq_s32, aarch64_neon_smaxv, AddRetType | Add1ArgType),
-  NEONMAP1(vmaxvq_u32, aarch64_neon_umaxv, AddRetType | Add1ArgType),
+  NEONMAP1(vmaxvq_s16, vector_reduce_smax, Add1ArgType),
+  NEONMAP1(vmaxvq_s32, vector_reduce_smax, Add1ArgType),
+  NEONMAP1(vmaxvq_s8, vector_reduce_smax, Add1ArgType),
+  NEONMAP1(vmaxvq_u16, vector_reduce_umax, Add1ArgType),
+  NEONMAP1(vmaxvq_u32, vector_reduce_umax, Add1ArgType),
+  NEONMAP1(vmaxvq_u8, vector_reduce_umax, Add1ArgType),
   NEONMAP1(vminnmv_f32, aarch64_neon_fminnmv, AddRetType | Add1ArgType),
   NEONMAP1(vminnmvq_f32, aarch64_neon_fminnmv, AddRetType | Add1ArgType),
   NEONMAP1(vminnmvq_f64, aarch64_neon_fminnmv, AddRetType | Add1ArgType),
   NEONMAP1(vminv_f32, aarch64_neon_fminv, AddRetType | Add1ArgType),
-  NEONMAP1(vminv_s32, aarch64_neon_sminv, AddRetType | Add1ArgType),
-  NEONMAP1(vminv_u32, aarch64_neon_uminv, AddRetType | Add1ArgType),
+  NEONMAP1(vminv_s16, vector_reduce_smin, Add1ArgType),
+  NEONMAP1(vminv_s32, vector_reduce_smin, Add1ArgType),
+  NEONMAP1(vminv_s8, vector_reduce_smin, Add1ArgType),
+  NEONMAP1(vminv_u16, vector_reduce_umin, Add1ArgType),
+  NEONMAP1(vminv_u32, vector_reduce_umin, Add1ArgType),
+  NEONMAP1(vminv_u8, vector_reduce_umin, Add1ArgType),
   NEONMAP1(vminvq_f32, aarch64_neon_fminv, AddRetType | Add1ArgType),
   NEONMAP1(vminvq_f64, aarch64_neon_fminv, AddRetType | Add1ArgType),
-  NEONMAP1(vminvq_s32, aarch64_neon_sminv, AddRetType | Add1ArgType),
-  NEONMAP1(vminvq_u32, aarch64_neon_uminv, AddRetType | Add1ArgType),
+  NEONMAP1(vminvq_s16, vector_reduce_smin, Add1ArgType),
+  NEONMAP1(vminvq_s32, vector_reduce_smin, Add1ArgType),
+  NEONMAP1(vminvq_s8, vector_reduce_smin, Add1ArgType),
+  NEONMAP1(vminvq_u16, vector_reduce_umin, Add1ArgType),
+  NEONMAP1(vminvq_u32, vector_reduce_umin, Add1ArgType),
+  NEONMAP1(vminvq_u8, vector_reduce_umin, Add1ArgType),
   NEONMAP1(vmull_p64, aarch64_neon_pmull64, 0),
   NEONMAP1(vmulxd_f64, aarch64_neon_fmulx, Add1ArgType),
   NEONMAP1(vmulxs_f32, aarch64_neon_fmulx, Add1ArgType),
-  NEONMAP1(vpaddd_s64, aarch64_neon_uaddv, AddRetType | Add1ArgType),
-  NEONMAP1(vpaddd_u64, aarch64_neon_uaddv, AddRetType | Add1ArgType),
+  NEONMAP1(vpaddd_s64, vector_reduce_add, Add1ArgType),
+  NEONMAP1(vpaddd_u64, vector_reduce_add, Add1ArgType),
   NEONMAP1(vpmaxnmqd_f64, aarch64_neon_fmaxnmv, AddRetType | Add1ArgType),
   NEONMAP1(vpmaxnms_f32, aarch64_neon_fmaxnmv, AddRetType | Add1ArgType),
   NEONMAP1(vpmaxqd_f64, aarch64_neon_fmaxv, AddRetType | Add1ArgType),
@@ -7067,127 +7091,6 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Int = Intrinsic::bitreverse;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrbit");
   }
-  case NEON::BI__builtin_neon_vaddv_u8:
-    // FIXME: These are handled by the AArch64 scalar code.
-    usgn = true;
-    [[fallthrough]];
-  case NEON::BI__builtin_neon_vaddv_s8: {
-    Int = usgn ? Intrinsic::aarch64_neon_uaddv : Intrinsic::aarch64_neon_saddv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vaddv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vaddv_u16:
-    usgn = true;
-    [[fallthrough]];
-  case NEON::BI__builtin_neon_vaddv_s16: {
-    Int = usgn ? Intrinsic::aarch64_neon_uaddv : Intrinsic::aarch64_neon_saddv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 4);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vaddv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vaddvq_u8:
-    usgn = true;
-    [[fallthrough]];
-  case NEON::BI__builtin_neon_vaddvq_s8: {
-    Int = usgn ? Intrinsic::aarch64_neon_uaddv : Intrinsic::aarch64_neon_saddv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 16);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vaddv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vaddvq_u16:
-    usgn = true;
-    [[fallthrough]];
-  case NEON::BI__builtin_neon_vaddvq_s16: {
-    Int = usgn ? Intrinsic::aarch64_neon_uaddv : Intrinsic::aarch64_neon_saddv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vaddv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxv_u8: {
-    Int = Intrinsic::aarch64_neon_umaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxv_u16: {
-    Int = Intrinsic::aarch64_neon_umaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 4);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxvq_u8: {
-    Int = Intrinsic::aarch64_neon_umaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 16);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxvq_u16: {
-    Int = Intrinsic::aarch64_neon_umaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxv_s8: {
-    Int = Intrinsic::aarch64_neon_smaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxv_s16: {
-    Int = Intrinsic::aarch64_neon_smaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 4);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxvq_s8: {
-    Int = Intrinsic::aarch64_neon_smaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 16);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vmaxvq_s16: {
-    Int = Intrinsic::aarch64_neon_smaxv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
   case NEON::BI__builtin_neon_vmaxv_f16: {
     Int = Intrinsic::aarch64_neon_fmaxv;
     Ty = HalfTy;
@@ -7206,78 +7109,6 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
     return Builder.CreateTrunc(Ops[0], HalfTy);
   }
-  case NEON::BI__builtin_neon_vminv_u8: {
-    Int = Intrinsic::aarch64_neon_uminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vminv_u16: {
-    Int = Intrinsic::aarch64_neon_uminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 4);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vminvq_u8: {
-    Int = Intrinsic::aarch64_neon_uminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 16);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vminvq_u16: {
-    Int = Intrinsic::aarch64_neon_uminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vminv_s8: {
-    Int = Intrinsic::aarch64_neon_sminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vminv_s16: {
-    Int = Intrinsic::aarch64_neon_sminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 4);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
-  case NEON::BI__builtin_neon_vminvq_s8: {
-    Int = Intrinsic::aarch64_neon_sminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int8Ty, 16);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int8Ty);
-  }
-  case NEON::BI__builtin_neon_vminvq_s16: {
-    Int = Intrinsic::aarch64_neon_sminv;
-    Ty = Int32Ty;
-    VTy = llvm::FixedVectorType::get(Int16Ty, 8);
-    llvm::Type *Tys[2] = { Ty, VTy };
-    Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
-    return Builder.CreateTrunc(Ops[0], Int16Ty);
-  }
   case NEON::BI__builtin_neon_vminv_f16: {
     Int = Intrinsic::aarch64_neon_fminv;
     Ty = HalfTy;
diff --git a/clang/lib/Frontend/TextDiagnostic.cpp b/clang/lib/Frontend/TextDiagnostic.cpp
index f5add2a941f72..aea3e72d92a84 100644
--- a/clang/lib/Frontend/TextDiagnostic.cpp
+++ b/clang/lib/Frontend/TextDiagnostic.cpp
@@ -22,22 +22,16 @@
 
 using namespace clang;
 
-static const enum raw_ostream::Colors noteColor = raw_ostream::CYAN;
-static const enum raw_ostream::Colors remarkColor =
-  raw_ostream::BLUE;
-static const enum raw_ostream::Colors fixitColor =
-  raw_ostream::GREEN;
-static const enum raw_ostream::Colors caretColor =
-  raw_ostream::GREEN;
-static const enum raw_ostream::Colors warningColor =
-  raw_ostream::MAGENTA;
-static const enum raw_ostream::Colors templateColor =
-  raw_ostream::CYAN;
-static const enum raw_ostream::Colors errorColor = raw_ostream::RED;
-static const enum raw_ostream::Colors fatalColor = raw_ostream::RED;
+static constexpr raw_ostream::Colors NoteColor = raw_ostream::CYAN;
+static constexpr raw_ostream::Colors RemarkColor = raw_ostream::BLUE;
+static constexpr raw_ostream::Colors FixitColor = raw_ostream::GREEN;
+static constexpr raw_ostream::Colors CaretColor = raw_ostream::GREEN;
+static constexpr raw_ostream::Colors WarningColor = raw_ostream::MAGENTA;
+static constexpr raw_ostream::Colors TemplateColor = raw_ostream::CYAN;
+static constexpr raw_ostream::Colors ErrorColor = raw_ostream::RED;
+static constexpr raw_ostream::Colors FatalColor = raw_ostream::RED;
 // Used for changing only the bold attribute.
-static const enum raw_ostream::Colors savedColor =
-  raw_ostream::SAVEDCOLOR;
+static constexpr raw_ostream::Colors SavedColor = raw_ostream::SAVEDCOLOR;
 
 // Magenta is taken for 'warning'. Red is already 'error' and 'cyan'
 // is already taken for 'note'. Green is already used to underline
@@ -47,6 +41,43 @@ static constexpr raw_ostream::Colors CommentColor = raw_ostream::YELLOW;
 static constexpr raw_ostream::Colors LiteralColor = raw_ostream::GREEN;
 static constexpr raw_ostream::Colors KeywordColor = raw_ostream::BLUE;
 
+namespace {
+template <typename Sub> class ColumnsOrBytes {
+public:
+  int V = 0;
+  ColumnsOrBytes(int V) : V(V) {}
+  bool isValid() const { return V != -1; }
+  Sub next() const { return Sub(V + 1); }
+  Sub prev() const { return Sub(V - 1); }
+
+  bool operator>(Sub O) const { return V > O.V; }
+  bool operator<(Sub O) const { return V < O.V; }
+  bool operator<=(Sub B) const { return V <= B.V; }
+  bool operator!=(Sub C) const { return C.V != V; }
+
+  Sub operator+(Sub B) const { return Sub(V + B.V); }
+  Sub &operator+=(Sub B) {
+    V += B.V;
+    return *static_cast<Sub *>(this);
+  }
+  Sub operator-(Sub B) const { return Sub(V - B.V); }
+  Sub &operator-=(Sub B) {
+    V -= B.V;
+    return *static_cast<Sub *>(this);
+  }
+};
+
+class Bytes final : public ColumnsOrBytes<Bytes> {
+public:
+  Bytes(int V) : ColumnsOrBytes(V) {}
+};
+
+class Columns final : public ColumnsOrBytes<Columns> {
+public:
+  Columns(int V) : ColumnsOrBytes(V) {}
+};
+} // namespace
+
 /// Add highlights to differences in template strings.
 static void applyTemplateHighlighting(raw_ostream &OS, StringRef Str,
                                       bool &Normal, bool Bold) {
@@ -58,11 +89,11 @@ static void applyTemplateHighlighting(raw_ostream &OS, StringRef Str,
 
     Str = Str.substr(Pos + 1);
     if (Normal)
-      OS.changeColor(templateColor, true);
+      OS.changeColor(TemplateColor, true);
     else {
       OS.resetColor();
       if (Bold)
-        OS.changeColor(savedColor, true);
+        OS.changeColor(SavedColor, true);
     }
     Normal = !Normal;
   }
@@ -109,8 +140,8 @@ printableTextForNextCharacter(StringRef SourceLine, size_t *I,
   if (SourceLine[*I] == '\t') {
     assert(0 < TabStop && TabStop <= DiagnosticOptions::MaxTabStop &&
            "Invalid -ftabstop value");
-    unsigned Col = bytesSincePreviousTabOrLineBegin(SourceLine, *I);
-    unsigned NumSpaces = TabStop - (Col % TabStop);
+    unsigned LineBytes = bytesSincePreviousTabOrLineBegin(SourceLine, *I);
+    unsigned NumSpaces = TabStop - (LineBytes % TabStop);
     assert(0 < NumSpaces && NumSpaces <= TabStop
            && "Invalid computation of space amt");
     ++(*I);
@@ -220,97 +251,99 @@ static void expandTabs(std::string &SourceLine, unsigned TabStop) {
 ///  (\\u3042 is represented in UTF-8 by three bytes and takes two columns to
 ///   display)
 static void genColumnByteMapping(StringRef SourceLine, unsigned TabStop,
-                                 SmallVectorImpl<int> &BytesOut,
-                                 SmallVectorImpl<int> &ColumnsOut) {
+                                 SmallVectorImpl<Bytes> &BytesOut,
+                                 SmallVectorImpl<Columns> &ColumnsOut) {
   assert(BytesOut.empty());
   assert(ColumnsOut.empty());
 
   if (SourceLine.empty()) {
-    BytesOut.resize(1u, 0);
-    ColumnsOut.resize(1u, 0);
+    BytesOut.resize(1u, Bytes(0));
+    ColumnsOut.resize(1u, Columns(0));
     return;
   }
 
   ColumnsOut.resize(SourceLine.size() + 1, -1);
 
-  int Columns = 0;
+  Columns NumColumns = 0;
   size_t I = 0;
   while (I < SourceLine.size()) {
-    ColumnsOut[I] = Columns;
-    BytesOut.resize(Columns + 1, -1);
-    BytesOut.back() = I;
+    ColumnsOut[I] = NumColumns;
+    BytesOut.resize(NumColumns.V + 1, -1);
+    BytesOut.back() = Bytes(I);
     auto [Str, Printable] =
         printableTextForNextCharacter(SourceLine, &I, TabStop);
-    Columns += llvm::sys::locale::columnWidth(Str);
+    NumColumns += Columns(llvm::sys::locale::columnWidth(Str));
   }
 
-  ColumnsOut.back() = Columns;
-  BytesOut.resize(Columns + 1, -1);
-  BytesOut.back() = I;
+  ColumnsOut.back() = NumColumns;
+  BytesOut.resize(NumColumns.V + 1, -1);
+  BytesOut.back() = Bytes(I);
 }
 
 namespace {
 struct SourceColumnMap {
   SourceColumnMap(StringRef SourceLine, unsigned TabStop)
-  : m_SourceLine(SourceLine) {
+      : SourceLine(SourceLine) {
 
-    genColumnByteMapping(SourceLine, TabStop, m_columnToByte, m_byteToColumn);
+    genColumnByteMapping(SourceLine, TabStop, ColumnToByte, ByteToColumn);
 
-    assert(m_byteToColumn.size()==SourceLine.size()+1);
-    assert(0 < m_byteToColumn.size() && 0 < m_columnToByte.size());
-    assert(m_byteToColumn.size()
-           == static_cast<unsigned>(m_columnToByte.back()+1));
-    assert(static_cast<unsigned>(m_byteToColumn.back()+1)
-           == m_columnToByte.size());
+    assert(ByteToColumn.size() == SourceLine.size() + 1);
+    assert(0 < ByteToColumn.size() && 0 < ColumnToByte.size());
+    assert(ByteToColumn.size() ==
+           static_cast<unsigned>(ColumnToByte.back().V + 1));
+    assert(static_cast<unsigned>(ByteToColumn.back().V + 1) ==
+           ColumnToByte.size());
   }
-  int columns() const { return m_byteToColumn.back(); }
-  int bytes() const { return m_columnToByte.back(); }
+  Columns columns() const { return ByteToColumn.back(); }
+  Bytes bytes() const { return ColumnToByte.back(); }
 
   /// Map a byte to the column which it is at the start of, or return -1
   /// if it is not at the start of a column (for a UTF-8 trailing byte).
-  int byteToColumn(int n) const {
-    assert(0<=n && n<static_cast<int>(m_byteToColumn.size()));
-    return m_byteToColumn[n];
+  Columns byteToColumn(Bytes N) const {
+    assert(0 <= N.V && N.V < static_cast<int>(ByteToColumn.size()));
+    return ByteToColumn[N.V];
   }
 
   /// Map a byte to the first column which contains it.
-  int byteToContainingColumn(int N) const {
-    assert(0 <= N && N < static_cast<int>(m_byteToColumn.size()));
-    while (m_byteToColumn[N] == -1)
-      --N;
-    return m_byteToColumn[N];
+  Columns byteToContainingColumn(Bytes N) const {
+    assert(0 <= N.V && N.V < static_cast<int>(ByteToColumn.size()));
+    while (!ByteToColumn[N.V].isValid())
+      --N.V;
+    return ByteToColumn[N.V];
   }
 
   /// Map a column to the byte which starts the column, or return -1 if
   /// the column the second or subsequent column of an expanded tab or similar
   /// multi-column entity.
-  int columnToByte(int n) const {
-    assert(0<=n && n<static_cast<int>(m_columnToByte.size()));
-    return m_columnToByte[n];
+  Bytes columnToByte(Columns N) const {
+    assert(0 <= N.V && N.V < static_cast<int>(ColumnToByte.size()));
+    return ColumnToByte[N.V];
   }
 
   /// Map from a byte index to the next byte which starts a column.
-  int startOfNextColumn(int N) const {
-    assert(0 <= N && N < static_cast<int>(m_byteToColumn.size() - 1));
-    while (byteToColumn(++N) == -1) {}
+  Bytes startOfNextColumn(Bytes N) const {
+    assert(0 <= N.V && N.V < static_cast<int>(ByteToColumn.size() - 1));
+    N = N.next();
+    while (!byteToColumn(N).isValid())
+      N = N.next();
     return N;
   }
 
   /// Map from a byte index to the previous byte which starts a column.
-  int startOfPreviousColumn(int N) const {
-    assert(0 < N && N < static_cast<int>(m_byteToColumn.size()));
-    while (byteToColumn(--N) == -1) {}
+  Bytes startOfPreviousColumn(Bytes N) const {
+    assert(0 < N.V && N.V < static_cast<int>(ByteToColumn.size()));
+    N = N.prev();
+    while (!byteToColumn(N).isValid())
+      N = N.prev();
     return N;
   }
 
-  StringRef getSourceLine() const {
-    return m_SourceLine;
-  }
+  StringRef getSourceLine() const { return SourceLine; }
 
 private:
-  const std::string m_SourceLine;
-  SmallVector<int,200> m_byteToColumn;
-  SmallVector<int,200> m_columnToByte;
+  StringRef SourceLine;
+  SmallVector<Columns, 200> ByteToColumn;
+  SmallVector<Bytes, 200> ColumnToByte;
 };
 } // end anonymous namespace
 
@@ -319,14 +352,15 @@ struct SourceColumnMap {
 static void selectInterestingSourceRegion(std::string &SourceLine,
                                           std::string &CaretLine,
                                           std::string &FixItInsertionLine,
-                                          unsigned Columns,
-                                          const SourceColumnMap &map) {
-  unsigned CaretColumns = CaretLine.size();
-  unsigned FixItColumns = llvm::sys::locale::columnWidth(FixItInsertionLine);
-  unsigned MaxColumns = std::max(static_cast<unsigned>(map.columns()),
-                                 std::max(CaretColumns, FixItColumns));
+                                          Columns NonGutterColumns,
+                                          const SourceColumnMap &Map) {
+  Columns CaretColumns = Columns(CaretLine.size());
+  Columns FixItColumns =
+      Columns(llvm::sys::locale::columnWidth(FixItInsertionLine));
+  Columns MaxColumns =
+      std::max({Map.columns().V, CaretColumns.V, FixItColumns.V});
   // if the number of columns is less than the desired number we're done
-  if (MaxColumns <= Columns)
+  if (MaxColumns <= NonGutterColumns)
     return;
 
   // No special characters are allowed in CaretLine.
@@ -334,13 +368,13 @@ static void selectInterestingSourceRegion(std::string &SourceLine,
 
   // Find the slice that we need to display the full caret line
   // correctly.
-  unsigned CaretStart = 0, CaretEnd = CaretLine.size();
-  for (; CaretStart != CaretEnd; ++CaretStart)
-    if (!isWhitespace(CaretLine[CaretStart]))
+  Columns CaretStart = 0, CaretEnd = CaretLine.size();
+  for (; CaretStart != CaretEnd; CaretStart = CaretStart.next())
+    if (!isWhitespace(CaretLine[CaretStart.V]))
       break;
 
-  for (; CaretEnd != CaretStart; --CaretEnd)
-    if (!isWhitespace(CaretLine[CaretEnd - 1]))
+  for (; CaretEnd != CaretStart; CaretEnd = CaretEnd.prev())
+    if (!isWhitespace(CaretLine[CaretEnd.V - 1]))
       break;
 
   // caret has already been inserted into CaretLine so the above whitespace
@@ -349,39 +383,38 @@ static void selectInterestingSourceRegion(std::string &SourceLine,
   // If we have a fix-it line, make sure the slice includes all of the
   // fix-it information.
   if (!FixItInsertionLine.empty()) {
-    unsigned FixItStart = 0, FixItEnd = FixItInsertionLine.size();
-    for (; FixItStart != FixItEnd; ++FixItStart)
-      if (!isWhitespace(FixItInsertionLine[FixItStart]))
-        break;
-
-    for (; FixItEnd != FixItStart; --FixItEnd)
-      if (!isWhitespace(FixItInsertionLine[FixItEnd - 1]))
-        break;
-
     // We can safely use the byte offset FixItStart as the column offset
     // because the characters up until FixItStart are all ASCII whitespace
     // characters.
-    unsigned FixItStartCol = FixItStart;
-    unsigned FixItEndCol
-      = llvm::sys::locale::columnWidth(FixItInsertionLine.substr(0, FixItEnd));
-
-    CaretStart = std::min(FixItStartCol, CaretStart);
-    CaretEnd = std::max(FixItEndCol, CaretEnd);
+    Bytes FixItStart = 0;
+    Bytes FixItEnd = Bytes(FixItInsertionLine.size());
+    while (FixItStart != FixItEnd &&
+           isWhitespace(FixItInsertionLine[FixItStart.V]))
+      FixItStart = FixItStart.next();
+
+    while (FixItEnd != FixItStart &&
+           isWhitespace(FixItInsertionLine[FixItEnd.V - 1]))
+      FixItEnd = FixItEnd.prev();
+
+    Columns FixItStartCol = Columns(FixItStart.V);
+    Columns FixItEndCol = Columns(llvm::sys::locale::columnWidth(
+        FixItInsertionLine.substr(0, FixItEnd.V)));
+
+    CaretStart = std::min(FixItStartCol.V, CaretStart.V);
+    CaretEnd = std::max(FixItEndCol.V, CaretEnd.V);
   }
 
   // CaretEnd may have been set at the middle of a character
   // If it's not at a character's first column then advance it past the current
   //   character.
-  while (static_cast<int>(CaretEnd) < map.columns() &&
-         -1 == map.columnToByte(CaretEnd))
-    ++CaretEnd;
-
-  assert((static_cast<int>(CaretStart) > map.columns() ||
-          -1!=map.columnToByte(CaretStart)) &&
-         "CaretStart must not point to a column in the middle of a source"
-         " line character");
-  assert((static_cast<int>(CaretEnd) > map.columns() ||
-          -1!=map.columnToByte(CaretEnd)) &&
+  while (CaretEnd < Map.columns() && !Map.columnToByte(CaretEnd).isValid())
+    CaretEnd = CaretEnd.next();
+
+  assert(
+      (CaretStart > Map.columns() || Map.columnToByte(CaretStart).isValid()) &&
+      "CaretStart must not point to a column in the middle of a source"
+      " line character");
+  assert((CaretEnd > Map.columns() || Map.columnToByte(CaretEnd).isValid()) &&
          "CaretEnd must not point to a column in the middle of a source line"
          " character");
 
@@ -390,70 +423,69 @@ static void selectInterestingSourceRegion(std::string &SourceLine,
   // number of columns we have, try to grow the slice to encompass
   // more context.
 
-  unsigned SourceStart = map.columnToByte(std::min<unsigned>(CaretStart,
-                                                             map.columns()));
-  unsigned SourceEnd = map.columnToByte(std::min<unsigned>(CaretEnd,
-                                                           map.columns()));
+  Bytes SourceStart = Map.columnToByte(std::min(CaretStart.V, Map.columns().V));
+  Bytes SourceEnd = Map.columnToByte(std::min(CaretEnd.V, Map.columns().V));
 
-  unsigned CaretColumnsOutsideSource = CaretEnd-CaretStart
-    - (map.byteToColumn(SourceEnd)-map.byteToColumn(SourceStart));
+  Columns CaretColumnsOutsideSource =
+      CaretEnd - CaretStart -
+      (Map.byteToColumn(SourceEnd) - Map.byteToColumn(SourceStart));
 
-  char const *front_ellipse = "  ...";
-  char const *front_space   = "     ";
-  char const *back_ellipse = "...";
-  unsigned ellipses_space = strlen(front_ellipse) + strlen(back_ellipse);
+  constexpr StringRef FrontEllipse = "  ...";
+  constexpr StringRef FrontSpace = "     ";
+  constexpr StringRef BackEllipse = "...";
+  Columns EllipsesColumns = Columns(FrontEllipse.size() + BackEllipse.size());
 
-  unsigned TargetColumns = Columns;
+  Columns TargetColumns = NonGutterColumns;
   // Give us extra room for the ellipses
   //  and any of the caret line that extends past the source
-  if (TargetColumns > ellipses_space+CaretColumnsOutsideSource)
-    TargetColumns -= ellipses_space+CaretColumnsOutsideSource;
+  if (TargetColumns > EllipsesColumns + CaretColumnsOutsideSource)
+    TargetColumns -= EllipsesColumns + CaretColumnsOutsideSource;
 
-  while (SourceStart>0 || SourceEnd<SourceLine.size()) {
+  while (SourceStart > 0 || SourceEnd < SourceLine.size()) {
     bool ExpandedRegion = false;
 
-    if (SourceStart>0) {
-      unsigned NewStart = map.startOfPreviousColumn(SourceStart);
+    if (SourceStart > 0) {
+      Bytes NewStart = Map.startOfPreviousColumn(SourceStart);
 
       // Skip over any whitespace we see here; we're looking for
       // another bit of interesting text.
       // FIXME: Detect non-ASCII whitespace characters too.
-      while (NewStart && isWhitespace(SourceLine[NewStart]))
-        NewStart = map.startOfPreviousColumn(NewStart);
+      while (NewStart > 0 && isWhitespace(SourceLine[NewStart.V]))
+        NewStart = Map.startOfPreviousColumn(NewStart);
 
       // Skip over this bit of "interesting" text.
-      while (NewStart) {
-        unsigned Prev = map.startOfPreviousColumn(NewStart);
-        if (isWhitespace(SourceLine[Prev]))
+      while (NewStart > 0) {
+        Bytes Prev = Map.startOfPreviousColumn(NewStart);
+        if (isWhitespace(SourceLine[Prev.V]))
           break;
         NewStart = Prev;
       }
 
-      assert(map.byteToColumn(NewStart) != -1);
-      unsigned NewColumns = map.byteToColumn(SourceEnd) -
-                              map.byteToColumn(NewStart);
+      assert(Map.byteToColumn(NewStart).isValid());
+      Columns NewColumns =
+          Map.byteToColumn(SourceEnd) - Map.byteToColumn(NewStart);
       if (NewColumns <= TargetColumns) {
         SourceStart = NewStart;
         ExpandedRegion = true;
       }
     }
 
-    if (SourceEnd<SourceLine.size()) {
-      unsigned NewEnd = map.startOfNextColumn(SourceEnd);
+    if (SourceEnd < SourceLine.size()) {
+      Bytes NewEnd = Map.startOfNextColumn(SourceEnd);
 
       // Skip over any whitespace we see here; we're looking for
       // another bit of interesting text.
       // FIXME: Detect non-ASCII whitespace characters too.
-      while (NewEnd < SourceLine.size() && isWhitespace(SourceLine[NewEnd]))
-        NewEnd = map.startOfNextColumn(NewEnd);
+      while (NewEnd < SourceLine.size() && isWhitespace(SourceLine[NewEnd.V]))
+        NewEnd = Map.startOfNextColumn(NewEnd);
 
       // Skip over this bit of "interesting" text.
-      while (NewEnd < SourceLine.size() && isWhitespace(SourceLine[NewEnd]))
-        NewEnd = map.startOfNextColumn(NewEnd);
+      while (NewEnd < SourceLine.size() && isWhitespace(SourceLine[NewEnd.V]))
+        NewEnd = Map.startOfNextColumn(NewEnd);
 
-      assert(map.byteToColumn(NewEnd) != -1);
-      unsigned NewColumns = map.byteToColumn(NewEnd) -
-                              map.byteToColumn(SourceStart);
+      assert(Map.byteToColumn(NewEnd).isValid());
+      Columns NewColumns =
+          Map.byteToColumn(NewEnd) - Map.byteToColumn(SourceStart);
       if (NewColumns <= TargetColumns) {
         SourceEnd = NewEnd;
         ExpandedRegion = true;
@@ -464,39 +496,41 @@ static void selectInterestingSourceRegion(std::string &SourceLine,
       break;
   }
 
-  CaretStart = map.byteToColumn(SourceStart);
-  CaretEnd = map.byteToColumn(SourceEnd) + CaretColumnsOutsideSource;
+  CaretStart = Map.byteToColumn(SourceStart);
+  CaretEnd = Map.byteToColumn(SourceEnd) + CaretColumnsOutsideSource;
 
   // [CaretStart, CaretEnd) is the slice we want. Update the various
   // output lines to show only this slice.
-  assert(CaretStart!=(unsigned)-1 && CaretEnd!=(unsigned)-1 &&
-         SourceStart!=(unsigned)-1 && SourceEnd!=(unsigned)-1);
+  assert(CaretStart.isValid() && CaretEnd.isValid() && SourceStart.isValid() &&
+         SourceEnd.isValid());
   assert(SourceStart <= SourceEnd);
   assert(CaretStart <= CaretEnd);
 
-  unsigned BackColumnsRemoved
-    = map.byteToColumn(SourceLine.size())-map.byteToColumn(SourceEnd);
-  unsigned FrontColumnsRemoved = CaretStart;
-  unsigned ColumnsKept = CaretEnd-CaretStart;
+  Columns BackColumnsRemoved =
+      Map.byteToColumn(Bytes{static_cast<int>(SourceLine.size())}) -
+      Map.byteToColumn(SourceEnd);
+  Columns FrontColumnsRemoved = CaretStart;
+  Columns ColumnsKept = CaretEnd - CaretStart;
 
   // We checked up front that the line needed truncation
-  assert(FrontColumnsRemoved+ColumnsKept+BackColumnsRemoved > Columns);
+  assert(FrontColumnsRemoved + ColumnsKept + BackColumnsRemoved >
+         NonGutterColumns);
 
   // The line needs some truncation, and we'd prefer to keep the front
   //  if possible, so remove the back
-  if (BackColumnsRemoved > strlen(back_ellipse))
-    SourceLine.replace(SourceEnd, std::string::npos, back_ellipse);
+  if (BackColumnsRemoved > Columns(BackEllipse.size()))
+    SourceLine.replace(SourceEnd.V, std::string::npos, BackEllipse);
 
   // If that's enough then we're done
-  if (FrontColumnsRemoved+ColumnsKept <= Columns)
+  if (FrontColumnsRemoved + ColumnsKept <= Columns(NonGutterColumns))
     return;
 
   // Otherwise remove the front as well
-  if (FrontColumnsRemoved > strlen(front_ellipse)) {
-    SourceLine.replace(0, SourceStart, front_ellipse);
-    CaretLine.replace(0, CaretStart, front_space);
+  if (FrontColumnsRemoved > Columns(FrontEllipse.size())) {
+    SourceLine.replace(0, SourceStart.V, FrontEllipse);
+    CaretLine.replace(0, CaretStart.V, FrontSpace);
     if (!FixItInsertionLine.empty())
-      FixItInsertionLine.replace(0, CaretStart, front_space);
+      FixItInsertionLine.replace(0, CaretStart.V, FrontSpace);
   }
 }
 
@@ -690,11 +724,21 @@ TextDiagnostic::printDiagnosticLevel(raw_ostream &OS,
     switch (Level) {
     case DiagnosticsEngine::Ignored:
       llvm_unreachable("Invalid diagnostic type");
-    case DiagnosticsEngine::Note:    OS.changeColor(noteColor, true); break;
-    case DiagnosticsEngine::Remark:  OS.changeColor(remarkColor, true); break;
-    case DiagnosticsEngine::Warning: OS.changeColor(warningColor, true); break;
-    case DiagnosticsEngine::Error:   OS.changeColor(errorColor, true); break;
-    case DiagnosticsEngine::Fatal:   OS.changeColor(fatalColor, true); break;
+    case DiagnosticsEngine::Note:
+      OS.changeColor(NoteColor, true);
+      break;
+    case DiagnosticsEngine::Remark:
+      OS.changeColor(RemarkColor, true);
+      break;
+    case DiagnosticsEngine::Warning:
+      OS.changeColor(WarningColor, true);
+      break;
+    case DiagnosticsEngine::Error:
+      OS.changeColor(ErrorColor, true);
+      break;
+    case DiagnosticsEngine::Fatal:
+      OS.changeColor(FatalColor, true);
+      break;
     }
   }
 
@@ -722,7 +766,7 @@ void TextDiagnostic::printDiagnosticMessage(raw_ostream &OS,
   if (ShowColors && !IsSupplemental) {
     // Print primary diagnostic messages in bold and without color, to visually
     // indicate the transition from continuation notes and other output.
-    OS.changeColor(savedColor, true);
+    OS.changeColor(SavedColor, true);
     Bold = true;
   }
 
@@ -800,7 +844,7 @@ void TextDiagnostic::emitDiagnosticLoc(FullSourceLoc Loc, PresumedLoc PLoc,
     return;
 
   if (DiagOpts.ShowColors)
-    OS.changeColor(savedColor, true);
+    OS.changeColor(SavedColor, true);
 
   emitFilename(PLoc.getFilename(), Loc.getManager());
   switch (DiagOpts.getFormat()) {
@@ -961,41 +1005,40 @@ maybeAddRange(std::pair<unsigned, unsigned> A, std::pair<unsigned, unsigned> B,
 
 struct LineRange {
   unsigned LineNo;
-  unsigned StartCol;
-  unsigned EndCol;
+  Bytes StartByte;
+  Bytes EndByte;
 };
 
 /// Highlight \p R (with ~'s) on the current source line.
 static void highlightRange(const LineRange &R, const SourceColumnMap &Map,
                            std::string &CaretLine) {
   // Pick the first non-whitespace column.
-  unsigned StartColNo = R.StartCol;
-  while (StartColNo < Map.getSourceLine().size() &&
-         (Map.getSourceLine()[StartColNo] == ' ' ||
-          Map.getSourceLine()[StartColNo] == '\t'))
-    StartColNo = Map.startOfNextColumn(StartColNo);
+  Bytes StartByte = R.StartByte;
+  while (StartByte < Map.bytes() && (Map.getSourceLine()[StartByte.V] == ' ' ||
+                                     Map.getSourceLine()[StartByte.V] == '\t'))
+    StartByte = Map.startOfNextColumn(StartByte);
 
   // Pick the last non-whitespace column.
-  unsigned EndColNo =
-      std::min(static_cast<size_t>(R.EndCol), Map.getSourceLine().size());
-  while (EndColNo && (Map.getSourceLine()[EndColNo - 1] == ' ' ||
-                      Map.getSourceLine()[EndColNo - 1] == '\t'))
-    EndColNo = Map.startOfPreviousColumn(EndColNo);
+  Bytes EndByte = std::min(R.EndByte.V, Map.bytes().V);
+  while (EndByte.V != 0 && (Map.getSourceLine()[EndByte.V - 1] == ' ' ||
+                            Map.getSourceLine()[EndByte.V - 1] == '\t'))
+    EndByte = Map.startOfPreviousColumn(EndByte);
 
   // If the start/end passed each other, then we are trying to highlight a
   // range that just exists in whitespace. That most likely means we have
   // a multi-line highlighting range that covers a blank line.
-  if (StartColNo > EndColNo)
+  if (StartByte > EndByte)
     return;
 
+  assert(StartByte <= EndByte && "Invalid range!");
   // Fill the range with ~'s.
-  StartColNo = Map.byteToContainingColumn(StartColNo);
-  EndColNo = Map.byteToContainingColumn(EndColNo);
+  Columns StartCol = Map.byteToContainingColumn(StartByte);
+  Columns EndCol = Map.byteToContainingColumn(EndByte);
+
+  if (CaretLine.size() < static_cast<size_t>(EndCol.V))
+    CaretLine.resize(EndCol.V, ' ');
 
-  assert(StartColNo <= EndColNo && "Invalid range!");
-  if (CaretLine.size() < EndColNo)
-    CaretLine.resize(EndColNo, ' ');
-  std::fill(CaretLine.begin() + StartColNo, CaretLine.begin() + EndColNo, '~');
+  std::fill(CaretLine.begin() + StartCol.V, CaretLine.begin() + EndCol.V, '~');
 }
 
 static std::string buildFixItInsertionLine(FileID FID, unsigned LineNo,
@@ -1006,7 +1049,7 @@ static std::string buildFixItInsertionLine(FileID FID, unsigned LineNo,
   std::string FixItInsertionLine;
   if (Hints.empty() || !DiagOpts.ShowFixits)
     return FixItInsertionLine;
-  unsigned PrevHintEndCol = 0;
+  Columns PrevHintEndCol = 0;
 
   for (const auto &H : Hints) {
     if (H.CodeToInsert.empty())
@@ -1024,12 +1067,13 @@ static std::string buildFixItInsertionLine(FileID FID, unsigned LineNo,
       // Note: When modifying this function, be very careful about what is a
       // "column" (printed width, platform-dependent) and what is a
       // "byte offset" (SourceManager "column").
-      unsigned HintByteOffset =
-          SM.getColumnNumber(HintLocInfo.first, HintLocInfo.second) - 1;
+      Bytes HintByteOffset =
+          Bytes(SM.getColumnNumber(HintLocInfo.first, HintLocInfo.second))
+              .prev();
 
       // The hint must start inside the source or right at the end
-      assert(HintByteOffset < static_cast<unsigned>(map.bytes()) + 1);
-      unsigned HintCol = map.byteToContainingColumn(HintByteOffset);
+      assert(HintByteOffset < map.bytes().next());
+      Columns HintCol = map.byteToContainingColumn(HintByteOffset);
 
       // If we inserted a long previous hint, push this one forwards, and add
       // an extra space to show that this is not part of the previous
@@ -1043,11 +1087,11 @@ static std::string buildFixItInsertionLine(FileID FID, unsigned LineNo,
 
       // This should NOT use HintByteOffset, because the source might have
       // Unicode characters in earlier columns.
-      unsigned NewFixItLineSize = FixItInsertionLine.size() +
-                                  (HintCol - PrevHintEndCol) +
-                                  H.CodeToInsert.size();
+      Columns NewFixItLineSize = Columns(FixItInsertionLine.size()) +
+                                 (HintCol - PrevHintEndCol) +
+                                 Columns(H.CodeToInsert.size());
       if (NewFixItLineSize > FixItInsertionLine.size())
-        FixItInsertionLine.resize(NewFixItLineSize, ' ');
+        FixItInsertionLine.resize(NewFixItLineSize.V, ' ');
 
       std::copy(H.CodeToInsert.begin(), H.CodeToInsert.end(),
                 FixItInsertionLine.end() - H.CodeToInsert.size());
@@ -1095,28 +1139,29 @@ prepareAndFilterRanges(const SmallVectorImpl<CharSourceRange> &Ranges,
     if (EndLineNo < Lines.first || SM.getFileID(End) != FID)
       continue;
 
-    unsigned StartColumn = SM.getExpansionColumnNumber(Begin);
-    unsigned EndColumn = SM.getExpansionColumnNumber(End);
-    assert(StartColumn && "StartColumn must be valid, 0 is invalid");
-    assert(EndColumn && "EndColumn must be valid, 0 is invalid");
+    Bytes StartByte = SM.getExpansionColumnNumber(Begin);
+    Bytes EndByte = SM.getExpansionColumnNumber(End);
+    assert(StartByte.V != 0 && "StartByte must be valid, 0 is invalid");
+    assert(EndByte.V != 0 && "EndByte must be valid, 0 is invalid");
     if (R.isTokenRange())
-      EndColumn += Lexer::MeasureTokenLength(End, SM, LangOpts);
+      EndByte += Bytes(Lexer::MeasureTokenLength(End, SM, LangOpts));
 
     // Only a single line.
     if (StartLineNo == EndLineNo) {
-      LineRanges.push_back({StartLineNo, StartColumn - 1, EndColumn - 1});
+      LineRanges.push_back({StartLineNo, StartByte.prev(), EndByte.prev()});
       continue;
     }
 
     // Start line.
-    LineRanges.push_back({StartLineNo, StartColumn - 1, ~0u});
+    LineRanges.push_back(
+        {StartLineNo, StartByte.prev(), std::numeric_limits<int>::max()});
 
     // Middle lines.
     for (unsigned S = StartLineNo + 1; S != EndLineNo; ++S)
-      LineRanges.push_back({S, 0, ~0u});
+      LineRanges.push_back({S, 0, std::numeric_limits<int>::max()});
 
     // End line.
-    LineRanges.push_back({EndLineNo, 0, EndColumn - 1});
+    LineRanges.push_back({EndLineNo, 0, EndByte.prev()});
   }
 
   return LineRanges;
@@ -1226,8 +1271,7 @@ highlightLines(StringRef FileData, unsigned StartLineNumber,
     if (TokenStartLine > EndLineNumber)
       break;
 
-    unsigned StartCol =
-        SM.getSpellingColumnNumber(T.getLocation(), &Invalid) - 1;
+    Bytes StartCol = SM.getSpellingColumnNumber(T.getLocation(), &Invalid) - 1;
     if (Invalid)
       continue;
 
@@ -1235,14 +1279,14 @@ highlightLines(StringRef FileData, unsigned StartLineNumber,
     if (TokenStartLine == TokenEndLine) {
       SmallVector<TextDiagnostic::StyleRange> &LineRanges =
           SnippetRanges[TokenStartLine - StartLineNumber];
-      appendStyle(LineRanges, T, StartCol, T.getLength());
+      appendStyle(LineRanges, T, StartCol.V, T.getLength());
       continue;
     }
     assert((TokenEndLine - TokenStartLine) >= 1);
 
     // For tokens that span multiple lines (think multiline comments), we
     // divide them into multiple StyleRanges.
-    unsigned EndCol = SM.getSpellingColumnNumber(T.getEndLoc(), &Invalid) - 1;
+    Bytes EndCol = SM.getSpellingColumnNumber(T.getEndLoc(), &Invalid) - 1;
     if (Invalid)
       continue;
 
@@ -1258,9 +1302,9 @@ highlightLines(StringRef FileData, unsigned StartLineNumber,
               SnippetRanges[L - StartLineNumber];
 
           if (L == TokenStartLine) // First line
-            appendStyle(LineRanges, T, StartCol, LineLength);
+            appendStyle(LineRanges, T, StartCol.V, LineLength);
           else if (L == TokenEndLine) // Last line
-            appendStyle(LineRanges, T, 0, EndCol);
+            appendStyle(LineRanges, T, 0, EndCol.V);
           else
             appendStyle(LineRanges, T, 0, LineLength);
         }
@@ -1315,11 +1359,11 @@ void TextDiagnostic::emitSnippetAndCaret(
   const char *BufEnd = BufStart + BufData.size();
 
   unsigned CaretLineNo = Loc.getLineNumber();
-  unsigned CaretColNo = Loc.getColumnNumber();
+  Bytes CaretByte = Loc.getColumnNumber();
 
   // Arbitrarily stop showing snippets when the line is too long.
   static const size_t MaxLineLengthToPrint = 4096;
-  if (CaretColNo > MaxLineLengthToPrint)
+  if (CaretByte > MaxLineLengthToPrint)
     return;
 
   // Find the set of lines to include.
@@ -1379,35 +1423,37 @@ void TextDiagnostic::emitSnippetAndCaret(
     std::string SourceLine(LineStart, LineEnd);
     // Remove trailing null bytes.
     while (!SourceLine.empty() && SourceLine.back() == '\0' &&
-           (LineNo != CaretLineNo || SourceLine.size() > CaretColNo))
+           (LineNo != CaretLineNo ||
+            SourceLine.size() > static_cast<size_t>(CaretByte.V)))
       SourceLine.pop_back();
 
     // Build the byte to column map.
-    const SourceColumnMap sourceColMap(SourceLine, DiagOpts.TabStop);
+    const SourceColumnMap SourceColMap(SourceLine, DiagOpts.TabStop);
 
     std::string CaretLine;
     // Highlight all of the characters covered by Ranges with ~ characters.
     for (const auto &LR : LineRanges) {
       if (LR.LineNo == LineNo)
-        highlightRange(LR, sourceColMap, CaretLine);
+        highlightRange(LR, SourceColMap, CaretLine);
     }
 
     // Next, insert the caret itself.
     if (CaretLineNo == LineNo) {
-      size_t Col = sourceColMap.byteToContainingColumn(CaretColNo - 1);
-      CaretLine.resize(std::max(Col + 1, CaretLine.size()), ' ');
-      CaretLine[Col] = '^';
+      Columns Col = SourceColMap.byteToContainingColumn(CaretByte.prev());
+      CaretLine.resize(
+          std::max(static_cast<size_t>(Col.V) + 1, CaretLine.size()), ' ');
+      CaretLine[Col.V] = '^';
     }
 
     std::string FixItInsertionLine =
-        buildFixItInsertionLine(FID, LineNo, sourceColMap, Hints, SM, DiagOpts);
+        buildFixItInsertionLine(FID, LineNo, SourceColMap, Hints, SM, DiagOpts);
 
     // If the source line is too long for our terminal, select only the
     // "interesting" source region within that line.
-    unsigned Columns = DiagOpts.MessageLength;
-    if (Columns)
+    Columns MessageLength = DiagOpts.MessageLength;
+    if (MessageLength.V != 0)
       selectInterestingSourceRegion(SourceLine, CaretLine, FixItInsertionLine,
-                                    Columns, sourceColMap);
+                                    MessageLength, SourceColMap);
 
     // If we are in -fdiagnostics-print-source-range-info mode, we are trying
     // to produce easily machine parsable output.  Add a space before the
@@ -1425,7 +1471,7 @@ void TextDiagnostic::emitSnippetAndCaret(
     if (!CaretLine.empty()) {
       indentForLineNumbers();
       if (DiagOpts.ShowColors)
-        OS.changeColor(caretColor, true);
+        OS.changeColor(CaretColor, true);
       OS << CaretLine << '\n';
       if (DiagOpts.ShowColors)
         OS.resetColor();
@@ -1435,7 +1481,7 @@ void TextDiagnostic::emitSnippetAndCaret(
       indentForLineNumbers();
       if (DiagOpts.ShowColors)
         // Print fixit line in color
-        OS.changeColor(fixitColor, false);
+        OS.changeColor(FixitColor, false);
       if (DiagOpts.ShowSourceRanges)
         OS << ' ';
       OS << FixItInsertionLine << '\n';
diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 25199c739ace9..31bc941e6a015 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -3221,6 +3221,7 @@ OMPClause *Parser::ParseOpenMPClause(OpenMPDirectiveKind DKind,
     else
       Clause = ParseOpenMPSingleExprClause(CKind, WrongDirective);
     break;
+  case OMPC_threadset:
   case OMPC_fail:
   case OMPC_proc_bind:
   case OMPC_atomic_default_mem_order:
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 6f2ab74d8f0eb..983fdc4121325 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -17659,6 +17659,10 @@ OMPClause *SemaOpenMP::ActOnOpenMPSimpleClause(
         static_cast<OpenMPSeverityClauseKind>(Argument), ArgumentLoc, StartLoc,
         LParenLoc, EndLoc);
     break;
+  case OMPC_threadset:
+    Res = ActOnOpenMPThreadsetClause(static_cast<OpenMPThreadsetKind>(Argument),
+                                     ArgumentLoc, StartLoc, LParenLoc, EndLoc);
+    break;
   case OMPC_if:
   case OMPC_final:
   case OMPC_num_threads:
@@ -17798,6 +17802,23 @@ OMPClause *SemaOpenMP::ActOnOpenMPDefaultClause(
       OMPDefaultClause(M, MLoc, VCKind, VCKindLoc, StartLoc, LParenLoc, EndLoc);
 }
 
+OMPClause *SemaOpenMP::ActOnOpenMPThreadsetClause(OpenMPThreadsetKind Kind,
+                                                  SourceLocation KindLoc,
+                                                  SourceLocation StartLoc,
+                                                  SourceLocation LParenLoc,
+                                                  SourceLocation EndLoc) {
+  if (Kind == OMPC_THREADSET_unknown) {
+    Diag(KindLoc, diag::err_omp_unexpected_clause_value)
+        << getListOfPossibleValues(OMPC_threadset, /*First=*/0,
+                                   /*Last=*/unsigned(OMPC_THREADSET_unknown))
+        << getOpenMPClauseName(OMPC_threadset);
+    return nullptr;
+  }
+
+  return new (getASTContext())
+      OMPThreadsetClause(Kind, KindLoc, StartLoc, LParenLoc, EndLoc);
+}
+
 OMPClause *SemaOpenMP::ActOnOpenMPProcBindClause(ProcBindKind Kind,
                                                  SourceLocation KindKwLoc,
                                                  SourceLocation StartLoc,
diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index 0c8c1d18d317e..8c20078e97a13 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -10622,6 +10622,13 @@ TreeTransform<Derived>::TransformOMPDefaultClause(OMPDefaultClause *C) {
       C->getEndLoc());
 }
 
+template <typename Derived>
+OMPClause *
+TreeTransform<Derived>::TransformOMPThreadsetClause(OMPThreadsetClause *C) {
+  // No need to rebuild this clause, no template-dependent parameters.
+  return C;
+}
+
 template <typename Derived>
 OMPClause *
 TreeTransform<Derived>::TransformOMPProcBindClause(OMPProcBindClause *C) {
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index c1b5cb730e4a4..e3106f8d8e13c 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -11255,6 +11255,9 @@ OMPClause *OMPClauseReader::readClause() {
   case llvm::omp::OMPC_mergeable:
     C = new (Context) OMPMergeableClause();
     break;
+  case llvm::omp::OMPC_threadset:
+    C = new (Context) OMPThreadsetClause();
+    break;
   case llvm::omp::OMPC_read:
     C = new (Context) OMPReadClause();
     break;
@@ -11658,6 +11661,17 @@ void OMPClauseReader::VisitOMPDefaultClause(OMPDefaultClause *C) {
   C->setDefaultVariableCategoryLocation(Record.readSourceLocation());
 }
 
+// Read the parameter of threadset clause. This will have been saved when
+// OMPClauseWriter is called.
+void OMPClauseReader::VisitOMPThreadsetClause(OMPThreadsetClause *C) {
+  C->setLParenLoc(Record.readSourceLocation());
+  SourceLocation ThreadsetKindLoc = Record.readSourceLocation();
+  C->setThreadsetKindLoc(ThreadsetKindLoc);
+  OpenMPThreadsetKind TKind =
+      static_cast<OpenMPThreadsetKind>(Record.readInt());
+  C->setThreadsetKind(TKind);
+}
+
 void OMPClauseReader::VisitOMPProcBindClause(OMPProcBindClause *C) {
   C->setProcBindKind(static_cast<llvm::omp::ProcBindKind>(Record.readInt()));
   C->setLParenLoc(Record.readSourceLocation());
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 377e3966874f3..3ac338e013deb 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -7913,6 +7913,12 @@ void OMPClauseWriter::VisitOMPDefaultClause(OMPDefaultClause *C) {
   Record.AddSourceLocation(C->getDefaultVCLoc());
 }
 
+void OMPClauseWriter::VisitOMPThreadsetClause(OMPThreadsetClause *C) {
+  Record.AddSourceLocation(C->getLParenLoc());
+  Record.AddSourceLocation(C->getThreadsetKindLoc());
+  Record.writeEnum(C->getThreadsetKind());
+}
+
 void OMPClauseWriter::VisitOMPProcBindClause(OMPProcBindClause *C) {
   Record.push_back(unsigned(C->getProcBindKind()));
   Record.AddSourceLocation(C->getLParenLoc());
diff --git a/clang/test/CodeGen/AArch64/neon-across.c b/clang/test/CodeGen/AArch64/neon-across.c
index aa0387d89dfef..aae5097da7789 100644
--- a/clang/test/CodeGen/AArch64/neon-across.c
+++ b/clang/test/CodeGen/AArch64/neon-across.c
@@ -113,9 +113,8 @@ uint64_t test_vaddlvq_u32(uint32x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxv_s8
 // CHECK-SAME: (<8 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v8i8(<8 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMAXV_S8_I:%.*]] = call i8 @llvm.vector.reduce.smax.v8i8(<8 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMAXV_S8_I]]
 //
 int8_t test_vmaxv_s8(int8x8_t a) {
   return vmaxv_s8(a);
@@ -124,9 +123,8 @@ int8_t test_vmaxv_s8(int8x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxv_s16
 // CHECK-SAME: (<4 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v4i16(<4 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMAXV_S16_I:%.*]] = call i16 @llvm.vector.reduce.smax.v4i16(<4 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMAXV_S16_I]]
 //
 int16_t test_vmaxv_s16(int16x4_t a) {
   return vmaxv_s16(a);
@@ -135,9 +133,8 @@ int16_t test_vmaxv_s16(int16x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxv_u8
 // CHECK-SAME: (<8 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v8i8(<8 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMAXV_U8_I:%.*]] = call i8 @llvm.vector.reduce.umax.v8i8(<8 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMAXV_U8_I]]
 //
 uint8_t test_vmaxv_u8(uint8x8_t a) {
   return vmaxv_u8(a);
@@ -146,9 +143,8 @@ uint8_t test_vmaxv_u8(uint8x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxv_u16
 // CHECK-SAME: (<4 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v4i16(<4 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMAXV_U16_I:%.*]] = call i16 @llvm.vector.reduce.umax.v4i16(<4 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMAXV_U16_I]]
 //
 uint16_t test_vmaxv_u16(uint16x4_t a) {
   return vmaxv_u16(a);
@@ -157,9 +153,8 @@ uint16_t test_vmaxv_u16(uint16x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxvq_s8
 // CHECK-SAME: (<16 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v16i8(<16 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMAXVQ_S8_I:%.*]] = call i8 @llvm.vector.reduce.smax.v16i8(<16 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMAXVQ_S8_I]]
 //
 int8_t test_vmaxvq_s8(int8x16_t a) {
   return vmaxvq_s8(a);
@@ -168,9 +163,8 @@ int8_t test_vmaxvq_s8(int8x16_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxvq_s16
 // CHECK-SAME: (<8 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v8i16(<8 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMAXVQ_S16_I:%.*]] = call i16 @llvm.vector.reduce.smax.v8i16(<8 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMAXVQ_S16_I]]
 //
 int16_t test_vmaxvq_s16(int16x8_t a) {
   return vmaxvq_s16(a);
@@ -179,7 +173,7 @@ int16_t test_vmaxvq_s16(int16x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxvq_s32
 // CHECK-SAME: (<4 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v4i32(<4 x i32> [[A]])
+// CHECK-NEXT:    [[VMAXVQ_S32_I:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMAXVQ_S32_I]]
 //
 int32_t test_vmaxvq_s32(int32x4_t a) {
@@ -189,9 +183,8 @@ int32_t test_vmaxvq_s32(int32x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxvq_u8
 // CHECK-SAME: (<16 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v16i8(<16 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMAXVQ_U8_I:%.*]] = call i8 @llvm.vector.reduce.umax.v16i8(<16 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMAXVQ_U8_I]]
 //
 uint8_t test_vmaxvq_u8(uint8x16_t a) {
   return vmaxvq_u8(a);
@@ -200,9 +193,8 @@ uint8_t test_vmaxvq_u8(uint8x16_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxvq_u16
 // CHECK-SAME: (<8 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXV_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v8i16(<8 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMAXV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMAXVQ_U16_I:%.*]] = call i16 @llvm.vector.reduce.umax.v8i16(<8 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMAXVQ_U16_I]]
 //
 uint16_t test_vmaxvq_u16(uint16x8_t a) {
   return vmaxvq_u16(a);
@@ -211,7 +203,7 @@ uint16_t test_vmaxvq_u16(uint16x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vmaxvq_u32
 // CHECK-SAME: (<4 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMAXVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v4i32(<4 x i32> [[A]])
+// CHECK-NEXT:    [[VMAXVQ_U32_I:%.*]] = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMAXVQ_U32_I]]
 //
 uint32_t test_vmaxvq_u32(uint32x4_t a) {
@@ -221,9 +213,8 @@ uint32_t test_vmaxvq_u32(uint32x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminv_s8
 // CHECK-SAME: (<8 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v8i8(<8 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMINV_S8_I:%.*]] = call i8 @llvm.vector.reduce.smin.v8i8(<8 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMINV_S8_I]]
 //
 int8_t test_vminv_s8(int8x8_t a) {
   return vminv_s8(a);
@@ -232,9 +223,8 @@ int8_t test_vminv_s8(int8x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminv_s16
 // CHECK-SAME: (<4 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v4i16(<4 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMINV_S16_I:%.*]] = call i16 @llvm.vector.reduce.smin.v4i16(<4 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMINV_S16_I]]
 //
 int16_t test_vminv_s16(int16x4_t a) {
   return vminv_s16(a);
@@ -243,9 +233,8 @@ int16_t test_vminv_s16(int16x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminv_u8
 // CHECK-SAME: (<8 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v8i8(<8 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMINV_U8_I:%.*]] = call i8 @llvm.vector.reduce.umin.v8i8(<8 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMINV_U8_I]]
 //
 uint8_t test_vminv_u8(uint8x8_t a) {
   return vminv_u8(a);
@@ -254,9 +243,8 @@ uint8_t test_vminv_u8(uint8x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminv_u16
 // CHECK-SAME: (<4 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v4i16(<4 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMINV_U16_I:%.*]] = call i16 @llvm.vector.reduce.umin.v4i16(<4 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMINV_U16_I]]
 //
 uint16_t test_vminv_u16(uint16x4_t a) {
   return vminv_u16(a);
@@ -265,9 +253,8 @@ uint16_t test_vminv_u16(uint16x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminvq_s8
 // CHECK-SAME: (<16 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v16i8(<16 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMINVQ_S8_I:%.*]] = call i8 @llvm.vector.reduce.smin.v16i8(<16 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMINVQ_S8_I]]
 //
 int8_t test_vminvq_s8(int8x16_t a) {
   return vminvq_s8(a);
@@ -276,9 +263,8 @@ int8_t test_vminvq_s8(int8x16_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminvq_s16
 // CHECK-SAME: (<8 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v8i16(<8 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMINVQ_S16_I:%.*]] = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMINVQ_S16_I]]
 //
 int16_t test_vminvq_s16(int16x8_t a) {
   return vminvq_s16(a);
@@ -287,7 +273,7 @@ int16_t test_vminvq_s16(int16x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminvq_s32
 // CHECK-SAME: (<4 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v4i32(<4 x i32> [[A]])
+// CHECK-NEXT:    [[VMINVQ_S32_I:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMINVQ_S32_I]]
 //
 int32_t test_vminvq_s32(int32x4_t a) {
@@ -297,9 +283,8 @@ int32_t test_vminvq_s32(int32x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminvq_u8
 // CHECK-SAME: (<16 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v16i8(<16 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VMINVQ_U8_I:%.*]] = call i8 @llvm.vector.reduce.umin.v16i8(<16 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VMINVQ_U8_I]]
 //
 uint8_t test_vminvq_u8(uint8x16_t a) {
   return vminvq_u8(a);
@@ -308,9 +293,8 @@ uint8_t test_vminvq_u8(uint8x16_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminvq_u16
 // CHECK-SAME: (<8 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINV_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v8i16(<8 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VMINV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VMINVQ_U16_I:%.*]] = call i16 @llvm.vector.reduce.umin.v8i16(<8 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VMINVQ_U16_I]]
 //
 uint16_t test_vminvq_u16(uint16x8_t a) {
   return vminvq_u16(a);
@@ -319,7 +303,7 @@ uint16_t test_vminvq_u16(uint16x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vminvq_u32
 // CHECK-SAME: (<4 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VMINVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v4i32(<4 x i32> [[A]])
+// CHECK-NEXT:    [[VMINVQ_U32_I:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMINVQ_U32_I]]
 //
 uint32_t test_vminvq_u32(uint32x4_t a) {
@@ -329,9 +313,8 @@ uint32_t test_vminvq_u32(uint32x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddv_s8
 // CHECK-SAME: (<8 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v8i8(<8 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VADDV_S8_I:%.*]] = call i8 @llvm.vector.reduce.add.v8i8(<8 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VADDV_S8_I]]
 //
 int8_t test_vaddv_s8(int8x8_t a) {
   return vaddv_s8(a);
@@ -340,9 +323,8 @@ int8_t test_vaddv_s8(int8x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddv_s16
 // CHECK-SAME: (<4 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v4i16(<4 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VADDV_S16_I:%.*]] = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VADDV_S16_I]]
 //
 int16_t test_vaddv_s16(int16x4_t a) {
   return vaddv_s16(a);
@@ -351,9 +333,8 @@ int16_t test_vaddv_s16(int16x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddv_u8
 // CHECK-SAME: (<8 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v8i8(<8 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VADDV_U8_I:%.*]] = call i8 @llvm.vector.reduce.add.v8i8(<8 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VADDV_U8_I]]
 //
 uint8_t test_vaddv_u8(uint8x8_t a) {
   return vaddv_u8(a);
@@ -362,9 +343,8 @@ uint8_t test_vaddv_u8(uint8x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddv_u16
 // CHECK-SAME: (<4 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v4i16(<4 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VADDV_U16_I:%.*]] = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VADDV_U16_I]]
 //
 uint16_t test_vaddv_u16(uint16x4_t a) {
   return vaddv_u16(a);
@@ -373,9 +353,8 @@ uint16_t test_vaddv_u16(uint16x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddvq_s8
 // CHECK-SAME: (<16 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v16i8(<16 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VADDVQ_S8_I:%.*]] = call i8 @llvm.vector.reduce.add.v16i8(<16 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VADDVQ_S8_I]]
 //
 int8_t test_vaddvq_s8(int8x16_t a) {
   return vaddvq_s8(a);
@@ -384,9 +363,8 @@ int8_t test_vaddvq_s8(int8x16_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddvq_s16
 // CHECK-SAME: (<8 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v8i16(<8 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VADDVQ_S16_I:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VADDVQ_S16_I]]
 //
 int16_t test_vaddvq_s16(int16x8_t a) {
   return vaddvq_s16(a);
@@ -395,7 +373,7 @@ int16_t test_vaddvq_s16(int16x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddvq_s32
 // CHECK-SAME: (<4 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDVQ_S32_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v4i32(<4 x i32> [[A]])
+// CHECK-NEXT:    [[VADDVQ_S32_I:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VADDVQ_S32_I]]
 //
 int32_t test_vaddvq_s32(int32x4_t a) {
@@ -405,9 +383,8 @@ int32_t test_vaddvq_s32(int32x4_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddvq_u8
 // CHECK-SAME: (<16 x i8> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i8
-// CHECK-NEXT:    ret i8 [[TMP0]]
+// CHECK-NEXT:    [[VADDVQ_U8_I:%.*]] = call i8 @llvm.vector.reduce.add.v16i8(<16 x i8> [[A]])
+// CHECK-NEXT:    ret i8 [[VADDVQ_U8_I]]
 //
 uint8_t test_vaddvq_u8(uint8x16_t a) {
   return vaddvq_u8(a);
@@ -416,9 +393,8 @@ uint8_t test_vaddvq_u8(uint8x16_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddvq_u16
 // CHECK-SAME: (<8 x i16> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDV_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v8i16(<8 x i16> [[A]])
-// CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[VADDV_I]] to i16
-// CHECK-NEXT:    ret i16 [[TMP0]]
+// CHECK-NEXT:    [[VADDVQ_U16_I:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[A]])
+// CHECK-NEXT:    ret i16 [[VADDVQ_U16_I]]
 //
 uint16_t test_vaddvq_u16(uint16x8_t a) {
   return vaddvq_u16(a);
@@ -427,7 +403,7 @@ uint16_t test_vaddvq_u16(uint16x8_t a) {
 // CHECK-LABEL: define {{[^@]+}}@test_vaddvq_u32
 // CHECK-SAME: (<4 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VADDVQ_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v4i32(<4 x i32> [[A]])
+// CHECK-NEXT:    [[VADDVQ_U32_I:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VADDVQ_U32_I]]
 //
 uint32_t test_vaddvq_u32(uint32x4_t a) {
diff --git a/clang/test/CodeGen/AArch64/neon-intrinsics.c b/clang/test/CodeGen/AArch64/neon-intrinsics.c
index 035e1ca1b45e8..1c628bbba483f 100644
--- a/clang/test/CodeGen/AArch64/neon-intrinsics.c
+++ b/clang/test/CodeGen/AArch64/neon-intrinsics.c
@@ -12643,7 +12643,7 @@ uint64_t test_vqrshld_u64(uint64_t a, int64_t b) {
 // CHECK-LABEL: define dso_local i64 @test_vpaddd_s64(
 // CHECK-SAME: <2 x i64> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VPADDD_S64_I:%.*]] = call i64 @llvm.aarch64.neon.uaddv.i64.v2i64(<2 x i64> [[A]])
+// CHECK-NEXT:    [[VPADDD_S64_I:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[A]])
 // CHECK-NEXT:    ret i64 [[VPADDD_S64_I]]
 //
 int64_t test_vpaddd_s64(int64x2_t a) {
@@ -23227,7 +23227,7 @@ uint64x2_t test_vpaddq_u64(uint64x2_t a, uint64x2_t b) {
 // CHECK-LABEL: define dso_local i64 @test_vpaddd_u64(
 // CHECK-SAME: <2 x i64> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VPADDD_U64_I:%.*]] = call i64 @llvm.aarch64.neon.uaddv.i64.v2i64(<2 x i64> [[A]])
+// CHECK-NEXT:    [[VPADDD_U64_I:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[A]])
 // CHECK-NEXT:    ret i64 [[VPADDD_U64_I]]
 //
 uint64_t test_vpaddd_u64(uint64x2_t a) {
@@ -23237,7 +23237,7 @@ uint64_t test_vpaddd_u64(uint64x2_t a) {
 // CHECK-LABEL: define dso_local i64 @test_vaddvq_s64(
 // CHECK-SAME: <2 x i64> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VADDVQ_S64_I:%.*]] = call i64 @llvm.aarch64.neon.saddv.i64.v2i64(<2 x i64> [[A]])
+// CHECK-NEXT:    [[VADDVQ_S64_I:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[A]])
 // CHECK-NEXT:    ret i64 [[VADDVQ_S64_I]]
 //
 int64_t test_vaddvq_s64(int64x2_t a) {
@@ -23247,7 +23247,7 @@ int64_t test_vaddvq_s64(int64x2_t a) {
 // CHECK-LABEL: define dso_local i64 @test_vaddvq_u64(
 // CHECK-SAME: <2 x i64> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VADDVQ_U64_I:%.*]] = call i64 @llvm.aarch64.neon.uaddv.i64.v2i64(<2 x i64> [[A]])
+// CHECK-NEXT:    [[VADDVQ_U64_I:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[A]])
 // CHECK-NEXT:    ret i64 [[VADDVQ_U64_I]]
 //
 uint64_t test_vaddvq_u64(uint64x2_t a) {
@@ -23878,7 +23878,7 @@ float64x1_t test_vrsqrts_f64(float64x1_t a, float64x1_t b) {
 // CHECK-LABEL: define dso_local i32 @test_vminv_s32(
 // CHECK-SAME: <2 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VMINV_S32_I:%.*]] = call i32 @llvm.aarch64.neon.sminv.i32.v2i32(<2 x i32> [[A]])
+// CHECK-NEXT:    [[VMINV_S32_I:%.*]] = call i32 @llvm.vector.reduce.smin.v2i32(<2 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMINV_S32_I]]
 //
 int32_t test_vminv_s32(int32x2_t a) {
@@ -23888,7 +23888,7 @@ int32_t test_vminv_s32(int32x2_t a) {
 // CHECK-LABEL: define dso_local i32 @test_vminv_u32(
 // CHECK-SAME: <2 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VMINV_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uminv.i32.v2i32(<2 x i32> [[A]])
+// CHECK-NEXT:    [[VMINV_U32_I:%.*]] = call i32 @llvm.vector.reduce.umin.v2i32(<2 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMINV_U32_I]]
 //
 uint32_t test_vminv_u32(uint32x2_t a) {
@@ -23898,7 +23898,7 @@ uint32_t test_vminv_u32(uint32x2_t a) {
 // CHECK-LABEL: define dso_local i32 @test_vmaxv_s32(
 // CHECK-SAME: <2 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VMAXV_S32_I:%.*]] = call i32 @llvm.aarch64.neon.smaxv.i32.v2i32(<2 x i32> [[A]])
+// CHECK-NEXT:    [[VMAXV_S32_I:%.*]] = call i32 @llvm.vector.reduce.smax.v2i32(<2 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMAXV_S32_I]]
 //
 int32_t test_vmaxv_s32(int32x2_t a) {
@@ -23908,7 +23908,7 @@ int32_t test_vmaxv_s32(int32x2_t a) {
 // CHECK-LABEL: define dso_local i32 @test_vmaxv_u32(
 // CHECK-SAME: <2 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VMAXV_U32_I:%.*]] = call i32 @llvm.aarch64.neon.umaxv.i32.v2i32(<2 x i32> [[A]])
+// CHECK-NEXT:    [[VMAXV_U32_I:%.*]] = call i32 @llvm.vector.reduce.umax.v2i32(<2 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VMAXV_U32_I]]
 //
 uint32_t test_vmaxv_u32(uint32x2_t a) {
@@ -23918,7 +23918,7 @@ uint32_t test_vmaxv_u32(uint32x2_t a) {
 // CHECK-LABEL: define dso_local i32 @test_vaddv_s32(
 // CHECK-SAME: <2 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VADDV_S32_I:%.*]] = call i32 @llvm.aarch64.neon.saddv.i32.v2i32(<2 x i32> [[A]])
+// CHECK-NEXT:    [[VADDV_S32_I:%.*]] = call i32 @llvm.vector.reduce.add.v2i32(<2 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VADDV_S32_I]]
 //
 int32_t test_vaddv_s32(int32x2_t a) {
@@ -23928,7 +23928,7 @@ int32_t test_vaddv_s32(int32x2_t a) {
 // CHECK-LABEL: define dso_local i32 @test_vaddv_u32(
 // CHECK-SAME: <2 x i32> noundef [[A:%.*]]) #[[ATTR0]] {
 // CHECK-NEXT:  [[ENTRY:.*:]]
-// CHECK-NEXT:    [[VADDV_U32_I:%.*]] = call i32 @llvm.aarch64.neon.uaddv.i32.v2i32(<2 x i32> [[A]])
+// CHECK-NEXT:    [[VADDV_U32_I:%.*]] = call i32 @llvm.vector.reduce.add.v2i32(<2 x i32> [[A]])
 // CHECK-NEXT:    ret i32 [[VADDV_U32_I]]
 //
 uint32_t test_vaddv_u32(uint32x2_t a) {
diff --git a/clang/test/DebugInfo/Generic/bit-int.c b/clang/test/DebugInfo/Generic/bit-int.c
index 94b93013e3b46..88ecc139eee9f 100644
--- a/clang/test/DebugInfo/Generic/bit-int.c
+++ b/clang/test/DebugInfo/Generic/bit-int.c
@@ -4,5 +4,5 @@
 unsigned _BitInt(17) a;
 _BitInt(2) b;
 
-// CHECK: !DIBasicType(name: "_BitInt", size: 8, dataSize: 2, encoding: DW_ATE_signed)
-// CHECK: !DIBasicType(name: "unsigned _BitInt", size: 32,  dataSize: 17, encoding: DW_ATE_unsigned)
+// CHECK: !DIBasicType(name: "_BitInt(2)", size: 8, dataSize: 2, encoding: DW_ATE_signed)
+// CHECK: !DIBasicType(name: "unsigned _BitInt(17)", size: 32,  dataSize: 17, encoding: DW_ATE_unsigned)
diff --git a/clang/test/OpenMP/task_ast_print.cpp b/clang/test/OpenMP/task_ast_print.cpp
index 30fb7ab75cc87..b059f187156ee 100644
--- a/clang/test/OpenMP/task_ast_print.cpp
+++ b/clang/test/OpenMP/task_ast_print.cpp
@@ -1,8 +1,10 @@
 // RUN: %clang_cc1 -verify -Wno-vla -fopenmp -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -verify -Wno-vla -fopenmp -fopenmp-version=60 -DOMP60 -ast-print %s | FileCheck %s --check-prefix=CHECK60
 // RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp -std=c++11 -include-pch %t -verify -Wno-vla %s -ast-print | FileCheck %s
 
 // RUN: %clang_cc1 -verify -Wno-vla -fopenmp-simd -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -verify -Wno-vla -fopenmp-simd -fopenmp-version=60 -DOMP60 -ast-print %s | FileCheck %s --check-prefix=CHECK60
 // RUN: %clang_cc1 -fopenmp-simd -x c++ -std=c++11 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -std=c++11 -include-pch %t -verify -Wno-vla %s -ast-print | FileCheck %s
 // RUN: %clang_cc1 -triple x86_64-pc-linux-gnu -fopenmp -ast-dump  %s | FileCheck %s --check-prefix=DUMP
@@ -101,8 +103,8 @@ T tmain(T argc, T *argv) {
   a = 2;
 #pragma omp task default(none), private(argc, b) firstprivate(argv) shared(d) if (argc > 0) final(S<T>::TS > 0) priority(argc) affinity(argc, argv[b:argc], arr[:], ([argc][sizeof(T)])argv)
   foo();
-#pragma omp taskgroup task_reduction(-: argc)
-#pragma omp task if (C) mergeable priority(C) in_reduction(-: argc)
+#pragma omp taskgroup task_reduction(+: argc)
+#pragma omp task if (C) mergeable priority(C) in_reduction(+: argc)
   foo();
   return 0;
 }
@@ -119,8 +121,8 @@ T tmain(T argc, T *argv) {
 // CHECK-NEXT: a = 2;
 // CHECK-NEXT: #pragma omp task default(none) private(argc,b) firstprivate(argv) shared(d) if(argc > 0) final(S<T>::TS > 0) priority(argc) affinity(argc,argv[b:argc],arr[:],([argc][sizeof(T)])argv)
 // CHECK-NEXT: foo()
-// CHECK-NEXT: #pragma omp taskgroup task_reduction(-: argc)
-// CHECK-NEXT: #pragma omp task if(C) mergeable priority(C) in_reduction(-: argc)
+// CHECK-NEXT: #pragma omp taskgroup task_reduction(+: argc)
+// CHECK-NEXT: #pragma omp task if(C) mergeable priority(C) in_reduction(+: argc)
 // CHECK-NEXT: foo()
 // CHECK: template<> int tmain<int, 5>(int argc, int *argv) {
 // CHECK-NEXT: int b = argc, c, d, e, f, g;
@@ -134,8 +136,8 @@ T tmain(T argc, T *argv) {
 // CHECK-NEXT: a = 2;
 // CHECK-NEXT: #pragma omp task default(none) private(argc,b) firstprivate(argv) shared(d) if(argc > 0) final(S<int>::TS > 0) priority(argc) affinity(argc,argv[b:argc],arr[:],([argc][sizeof(int)])argv)
 // CHECK-NEXT: foo()
-// CHECK-NEXT: #pragma omp taskgroup task_reduction(-: argc)
-// CHECK-NEXT: #pragma omp task if(5) mergeable priority(5) in_reduction(-: argc)
+// CHECK-NEXT: #pragma omp taskgroup task_reduction(+: argc)
+// CHECK-NEXT: #pragma omp task if(5) mergeable priority(5) in_reduction(+: argc)
 // CHECK-NEXT: foo()
 // CHECK: template<> long tmain<long, 1>(long argc, long *argv) {
 // CHECK-NEXT: long b = argc, c, d, e, f, g;
@@ -149,8 +151,8 @@ T tmain(T argc, T *argv) {
 // CHECK-NEXT: a = 2;
 // CHECK-NEXT: #pragma omp task default(none) private(argc,b) firstprivate(argv) shared(d) if(argc > 0) final(S<long>::TS > 0) priority(argc) affinity(argc,argv[b:argc],arr[:],([argc][sizeof(long)])argv)
 // CHECK-NEXT: foo()
-// CHECK-NEXT: #pragma omp taskgroup task_reduction(-: argc)
-// CHECK-NEXT: #pragma omp task if(1) mergeable priority(1) in_reduction(-: argc)
+// CHECK-NEXT: #pragma omp taskgroup task_reduction(+: argc)
+// CHECK-NEXT: #pragma omp task if(1) mergeable priority(1) in_reduction(+: argc)
 // CHECK-NEXT: foo()
 
 enum Enum {};
@@ -199,6 +201,14 @@ int main(int argc, char **argv) {
 #pragma omp task depend(inout: omp_all_memory)
   foo();
   // CHECK-NEXT: foo();
+#ifdef OMP60
+#pragma omp task threadset(omp_pool)
+#pragma omp task threadset(omp_team)
+  foo();
+#endif
+  // CHECK60: #pragma omp task threadset(omp_pool)
+  // CHECK60: #pragma omp task threadset(omp_team)
+  // CHECK60-NEXT: foo();
   return tmain<int, 5>(b, &b) + tmain<long, 1>(x, &x);
 }
 
diff --git a/clang/test/OpenMP/task_codegen.cpp b/clang/test/OpenMP/task_codegen.cpp
index c3e6d9e6b1cf7..ba8e6945de9d0 100644
--- a/clang/test/OpenMP/task_codegen.cpp
+++ b/clang/test/OpenMP/task_codegen.cpp
@@ -41,6 +41,9 @@
 // RUN: -emit-llvm -o - -DOMP51 | FileCheck %s \
 // RUN: --implicit-check-not="{{__kmpc|__tgt}}"
 
+// RUN: %clang_cc1 -verify -Wno-vla -triple x86_64-apple-darwin10 -fopenmp -fopenmp-version=60 -DOMP60 -fopenmp-enable-irbuilder -x c++ -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK6
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -DOMP60 -fopenmp-enable-irbuilder -x c++ -triple x86_64-apple-darwin10 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -DOMP60 -fopenmp-enable-irbuilder -x c++ -triple x86_64-apple-darwin10 -include-pch %t -verify -Wno-vla %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK6
 
 // expected-no-diagnostics
 #ifndef HEADER
@@ -65,6 +68,7 @@ struct S {
   S(const S &s) : a(s.a) {}
   ~S() {}
 };
+
 int a;
 int main() {
   char b;
@@ -147,6 +151,7 @@ int main() {
 
 
 
+
 // s1 = S();
 
 
@@ -215,6 +220,19 @@ void test_omp_all_memory()
   }
 }
 #endif // OMP51
+
+#ifdef OMP60
+void test_threadset()
+{
+#pragma omp task threadset(omp_team)
+  {
+  }
+#pragma omp task threadset(omp_pool)
+  {
+  }
+}
+#endif // OMP60
+
 #endif
 // CHECK1-LABEL: define {{[^@]+}}@main
 // CHECK1-SAME: () #[[ATTR0:[0-9]+]] {
@@ -10243,3 +10261,18 @@ void test_omp_all_memory()
 // CHECK4-51-NEXT:    call void @__cxx_global_var_init()
 // CHECK4-51-NEXT:    ret void
 //
+// CHECK6-LABEL: define void @_Z14test_threadsetv()
+// CHECK6-NEXT:  entry:
+// CHECK6-NEXT:       [[AGG_CAPTURED:%.*]] = alloca [[STRUCT_ANON_23:%.*]], align 1
+// CHECK6-NEXT:       [[AGG_CAPTURED2:%.*]] = alloca [[STRUCT_ANON_25:%.*]], align 1
+// CHECK6-NEXT:       call i32 @__kmpc_global_thread_num(ptr @[[GLOB_PTR:[0-9]+]])
+// CHECK6-NEXT:       [[TMP0:%.*]] = call ptr @__kmpc_omp_task_alloc(ptr @1, i32 %omp_global_thread_num, i32 1, i64 40, i64 1, ptr @.omp_task_entry..[[ENTRY1:[0-9]+]])
+// CHECK6-NEXT:       getelementptr inbounds nuw %struct.kmp_task_t_with_privates{{.*}}, ptr %0, i32 0, i32 0
+// CHECK6-NEXT:       call i32 @__kmpc_global_thread_num(ptr @[[GLOB_PTR:[0-9]+]])
+// CHECK6-NEXT:       call i32 @__kmpc_omp_task(ptr @1, i32 %omp_global_thread_num1, ptr %0)
+// CHECK6-NEXT:       call i32 @__kmpc_global_thread_num(ptr @[[GLOB_PTR2:[0-9]+]])
+// CHECK6-NEXT:       [[TMP3:%.*]] = call ptr @__kmpc_omp_task_alloc(ptr @1, i32 %omp_global_thread_num3, i32 129, i64 40, i64 1, ptr @.omp_task_entry..[[ENTRY2:[0-9]+]])
+// CHECK6-NEXT:       getelementptr inbounds nuw %struct.kmp_task_t_with_privates{{.*}}, ptr %3, i32 0, i32 0
+// CHECK6-NEXT:       call i32 @__kmpc_global_thread_num(ptr @[[GLOB_PTR2:[0-9]+]])
+// CHECK6-NEXT:       call i32 @__kmpc_omp_task(ptr @1, i32 %omp_global_thread_num4, ptr %3)
+// CHECK6-NEXT:       ret void
diff --git a/clang/test/OpenMP/task_threadset_messages.cpp b/clang/test/OpenMP/task_threadset_messages.cpp
new file mode 100755
index 0000000000000..f553a2da17ab9
--- /dev/null
+++ b/clang/test/OpenMP/task_threadset_messages.cpp
@@ -0,0 +1,99 @@
+// RUN: %clang_cc1 -verify=expected,omp45 -fopenmp -fopenmp-version=45 -std=c++11 -ferror-limit 200 -o - %s
+// RUN: %clang_cc1 -verify=expected,omp50 -fopenmp -fopenmp-version=50 -std=c++11 -ferror-limit 200 -o - %s
+// RUN: %clang_cc1 -verify=expected,omp51 -fopenmp -fopenmp-version=51 -std=c++11 -ferror-limit 200 -o - %s
+// RUN: %clang_cc1 -verify=expected -DOMP60 -fopenmp -fopenmp-version=60 -std=c++11 -ferror-limit 200 -o - %s
+
+// RUN: %clang_cc1 -verify=expected,omp45 -fopenmp-simd -fopenmp-version=45 -std=c++11 -ferror-limit 200 -o - %s
+// RUN: %clang_cc1 -verify=expected,omp50 -fopenmp-simd -fopenmp-version=50 -std=c++11 -ferror-limit 200 -o - %s
+// RUN: %clang_cc1 -verify=expected,omp51 -fopenmp-simd -fopenmp-version=51 -std=c++11 -ferror-limit 200 -o - %s
+// RUN: %clang_cc1 -verify=expected -DOMP60 -fopenmp-simd -fopenmp-version=60 -std=c++11 -ferror-limit 200 -o - %s
+
+#ifdef OMP60
+struct ComplexStruct {
+  int data[10];
+  struct InnerStruct {
+    float value;
+  } inner;
+};
+
+// Template class with member functions using 'threadset'.
+template <typename T>
+class TemplateClass {
+public:
+  void foo() {
+    #pragma omp task threadset(omp_pool)
+    {
+      T temp;
+    }
+  }
+  void bar() {
+    #pragma omp taskloop threadset(omp_team)
+    for (int i = 0; i < 10; ++i) {}
+  }
+};
+
+// Valid uses of 'threadset' with 'omp_pool' and 'omp_team' in task directive.
+void test_task_threadset_valid() {
+  int a;
+  #pragma omp task threadset(omp_pool)
+  #pragma omp task threadset(omp_team)
+  #pragma omp task threadset(omp_pool) if(1)
+  #pragma omp task threadset(omp_team) priority(5)
+  #pragma omp task threadset(omp_pool) depend(out: a)
+  #pragma omp parallel
+  {
+    #pragma omp task threadset(omp_pool)
+    {
+      #pragma omp taskloop threadset(omp_team)
+      for (int i = 0; i < 5; ++i) {}
+    }
+  }
+
+  TemplateClass<int> obj;
+  obj.foo();
+  obj.bar();
+}
+
+// Invalid uses of 'threadset' with incorrect arguments in task directive.
+void test_task_threadset_invalid_args() {
+  #pragma omp task threadset(invalid_arg) // expected-error {{expected 'omp_pool' or 'omp_team' in OpenMP clause 'threadset'}}
+  #pragma omp task threadset(123) // expected-error {{expected 'omp_pool' or 'omp_team' in OpenMP clause 'threadset'}}
+  #pragma omp task threadset(omp_pool, omp_team) // expected-error {{expected ')'}} expected-note {{to match this '('}}
+  #pragma omp task threadset() // expected-error {{expected 'omp_pool' or 'omp_team' in OpenMP clause 'threadset'}}
+  {}
+}
+
+// Valid uses of 'threadset' with 'omp_pool' and 'omp_team' in taskloop directive.
+void test_taskloop_threadset_valid() {
+  #pragma omp taskloop threadset(omp_pool)
+  for (int i = 0; i < 10; ++i) {}
+  #pragma omp taskloop threadset(omp_team)
+  for (int i = 0; i < 10; ++i) {}
+  #pragma omp taskloop threadset(omp_pool) grainsize(5)
+  for (int i = 0; i < 10; ++i) {}
+  #pragma omp taskloop threadset(omp_team) num_tasks(2)
+  for (int i = 0; i < 10; ++i) {}
+}
+
+// Invalid uses of 'threadset' with incorrect arguments in taskloop directive.
+void test_taskloop_threadset_invalid_args() {
+  #pragma omp taskloop threadset(invalid_arg) // expected-error {{expected 'omp_pool' or 'omp_team' in OpenMP clause 'threadset'}}
+  for (int i = 0; i < 10; ++i) {}
+  #pragma omp taskloop threadset(123) // expected-error {{expected 'omp_pool' or 'omp_team' in OpenMP clause 'threadset'}}
+  for (int i = 0; i < 10; ++i) {}
+  #pragma omp taskloop threadset(omp_pool, omp_team) // expected-error {{expected ')'}} expected-note {{to match this '('}}
+  for (int i = 0; i < 10; ++i) {}
+  #pragma omp taskloop threadset() // expected-error {{expected 'omp_pool' or 'omp_team' in OpenMP clause 'threadset'}}
+  for (int i = 0; i < 10; ++i) {}
+}
+
+#else
+void test_threadset_not_supported() {
+  #pragma omp task threadset(omp_pool) // omp45-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp task'}} omp50-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp task'}} omp51-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp task'}}
+  #pragma omp task threadset(omp_team) // omp45-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp task'}} omp50-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp task'}} omp51-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp task'}}
+  #pragma omp taskloop threadset(omp_team) // omp45-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp taskloop'}} omp50-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp taskloop'}} omp51-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp taskloop'}}
+  for (int i = 0; i < 10; ++i) {}
+  #pragma omp taskloop threadset(omp_pool) // omp45-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp taskloop'}} omp50-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp taskloop'}} omp51-error {{unexpected OpenMP clause 'threadset' in directive '#pragma omp taskloop'}}
+  for (int i = 0; i < 10; ++i) {}
+}
+#endif
diff --git a/clang/test/OpenMP/taskloop_ast_print.cpp b/clang/test/OpenMP/taskloop_ast_print.cpp
index 1b6d7240fa66c..e4bf20af5d78e 100644
--- a/clang/test/OpenMP/taskloop_ast_print.cpp
+++ b/clang/test/OpenMP/taskloop_ast_print.cpp
@@ -1,8 +1,10 @@
 // RUN: %clang_cc1 -verify -fopenmp -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -ast-print %s | FileCheck %s --check-prefix=CHECK60
 // RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp -std=c++11 -include-pch %t -verify %s -ast-print | FileCheck %s
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=60 -DOMP60 -ast-print %s | FileCheck %s --check-prefix=CHECK60
 // RUN: %clang_cc1 -fopenmp-simd -x c++ -std=c++11 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -std=c++11 -include-pch %t -verify %s -ast-print | FileCheck %s
 // expected-no-diagnostics
@@ -87,6 +89,20 @@ int main(int argc, char **argv) {
   // CHECK-NEXT: #pragma omp cancel taskgroup
   // CHECK-NEXT: #pragma omp cancellation point taskgroup
   // CHECK-NEXT: foo();
+#ifdef OMP60
+#pragma omp taskloop threadset(omp_team)
+  for (int i = 0; i < 10; ++i) {
+#pragma omp taskloop threadset(omp_pool)
+  for (int j = 0; j < 10; ++j) {
+    foo();
+  }
+}
+#endif
+ // CHECK60: #pragma omp taskloop threadset(omp_team)
+ // CHECK60-NEXT: for (int i = 0; i < 10; ++i) {
+ // CHECK60: #pragma omp taskloop threadset(omp_pool)
+ // CHECK60-NEXT: for (int j = 0; j < 10; ++j) {
+ // CHECK60-NEXT: foo();
   return (tmain<int, 5>(argc) + tmain<char, 1>(argv[0][0]));
 }
 
diff --git a/clang/test/OpenMP/taskloop_codegen.cpp b/clang/test/OpenMP/taskloop_codegen.cpp
index 69f8d3b160bfd..d1197607a2684 100644
--- a/clang/test/OpenMP/taskloop_codegen.cpp
+++ b/clang/test/OpenMP/taskloop_codegen.cpp
@@ -5,7 +5,12 @@
 // RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp-simd -x c++ -emit-llvm %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
 // RUN: %clang_cc1 -fopenmp-simd -x c++ -triple x86_64-apple-darwin10 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -x c++ -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+
 // SIMD-ONLY0-NOT: {{__kmpc|__tgt}}
+
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK6
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple x86_64-apple-darwin10 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK6
 // expected-no-diagnostics
 #ifndef HEADER
 #define HEADER
@@ -241,4 +246,52 @@ void taskloop_with_class() {
   }
 }
 
+#ifdef OMP60
+void test_threadset()
+{
+#pragma omp taskloop threadset(omp_team)
+  for (int i = 0; i < 10; ++i) {
+  }
+#pragma omp taskloop threadset(omp_pool)
+  for (int i = 0; i < 10; ++i) {
+  }
+}
+#endif // OMP60
+// CHECK6-LABEL: define void @_Z14test_threadsetv()
+// CHECK6-NEXT:  entry:
+// CHECK6-NEXT:       [[AGG_CAPTURED:%.*]] = alloca [[STRUCT_ANON_14:%.*]], align 1
+// CHECK6-NEXT:       %[[TMP:.*]] = alloca i32, align 4
+// CHECK6-NEXT:       [[AGG_CAPTURED1:%.*]] = alloca [[STRUCT_ANON_16:%.*]], align 1
+// CHECK6-NEXT:       %[[TMP2:.*]] = alloca i32, align 4
+// CHECK6-NEXT:       %[[TID0:.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB_PTR:[0-9]+]])
+// CHECK6-NEXT:       call void @__kmpc_taskgroup(ptr @1, i32 %[[TID0:.*]])
+// CHECK6-NEXT:       %[[TID1:.*]] = call ptr @__kmpc_omp_task_alloc(ptr @1, i32 %[[TID0:.*]], i32 1, i64 80, i64 1, ptr @.omp_task_entry..[[ENTRY1:[0-9]+]])
+// CHECK6-NEXT:       %[[TID2:.*]] = getelementptr inbounds nuw %struct.kmp_task_t_with_privates{{.*}}, ptr %[[TID1:.*]], i32 0, i32 0
+// CHECK6-NEXT:       %[[TID3:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID2:.*]], i32 0, i32 5
+// CHECK6-NEXT:       store i64 0, ptr %[[TID3:.*]], align 8
+// CHECK6-NEXT:       %[[TID4:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID2:.*]], i32 0, i32 6
+// CHECK6-NEXT:       store i64 9, ptr %[[TID4:.*]], align 8
+// CHECK6-NEXT:       %[[TID5:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID2:.*]], i32 0, i32 7
+// CHECK6-NEXT:       store i64 1, ptr %[[TID5:.*]], align 8
+// CHECK6-NEXT:       %[[TID6:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID2:.*]], i32 0, i32 9
+// CHECK6-NEXT:       call void @llvm.memset.p0.i64(ptr align 8 %[[TID6:.*]], i8 0, i64 8, i1 false)
+// CHECK6-NEXT:       %[[TID7:.*]] = load i64, ptr %[[TID5:.*]], align 8
+// CHECK6-NEXT:       call void @__kmpc_taskloop(ptr @1, i32 %[[TID0:.*]], ptr %[[TID1:.*]], i32 1, ptr %[[TID3:.*]], ptr %4, i64 %[[TID7:.*]], i32 1, i32 0, i64 0, ptr null)
+// CHECK6-NEXT:       call void @__kmpc_end_taskgroup(ptr @1, i32 %[[TID0:.*]])
+// CHECK6-NEXT:       call void @__kmpc_taskgroup(ptr @1, i32 %[[TID0:.*]])
+// CHECK6-NEXT:       %[[TID8:.*]] = call ptr @__kmpc_omp_task_alloc(ptr @1, i32 %[[TID0:.*]], i32 129, i64 80, i64 1, ptr @.omp_task_entry..[[ENTRY1:[0-9]+]])
+// CHECK6-NEXT:       %[[TID9:.*]] = getelementptr inbounds nuw %struct.kmp_task_t_with_privates{{.*}}, ptr %[[TID8:.*]], i32 0, i32 0
+// CHECK6-NEXT:       %[[TID10:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID9:.*]], i32 0, i32 5
+// CHECK6-NEXT:       store i64 0, ptr %[[TID10:.*]], align 8
+// CHECK6-NEXT:       %[[TID11:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID9:.*]], i32 0, i32 6
+// CHECK6-NEXT:       store i64 9, ptr %[[TID11:.*]], align 8
+// CHECK6-NEXT:       %[[TID12:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID9:.*]], i32 0, i32 7
+// CHECK6-NEXT:       store i64 1, ptr %[[TID12:.*]], align 8
+// CHECK6-NEXT:       %[[TID13:.*]] = getelementptr inbounds nuw %struct.kmp_task_t{{.*}}, ptr %[[TID9:.*]], i32 0, i32 9
+// CHECK6-NEXT:       call void @llvm.memset.p0.i64(ptr align 8 [[TID13:.*]], i8 0, i64 8, i1 false)
+// CHECK6-NEXT:       %[[TID14:.*]] = load i64, ptr [[TID12:.*]], align 8
+// CHECK6-NEXT:       call void @__kmpc_taskloop(ptr @1, i32 %[[TID0:.*]], ptr %[[TID8:.*]], i32 1, ptr %[[TID10:.*]], ptr %[[TID11:.*]], i64 %[[TID14:.*]], i32 1, i32 0, i64 0, ptr null)
+// CHECK6-NEXT:       call void @__kmpc_end_taskgroup(ptr @1, i32 %[[TID0:.*]])
+// CHECK6-NEXT:       ret void
+
 #endif
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index fc27fd29da933..08776d9bcabfc 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -2406,6 +2406,8 @@ void OMPClauseEnqueue::VisitOMPCompareClause(const OMPCompareClause *) {}
 
 void OMPClauseEnqueue::VisitOMPFailClause(const OMPFailClause *) {}
 
+void OMPClauseEnqueue::VisitOMPThreadsetClause(const OMPThreadsetClause *) {}
+
 void OMPClauseEnqueue::VisitOMPAbsentClause(const OMPAbsentClause *) {}
 
 void OMPClauseEnqueue::VisitOMPHoldsClause(const OMPHoldsClause *) {}
diff --git a/clang/www/cxx_dr_status.html b/clang/www/cxx_dr_status.html
index b7da22cf9fb22..ae9b28ee625cd 100755
--- a/clang/www/cxx_dr_status.html
+++ b/clang/www/cxx_dr_status.html
@@ -3113,11 +3113,11 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
     <td>Default initialization of POD classes?</td>
     <td class="na" align="center">N/A</td>
   </tr>
-  <tr class="open" id="511">
+  <tr id="511">
     <td><a href="https://cplusplus.github.io/CWG/issues/511.html">511</a></td>
-    <td>open</td>
+    <td>NAD</td>
     <td>POD-structs with template assignment operators</td>
-    <td align="center">Not resolved</td>
+    <td class="unknown" align="center">Unknown</td>
   </tr>
   <tr id="512">
     <td><a href="https://cplusplus.github.io/CWG/issues/512.html">512</a></td>
@@ -10895,7 +10895,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="1845">
     <td><a href="https://cplusplus.github.io/CWG/issues/1845.html">1845</a></td>
-    <td>drafting</td>
+    <td>review</td>
     <td>Point of instantiation of a variable template specialization</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -12081,7 +12081,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="2042">
     <td><a href="https://cplusplus.github.io/CWG/issues/2042.html">2042</a></td>
-    <td>drafting</td>
+    <td>review</td>
     <td>Exceptions and deallocation functions</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -12335,7 +12335,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
     <td><a href="https://cplusplus.github.io/CWG/issues/2084.html">2084</a></td>
     <td>CD4</td>
     <td>NSDMIs and deleted union default constructors</td>
-    <td class="unknown" align="center">Unknown</td>
+    <td class="full" align="center">Clang 3.1</td>
   </tr>
   <tr id="2085">
     <td><a href="https://cplusplus.github.io/CWG/issues/2085.html">2085</a></td>
@@ -12837,7 +12837,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="2168">
     <td><a href="https://cplusplus.github.io/CWG/issues/2168.html">2168</a></td>
-    <td>open</td>
+    <td>review</td>
     <td>Narrowing conversions and +/- infinity</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -14237,11 +14237,11 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
     <td>Constexpr virtual functions and temporary objects</td>
     <td class="unknown" align="center">Unknown</td>
   </tr>
-  <tr class="open" id="2401">
+  <tr id="2401">
     <td><a href="https://cplusplus.github.io/CWG/issues/2401.html">2401</a></td>
-    <td>drafting</td>
+    <td>C++20</td>
     <td>Array decay vs prohibition of subobject non-type arguments</td>
-    <td align="center">Not resolved</td>
+    <td class="unknown" align="center">Unknown</td>
   </tr>
   <tr id="2402">
     <td><a href="https://cplusplus.github.io/CWG/issues/2402.html">2402</a></td>
@@ -15171,7 +15171,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="2555">
     <td><a href="https://cplusplus.github.io/CWG/issues/2555.html">2555</a></td>
-    <td>drafting</td>
+    <td>tentatively ready</td>
     <td>Ineffective redeclaration prevention for <I>using-declarator</I>s</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -15311,23 +15311,23 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
     <td>Undefined behavior for preprocessing directives in macro arguments</td>
     <td align="center">Not resolved</td>
   </tr>
-  <tr class="open" id="2578">
+  <tr id="2578">
     <td><a href="https://cplusplus.github.io/CWG/issues/2578.html">2578</a></td>
-    <td>open</td>
+    <td>CD7</td>
     <td>Undefined behavior when creating an invalid string literal via stringizing</td>
-    <td align="center">Not resolved</td>
+    <td class="unknown" align="center">Unknown</td>
   </tr>
-  <tr class="open" id="2579">
+  <tr id="2579">
     <td><a href="https://cplusplus.github.io/CWG/issues/2579.html">2579</a></td>
-    <td>open</td>
+    <td>CD7</td>
     <td>Undefined behavior when token pasting does not create a preprocessing token</td>
-    <td align="center">Not resolved</td>
+    <td class="unknown" align="center">Unknown</td>
   </tr>
-  <tr class="open" id="2580">
+  <tr id="2580">
     <td><a href="https://cplusplus.github.io/CWG/issues/2580.html">2580</a></td>
-    <td>open</td>
+    <td>CD7</td>
     <td>Undefined behavior with <TT>#line</TT></td>
-    <td align="center">Not resolved</td>
+    <td class="unknown" align="center">Unknown</td>
   </tr>
   <tr class="open" id="2581">
     <td><a href="https://cplusplus.github.io/CWG/issues/2581.html">2581</a></td>
@@ -17104,7 +17104,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="2875">
     <td><a href="https://cplusplus.github.io/CWG/issues/2875.html">2875</a></td>
-    <td>review</td>
+    <td>tentatively ready</td>
     <td>Missing support for round-tripping null pointer values through indirection/address operators</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -17400,7 +17400,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="2923">
     <td><a href="https://cplusplus.github.io/CWG/issues/2923.html">2923</a></td>
-    <td>review</td>
+    <td>tentatively ready</td>
     <td>Note about infinite loops and execution steps</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -17760,7 +17760,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="2983">
     <td><a href="https://cplusplus.github.io/CWG/issues/2983.html">2983</a></td>
-    <td>open</td>
+    <td>review</td>
     <td>Non-type template parameters are not variables</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -17868,7 +17868,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="3001">
     <td><a href="https://cplusplus.github.io/CWG/issues/3001.html">3001</a></td>
-    <td>review</td>
+    <td>tentatively ready</td>
     <td>Inconsistent restrictions for <TT>static_cast</TT> on pointers to out-of-lifetime objects</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -17932,7 +17932,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="3011">
     <td><a href="https://cplusplus.github.io/CWG/issues/3011.html">3011</a></td>
-    <td>open</td>
+    <td>tentatively ready</td>
     <td>Parenthesized aggregate initialization for <I>new-expression</I>s</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -17992,7 +17992,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="3021">
     <td><a href="https://cplusplus.github.io/CWG/issues/3021.html">3021</a></td>
-    <td>open</td>
+    <td>drafting</td>
     <td>Subsumption rules for fold expanded constraints</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -18058,7 +18058,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="3032">
     <td><a href="https://cplusplus.github.io/CWG/issues/3032.html">3032</a></td>
-    <td>open</td>
+    <td>tentatively ready</td>
     <td>Template argument disambiguation</td>
     <td align="center">Not resolved</td>
   </tr>
@@ -18184,7 +18184,7 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
   </tr>
   <tr class="open" id="3053">
     <td><a href="https://cplusplus.github.io/CWG/issues/3053.html">3053</a></td>
-    <td>open</td>
+    <td>tentatively ready</td>
     <td>Allowing <TT>#undef likely</TT></td>
     <td align="center">Not resolved</td>
   </tr>
@@ -18265,6 +18265,210 @@ <h2 id="cxxdr">C++ defect report implementation status</h2>
     <td>tentatively ready</td>
     <td>Declarative <I>nested-name-specifier</I> in explicit instantiation</td>
     <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3067">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3067.html">3067</a></td>
+    <td>open</td>
+    <td>Array-to-pointer conversion with object type mismatch</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3068">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3068.html">3068</a></td>
+    <td>open</td>
+    <td>Access checking in friends involving <I>qualified-id</I>s</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3069">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3069.html">3069</a></td>
+    <td>open</td>
+    <td>Reference to wrong placeholder</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3070">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3070.html">3070</a></td>
+    <td>open</td>
+    <td>Trivial assignment can skip member subobjects</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3071">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3071.html">3071</a></td>
+    <td>open</td>
+    <td>Negative <TT>tuple_size</TT> in structured bindings</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3072">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3072.html">3072</a></td>
+    <td>open</td>
+    <td>Incorrect examples for lambda SFINAE</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3073">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3073.html">3073</a></td>
+    <td>open</td>
+    <td>Dependence of <I>R</I> on <TT>T2</TT> is unclear</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3074">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3074.html">3074</a></td>
+    <td>tentatively ready</td>
+    <td>Redundant ill-formedness for module macros</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3075">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3075.html">3075</a></td>
+    <td>tentatively ready</td>
+    <td>Unclear matching of import directive</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3076">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3076.html">3076</a></td>
+    <td>tentatively ready</td>
+    <td>Remove unnecessary IFNDR for malformed <I>header-name-token</I>s</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3077">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3077.html">3077</a></td>
+    <td>tentatively ready</td>
+    <td>Undesirable formation of <TT>import</TT> directive with <I>string-literal</I></td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3078">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3078.html">3078</a></td>
+    <td>review</td>
+    <td>Different treatment of <TT>#include</TT> <I>pp-tokens</I> and <I>header-name-tokens</I></td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3079">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3079.html">3079</a></td>
+    <td>open</td>
+    <td>Allow <I>empty-declaration</I>s in anonymous unions</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3080">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3080.html">3080</a></td>
+    <td>tentatively ready</td>
+    <td>Clarify kinds of permitted template template arguments</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3081">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3081.html">3081</a></td>
+    <td>review</td>
+    <td>Require glvalue when splicing direct base class relationship</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3082">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3082.html">3082</a></td>
+    <td>tentatively ready</td>
+    <td>Allow for call-compatible function types in <TT>reinterpret_cast</TT></td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3083">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3083.html">3083</a></td>
+    <td>tentatively ready</td>
+    <td>Remove redundant restrictions on class and enum definitions</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3084">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3084.html">3084</a></td>
+    <td>tentatively ready</td>
+    <td><I>compound-statement</I>s inside <I>iteration-statement</I>s</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3085">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3085.html">3085</a></td>
+    <td>tentatively ready</td>
+    <td>Apply restriction inside for-range-declaration</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3086">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3086.html">3086</a></td>
+    <td>tentatively ready</td>
+    <td>Destringizing should consider all sorts of encoding-prefixes</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3087">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3087.html">3087</a></td>
+    <td>open</td>
+    <td>Destringizing for raw string literals</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3088">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3088.html">3088</a></td>
+    <td>open</td>
+    <td>Clarify macro treatment of identifiers with special meaning</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3089">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3089.html">3089</a></td>
+    <td>tentatively ready</td>
+    <td>const-default-constructible improperly handles std::meta::info</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3090">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3090.html">3090</a></td>
+    <td>tentatively ready</td>
+    <td>Internal linkage from header units</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3091">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3091.html">3091</a></td>
+    <td>review</td>
+    <td>Linking of translation units as sequences of tokens</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3092">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3092.html">3092</a></td>
+    <td>tentatively ready</td>
+    <td><I>base-specifier</I>s are not "declared"</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3093">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3093.html">3093</a></td>
+    <td>open</td>
+    <td>Missing integration of direct base class relationships</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3094">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3094.html">3094</a></td>
+    <td>review</td>
+    <td>Rework phases for string literal concatenation and token formation</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3095">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3095.html">3095</a></td>
+    <td>open</td>
+    <td>Type-dependent packs that are not structured binding packs</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3096">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3096.html">3096</a></td>
+    <td>open</td>
+    <td>Value-dependence of size of structured binding pack with non-dependent initializer</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3097">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3097.html">3097</a></td>
+    <td>tentatively ready</td>
+    <td>Lambda expression introduces a scope</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3098">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3098.html">3098</a></td>
+    <td>tentatively ready</td>
+    <td>Remove redundancy "names or designates"</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3099">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3099.html">3099</a></td>
+    <td>open</td>
+    <td>Instantiation of type aliases from alias templates is unspecified</td>
+    <td align="center">Not resolved</td>
+  </tr>
+  <tr class="open" id="3100">
+    <td><a href="https://cplusplus.github.io/CWG/issues/3100.html">3100</a></td>
+    <td>open</td>
+    <td>Destruction order for objects with static storage duration</td>
+    <td align="center">Not resolved</td>
   </tr></table>
 
 </div>
diff --git a/flang/include/flang/Lower/OpenMP/Clauses.h b/flang/include/flang/Lower/OpenMP/Clauses.h
index 74924661d9a03..688d01704370d 100644
--- a/flang/include/flang/Lower/OpenMP/Clauses.h
+++ b/flang/include/flang/Lower/OpenMP/Clauses.h
@@ -294,6 +294,7 @@ using Permutation = tomp::clause::PermutationT<TypeTy, IdTy, ExprTy>;
 using TaskReduction = tomp::clause::TaskReductionT<TypeTy, IdTy, ExprTy>;
 using ThreadLimit = tomp::clause::ThreadLimitT<TypeTy, IdTy, ExprTy>;
 using Threads = tomp::clause::ThreadsT<TypeTy, IdTy, ExprTy>;
+using Threadset = tomp::clause::ThreadsetT<TypeTy, IdTy, ExprTy>;
 using Transparent = tomp::clause::TransparentT<TypeTy, IdTy, ExprTy>;
 using To = tomp::clause::ToT<TypeTy, IdTy, ExprTy>;
 using UnifiedAddress = tomp::clause::UnifiedAddressT<TypeTy, IdTy, ExprTy>;
diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h
index bb970691c85c9..a7398a4ef970f 100644
--- a/flang/include/flang/Parser/dump-parse-tree.h
+++ b/flang/include/flang/Parser/dump-parse-tree.h
@@ -685,6 +685,8 @@ class ParseTreeDumper {
   NODE_ENUM(OmpTaskDependenceType, Value)
   NODE(parser, OmpTaskReductionClause)
   NODE(OmpTaskReductionClause, Modifier)
+  NODE(parser, OmpThreadsetClause)
+  NODE_ENUM(OmpThreadsetClause, ThreadsetPolicy)
   NODE(parser, OmpToClause)
   NODE(OmpToClause, Modifier)
   NODE(parser, OmpTraitProperty)
diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h
index c3a8c2eab15f2..375790af90b74 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -4825,6 +4825,14 @@ struct OmpTaskReductionClause {
   std::tuple<MODIFIERS(), OmpObjectList> t;
 };
 
+// Ref: [6.0:442]
+// threadset-clause ->
+//     THREADSET(omp_pool|omp_team)
+struct OmpThreadsetClause {
+  ENUM_CLASS(ThreadsetPolicy, Omp_Pool, Omp_Team)
+  WRAPPER_CLASS_BOILERPLATE(OmpThreadsetClause, ThreadsetPolicy);
+};
+
 // Ref: [4.5:107-109], [5.0:176-180], [5.1:205-210], [5.2:167-168]
 //
 // to-clause (in DECLARE TARGET) ->
diff --git a/flang/lib/Lower/OpenMP/Clauses.cpp b/flang/lib/Lower/OpenMP/Clauses.cpp
index d39f9dda92a28..0f60b47991004 100644
--- a/flang/lib/Lower/OpenMP/Clauses.cpp
+++ b/flang/lib/Lower/OpenMP/Clauses.cpp
@@ -1482,6 +1482,21 @@ ThreadLimit make(const parser::OmpClause::ThreadLimit &inp,
   return ThreadLimit{/*Threadlim=*/makeExpr(inp.v, semaCtx)};
 }
 
+Threadset make(const parser::OmpClause::Threadset &inp,
+               semantics::SemanticsContext &semaCtx) {
+  // inp.v -> parser::OmpThreadsetClause
+  using wrapped = parser::OmpThreadsetClause;
+
+  CLAUSET_ENUM_CONVERT( //
+      convert, wrapped::ThreadsetPolicy, Threadset::ThreadsetPolicy,
+      // clang-format off
+      MS(Omp_Pool, Omp_Pool)
+      MS(Omp_Team, Omp_Team)
+      // clang-format on
+  );
+  return Threadset{/*ThreadsetPolicy=*/convert(inp.v.v)};
+}
+
 // Threadprivate: empty
 // Threads: empty
 
diff --git a/flang/lib/Semantics/check-omp-structure.cpp b/flang/lib/Semantics/check-omp-structure.cpp
index ec69e014fe0e7..e0b78eda68909 100644
--- a/flang/lib/Semantics/check-omp-structure.cpp
+++ b/flang/lib/Semantics/check-omp-structure.cpp
@@ -3406,6 +3406,7 @@ CHECK_SIMPLE_CLAUSE(Read, OMPC_read)
 CHECK_SIMPLE_CLAUSE(Threadprivate, OMPC_threadprivate)
 CHECK_SIMPLE_CLAUSE(Groupprivate, OMPC_groupprivate)
 CHECK_SIMPLE_CLAUSE(Threads, OMPC_threads)
+CHECK_SIMPLE_CLAUSE(Threadset, OMPC_threadset)
 CHECK_SIMPLE_CLAUSE(Inbranch, OMPC_inbranch)
 CHECK_SIMPLE_CLAUSE(Link, OMPC_link)
 CHECK_SIMPLE_CLAUSE(Indirect, OMPC_indirect)
diff --git a/libcxx/include/__config b/libcxx/include/__config
index b4c081dcdff1b..357f77b7d27d6 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -1050,8 +1050,7 @@ typedef __char32_t char32_t;
 #    define _LIBCPP_CTAD_SUPPORTED_FOR_TYPE(_ClassName) static_assert(true, "")
 #  endif
 
-// TODO(LLVM 22): Remove the workaround
-#  if defined(__OBJC__) && (!defined(_LIBCPP_CLANG_VER) || _LIBCPP_CLANG_VER < 2001)
+#  if defined(__OBJC__) && defined(_LIBCPP_APPLE_CLANG_VER)
 #    define _LIBCPP_WORKAROUND_OBJCXX_COMPILER_INTRINSICS
 #  endif
 
@@ -1255,14 +1254,6 @@ typedef __char32_t char32_t;
 #    define _LIBCPP_DIAGNOSE_NULLPTR
 #  endif
 
-// TODO(LLVM 22): Remove this macro once LLVM19 support ends. __cpp_explicit_this_parameter has been set in LLVM20.
-// Clang-18 has support for deducing this, but it does not set the FTM.
-#  if defined(__cpp_explicit_this_parameter) || (defined(_LIBCPP_CLANG_VER) && _LIBCPP_CLANG_VER >= 1800)
-#    define _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER 1
-#  else
-#    define _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER 0
-#  endif
-
 #endif // __cplusplus
 
 #endif // _LIBCPP___CONFIG
diff --git a/libcxx/include/__configuration/abi.h b/libcxx/include/__configuration/abi.h
index c9936df30ff7f..38b85c6ac70de 100644
--- a/libcxx/include/__configuration/abi.h
+++ b/libcxx/include/__configuration/abi.h
@@ -61,14 +61,6 @@
 // According to the Standard, `bitset::operator[] const` returns bool
 #  define _LIBCPP_ABI_BITSET_VECTOR_BOOL_CONST_SUBSCRIPT_RETURN_BOOL
 
-// In LLVM 20, we've changed to take these ABI breaks unconditionally. These flags only exist in case someone is running
-// into the static_asserts we added to catch the ABI break and don't care that it is one.
-// TODO(LLVM 22): Remove these flags
-#  define _LIBCPP_ABI_LIST_REMOVE_NODE_POINTER_UB
-#  define _LIBCPP_ABI_TREE_REMOVE_NODE_POINTER_UB
-#  define _LIBCPP_ABI_FIX_UNORDERED_NODE_POINTER_UB
-#  define _LIBCPP_ABI_FORWARD_LIST_REMOVE_NODE_POINTER_UB
-
 // These flags are documented in ABIGuarantees.rst
 #  define _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
 #  define _LIBCPP_ABI_DO_NOT_EXPORT_BASIC_STRING_COMMON
diff --git a/libcxx/include/__format/format_arg.h b/libcxx/include/__format/format_arg.h
index ed5e76275ea87..19794f0f084ce 100644
--- a/libcxx/include/__format/format_arg.h
+++ b/libcxx/include/__format/format_arg.h
@@ -149,7 +149,7 @@ _LIBCPP_HIDE_FROM_ABI decltype(auto) __visit_format_arg(_Visitor&& __vis, basic_
   __libcpp_unreachable();
 }
 
-#  if _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#  if _LIBCPP_STD_VER >= 26
 
 template <class _Rp, class _Visitor, class _Context>
 _LIBCPP_HIDE_FROM_ABI _Rp __visit_format_arg(_Visitor&& __vis, basic_format_arg<_Context> __arg) {
@@ -200,7 +200,7 @@ _LIBCPP_HIDE_FROM_ABI _Rp __visit_format_arg(_Visitor&& __vis, basic_format_arg<
   __libcpp_unreachable();
 }
 
-#  endif // _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#  endif // _LIBCPP_STD_VER >= 26
 
 /// Contains the values used in basic_format_arg.
 ///
@@ -285,7 +285,7 @@ class _LIBCPP_NO_SPECIALIZATIONS basic_format_arg {
 
   _LIBCPP_HIDE_FROM_ABI explicit operator bool() const noexcept { return __type_ != __format::__arg_t::__none; }
 
-#  if _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#  if _LIBCPP_STD_VER >= 26
 
   // This function is user facing, so it must wrap the non-standard types of
   // the "variant" in a handle to stay conforming. See __arg_t for more details.
@@ -329,7 +329,7 @@ class _LIBCPP_NO_SPECIALIZATIONS basic_format_arg {
     }
   }
 
-#  endif // _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#  endif // _LIBCPP_STD_VER >= 26
 
 private:
   using char_type = typename _Context::char_type;
@@ -371,11 +371,8 @@ class basic_format_arg<_Context>::handle {
 // This function is user facing, so it must wrap the non-standard types of
 // the "variant" in a handle to stay conforming. See __arg_t for more details.
 template <class _Visitor, class _Context>
-#  if _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
-_LIBCPP_DEPRECATED_IN_CXX26
-#  endif
-    _LIBCPP_HIDE_FROM_ABI decltype(auto)
-    visit_format_arg(_Visitor&& __vis, basic_format_arg<_Context> __arg) {
+_LIBCPP_DEPRECATED_IN_CXX26 _LIBCPP_HIDE_FROM_ABI decltype(auto)
+visit_format_arg(_Visitor&& __vis, basic_format_arg<_Context> __arg) {
   switch (__arg.__type_) {
 #  if _LIBCPP_HAS_INT128
   case __format::__arg_t::__i128: {
@@ -387,7 +384,7 @@ _LIBCPP_DEPRECATED_IN_CXX26
     typename __basic_format_arg_value<_Context>::__handle __h{__arg.__value_.__u128_};
     return std::invoke(std::forward<_Visitor>(__vis), typename basic_format_arg<_Context>::handle{__h});
   }
-#  endif // _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#  endif // _LIBCPP_HAS_INT128
   default:
     return std::__visit_format_arg(std::forward<_Visitor>(__vis), __arg);
   }
diff --git a/libcxx/include/__format/format_context.h b/libcxx/include/__format/format_context.h
index e672ee7ad0581..1771dd34b82fb 100644
--- a/libcxx/include/__format/format_context.h
+++ b/libcxx/include/__format/format_context.h
@@ -175,13 +175,13 @@ class basic_format_context<typename __format::__retarget_buffer<_CharT>::__itera
                   __format::__determine_arg_t<basic_format_context, decltype(__arg)>(),
                   __basic_format_arg_value<basic_format_context>(__arg)};
           };
-#  if _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#  if _LIBCPP_STD_VER >= 26
           return static_cast<_Context*>(__c)->arg(__id).visit(std::move(__visitor));
 #  else
           _LIBCPP_SUPPRESS_DEPRECATED_PUSH
           return std::visit_format_arg(std::move(__visitor), static_cast<_Context*>(__c)->arg(__id));
           _LIBCPP_SUPPRESS_DEPRECATED_POP
-#  endif // _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#  endif // _LIBCPP_STD_VER >= 26
         }) {
   }
 
diff --git a/libcxx/include/__hash_table b/libcxx/include/__hash_table
index 5432abb4ab39d..e1897949a47e6 100644
--- a/libcxx/include/__hash_table
+++ b/libcxx/include/__hash_table
@@ -83,18 +83,6 @@ struct __hash_node_base {
   typedef _NodePtr __node_pointer;
   typedef __node_base_pointer __next_pointer;
 
-// TODO(LLVM 22): Remove this check
-#ifndef _LIBCPP_ABI_FIX_UNORDERED_NODE_POINTER_UB
-  static_assert(sizeof(__node_base_pointer) == sizeof(__node_pointer) && _LIBCPP_ALIGNOF(__node_base_pointer) ==
-                    _LIBCPP_ALIGNOF(__node_pointer),
-                "It looks like you are using std::__hash_table (an implementation detail for the unordered containers) "
-                "with a fancy pointer type that thas a different representation depending on whether it points to a "
-                "__hash_table base pointer or a __hash_table node pointer (both of which are implementation details of "
-                "the standard library). This means that your ABI is being broken between LLVM 19 and LLVM 20. If you "
-                "don't care about your ABI being broken, define the _LIBCPP_ABI_TREE_REMOVE_NODE_POINTER_UB macro to "
-                "silence this diagnostic.");
-#endif
-
   __next_pointer __next_;
 
   _LIBCPP_HIDE_FROM_ABI __next_pointer __ptr() _NOEXCEPT {
diff --git a/libcxx/include/__iterator/concepts.h b/libcxx/include/__iterator/concepts.h
index f38688734b38a..3b43920443636 100644
--- a/libcxx/include/__iterator/concepts.h
+++ b/libcxx/include/__iterator/concepts.h
@@ -117,15 +117,12 @@ template <class _Tp>
 concept __signed_integer_like = signed_integral<_Tp>;
 
 template <class _Ip>
-concept weakly_incrementable =
-    // TODO: remove this once the clang bug is fixed (https://llvm.org/PR48173).
-    !same_as<_Ip, bool> && // Currently, clang does not handle bool correctly.
-    movable<_Ip> && requires(_Ip __i) {
-      typename iter_difference_t<_Ip>;
-      requires __signed_integer_like<iter_difference_t<_Ip>>;
-      { ++__i } -> same_as<_Ip&>; // not required to be equality-preserving
-      __i++;                      // not required to be equality-preserving
-    };
+concept weakly_incrementable = movable<_Ip> && requires(_Ip __i) {
+  typename iter_difference_t<_Ip>;
+  requires __signed_integer_like<iter_difference_t<_Ip>>;
+  { ++__i } -> same_as<_Ip&>; // not required to be equality-preserving
+  __i++;                      // not required to be equality-preserving
+};
 
 // [iterator.concept.inc]
 template <class _Ip>
diff --git a/libcxx/include/__math/traits.h b/libcxx/include/__math/traits.h
index 00db2a8289fb3..ff22cee7305d7 100644
--- a/libcxx/include/__math/traits.h
+++ b/libcxx/include/__math/traits.h
@@ -25,33 +25,26 @@ namespace __math {
 
 // signbit
 
-// TODO(LLVM 22): Remove conditional once support for Clang 19 is dropped.
-#if defined(_LIBCPP_COMPILER_GCC) || __has_constexpr_builtin(__builtin_signbit)
-#  define _LIBCPP_SIGNBIT_CONSTEXPR _LIBCPP_CONSTEXPR_SINCE_CXX23
-#else
-#  define _LIBCPP_SIGNBIT_CONSTEXPR
-#endif
-
 // The universal C runtime (UCRT) in the WinSDK provides floating point overloads
 // for std::signbit(). By defining our overloads as templates, we can work around
 // this issue as templates are less preferred than non-template functions.
 template <class = void>
-[[__nodiscard__]] inline _LIBCPP_SIGNBIT_CONSTEXPR _LIBCPP_HIDE_FROM_ABI bool signbit(float __x) _NOEXCEPT {
+[[__nodiscard__]] inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI bool signbit(float __x) _NOEXCEPT {
   return __builtin_signbit(__x);
 }
 
 template <class = void>
-[[__nodiscard__]] inline _LIBCPP_SIGNBIT_CONSTEXPR _LIBCPP_HIDE_FROM_ABI bool signbit(double __x) _NOEXCEPT {
+[[__nodiscard__]] inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI bool signbit(double __x) _NOEXCEPT {
   return __builtin_signbit(__x);
 }
 
 template <class = void>
-[[__nodiscard__]] inline _LIBCPP_SIGNBIT_CONSTEXPR _LIBCPP_HIDE_FROM_ABI bool signbit(long double __x) _NOEXCEPT {
+[[__nodiscard__]] inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI bool signbit(long double __x) _NOEXCEPT {
   return __builtin_signbit(__x);
 }
 
 template <class _A1, __enable_if_t<is_integral<_A1>::value, int> = 0>
-[[__nodiscard__]] inline _LIBCPP_SIGNBIT_CONSTEXPR _LIBCPP_HIDE_FROM_ABI bool signbit(_A1 __x) _NOEXCEPT {
+[[__nodiscard__]] inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI bool signbit(_A1 __x) _NOEXCEPT {
   return __x < 0;
 }
 
diff --git a/libcxx/include/__ranges/transform_view.h b/libcxx/include/__ranges/transform_view.h
index ae85dfa452d72..ab1adf9cdbe68 100644
--- a/libcxx/include/__ranges/transform_view.h
+++ b/libcxx/include/__ranges/transform_view.h
@@ -13,7 +13,6 @@
 #include <__compare/three_way_comparable.h>
 #include <__concepts/constructible.h>
 #include <__concepts/convertible_to.h>
-#include <__concepts/copyable.h>
 #include <__concepts/derived_from.h>
 #include <__concepts/equality_comparable.h>
 #include <__concepts/invocable.h>
@@ -64,7 +63,7 @@ concept __regular_invocable_with_range_ref = regular_invocable<_Fn, range_refere
 template <class _View, class _Fn>
 concept __transform_view_constraints =
     view<_View> && is_object_v<_Fn> && regular_invocable<_Fn&, range_reference_t<_View>> &&
-    __is_referenceable_v<invoke_result_t<_Fn&, range_reference_t<_View>>>;
+    __referenceable<invoke_result_t<_Fn&, range_reference_t<_View>>>;
 
 #  if _LIBCPP_STD_VER >= 23
 template <input_range _View, move_constructible _Fn>
diff --git a/libcxx/include/__tree b/libcxx/include/__tree
index 0738c8c6a5e2b..694796922c914 100644
--- a/libcxx/include/__tree
+++ b/libcxx/include/__tree
@@ -823,18 +823,6 @@ public:
   using __node_allocator _LIBCPP_NODEBUG = __rebind_alloc<__alloc_traits, __node>;
   using __node_traits _LIBCPP_NODEBUG    = allocator_traits<__node_allocator>;
 
-// TODO(LLVM 22): Remove this check
-#ifndef _LIBCPP_ABI_TREE_REMOVE_NODE_POINTER_UB
-  static_assert(sizeof(__node_base_pointer) == sizeof(__end_node_pointer) && _LIBCPP_ALIGNOF(__node_base_pointer) ==
-                    _LIBCPP_ALIGNOF(__end_node_pointer),
-                "It looks like you are using std::__tree (an implementation detail for (multi)map/set) with a fancy "
-                "pointer type that thas a different representation depending on whether it points to a __tree base "
-                "pointer or a __tree node pointer (both of which are implementation details of the standard library). "
-                "This means that your ABI is being broken between LLVM 19 and LLVM 20. If you don't care about your "
-                "ABI being broken, define the _LIBCPP_ABI_TREE_REMOVE_NODE_POINTER_UB macro to silence this "
-                "diagnostic.");
-#endif
-
 private:
   // check for sane allocator pointer rebinding semantics. Rebinding the
   // allocator for a new pointer type should be exactly the same as rebinding
diff --git a/libcxx/include/__type_traits/reference_constructs_from_temporary.h b/libcxx/include/__type_traits/reference_constructs_from_temporary.h
index 2ff549b4e15ce..3d097ce90cb09 100644
--- a/libcxx/include/__type_traits/reference_constructs_from_temporary.h
+++ b/libcxx/include/__type_traits/reference_constructs_from_temporary.h
@@ -30,14 +30,8 @@ _LIBCPP_NO_SPECIALIZATIONS inline constexpr bool reference_constructs_from_tempo
 
 #endif
 
-#if __has_builtin(__reference_constructs_from_temporary)
 template <class _Tp, class _Up>
 inline const bool __reference_constructs_from_temporary_v = __reference_constructs_from_temporary(_Tp, _Up);
-#else
-// TODO(LLVM 22): Remove this as all supported compilers should have __reference_constructs_from_temporary implemented.
-template <class _Tp, class _Up>
-inline const bool __reference_constructs_from_temporary_v = __reference_binds_to_temporary(_Tp, _Up);
-#endif
 
 _LIBCPP_END_NAMESPACE_STD
 
diff --git a/libcxx/include/forward_list b/libcxx/include/forward_list
index df7da20cfb611..88d863f494e86 100644
--- a/libcxx/include/forward_list
+++ b/libcxx/include/forward_list
@@ -284,17 +284,6 @@ struct __forward_node_traits {
   typedef _NodePtr __node_pointer;
   typedef __forward_begin_node<_NodePtr> __begin_node;
   typedef __rebind_pointer_t<_NodePtr, __begin_node> __begin_node_pointer;
-
-// TODO(LLVM 22): Remove this check
-#  ifndef _LIBCPP_ABI_FORWARD_LIST_REMOVE_NODE_POINTER_UB
-  static_assert(sizeof(__begin_node_pointer) == sizeof(__node_pointer) && _LIBCPP_ALIGNOF(__begin_node_pointer) ==
-                    _LIBCPP_ALIGNOF(__node_pointer),
-                "It looks like you are using std::forward_list with a fancy pointer type that thas a different "
-                "representation depending on whether it points to a forward_list base pointer or a forward_list node "
-                "pointer (both of which are implementation details of the standard library). This means that your ABI "
-                "is being broken between LLVM 19 and LLVM 20. If you don't care about your ABI being broken, define "
-                "the _LIBCPP_ABI_FORWARD_LIST_REMOVE_NODE_POINTER_UB macro to silence this diagnostic.");
-#  endif
 };
 
 template <class _NodePtr>
diff --git a/libcxx/include/list b/libcxx/include/list
index c5c2a8508999c..0ff85d2ebcb86 100644
--- a/libcxx/include/list
+++ b/libcxx/include/list
@@ -276,17 +276,6 @@ template <class _Tp, class _VoidPtr>
 struct __list_node_pointer_traits {
   typedef __rebind_pointer_t<_VoidPtr, __list_node<_Tp, _VoidPtr> > __node_pointer;
   typedef __rebind_pointer_t<_VoidPtr, __list_node_base<_Tp, _VoidPtr> > __base_pointer;
-
-// TODO(LLVM 22): Remove this check
-#  ifndef _LIBCPP_ABI_LIST_REMOVE_NODE_POINTER_UB
-  static_assert(sizeof(__node_pointer) == sizeof(__node_pointer) && _LIBCPP_ALIGNOF(__base_pointer) ==
-                    _LIBCPP_ALIGNOF(__node_pointer),
-                "It looks like you are using std::list with a fancy pointer type that thas a different representation "
-                "depending on whether it points to a list base pointer or a list node pointer (both of which are "
-                "implementation details of the standard library). This means that your ABI is being broken between "
-                "LLVM 19 and LLVM 20. If you don't care about your ABI being broken, define the "
-                "_LIBCPP_ABI_LIST_REMOVE_NODE_POINTER_UB macro to silence this diagnostic.");
-#  endif
 };
 
 template <class _Tp, class _VoidPtr>
diff --git a/libcxx/include/tuple b/libcxx/include/tuple
index 5f3bb72e0678b..466f501b5f4f8 100644
--- a/libcxx/include/tuple
+++ b/libcxx/include/tuple
@@ -301,7 +301,7 @@ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 bool __tuple_compare_equal(c
 template <class _Tp, class _Up, class _IndexSeq = make_index_sequence<tuple_size_v<_Tp>>>
 inline constexpr bool __can_tuple_compare_equal = false;
 
-// TODO(LLVM 22): Remove `tuple_size_v<_Tp> == tuple_size_v<_Up>` here once once LLVM-20 support ends
+// TODO(LLVM 23): Remove `tuple_size_v<_Tp> == tuple_size_v<_Up>` here once once LLVM-20 support ends
 // because the resolution of CWG2369 landed in LLVM-21.
 template <class _Tp, class _Up, size_t... _Is>
   requires(tuple_size_v<_Tp> == tuple_size_v<_Up>)
@@ -328,7 +328,7 @@ concept __tuple_like_no_tuple = __tuple_like<_Tp> && !__is_tuple_v<_Tp>;
 template <class _Tp, class _Up, class _IndexSeq>
 struct __tuple_common_comparison_category_impl {};
 
-// TODO(LLVM 22): Remove `tuple_size_v<_Tp> == tuple_size_v<_Up>` here once once LLVM-20 support ends
+// TODO(LLVM 23): Remove `tuple_size_v<_Tp> == tuple_size_v<_Up>` here once once LLVM-20 support ends
 // because the resolution of CWG2369 landed in LLVM-21.
 template <class _Tp, class _Up, size_t... _Is>
   requires(tuple_size_v<_Tp> == tuple_size_v<_Up>) && requires {
diff --git a/libcxx/include/variant b/libcxx/include/variant
index 9beef146f203c..8e958581a6b07 100644
--- a/libcxx/include/variant
+++ b/libcxx/include/variant
@@ -1299,7 +1299,7 @@ public:
     __impl_.__swap(__that.__impl_);
   }
 
-#    if _LIBCPP_STD_VER >= 26 && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
+#    if _LIBCPP_STD_VER >= 26
   // Helper class to implement [variant.visit]/10
   //   Constraints: The call to visit does not use an explicit template-argument-list
   //   that begins with a type template-argument.
diff --git a/libcxx/test/libcxx-03/utilities/meta/is_referenceable.compile.pass.cpp b/libcxx/test/libcxx-03/utilities/meta/is_referenceable.compile.pass.cpp
index 093bbae289723..f39d1a5da41af 100644
--- a/libcxx/test/libcxx-03/utilities/meta/is_referenceable.compile.pass.cpp
+++ b/libcxx/test/libcxx-03/utilities/meta/is_referenceable.compile.pass.cpp
@@ -7,7 +7,7 @@
 //===----------------------------------------------------------------------===//
 //
 
-// __is_referenceable_v<Tp>
+// __libcpp_is_referenceable<Tp>
 //
 // [defns.referenceable] defines "a referenceable type" as:
 // An object type, a function type that does not have cv-qualifiers
diff --git a/libcxx/test/libcxx/numerics/c.math/constexpr-cxx23-clang.pass.cpp b/libcxx/test/libcxx/numerics/c.math/constexpr-cxx23-clang.pass.cpp
index 3f17f21e8c108..20887b8cf2678 100644
--- a/libcxx/test/libcxx/numerics/c.math/constexpr-cxx23-clang.pass.cpp
+++ b/libcxx/test/libcxx/numerics/c.math/constexpr-cxx23-clang.pass.cpp
@@ -220,16 +220,9 @@ int main(int, char**) {
   ASSERT_CONSTEXPR_CXX23(std::isnormal(-1.0) == 1);
   ASSERT_CONSTEXPR_CXX23(std::isnormal(-1.0L) == 1);
 
-// TODO(LLVM 22): Remove `__has_constexpr_builtin` conditional once support for Clang 19 is dropped.
-#if !__has_constexpr_builtin(__builtin_signbit)
-  ASSERT_NOT_CONSTEXPR_CXX23(std::signbit(-1.0f) == 1);
-  ASSERT_NOT_CONSTEXPR_CXX23(std::signbit(-1.0) == 1);
-  ASSERT_NOT_CONSTEXPR_CXX23(std::signbit(-1.0L) == 1);
-#else
   ASSERT_CONSTEXPR_CXX23(std::signbit(-1.0f) == 1);
   ASSERT_CONSTEXPR_CXX23(std::signbit(-1.0) == 1);
   ASSERT_CONSTEXPR_CXX23(std::signbit(-1.0L) == 1);
-#endif
 
   ASSERT_NOT_CONSTEXPR_CXX23(std::isgreater(-1.0f, 0.0f) == 0);
   ASSERT_NOT_CONSTEXPR_CXX23(std::isgreater(-1.0, 0.0) == 0);
diff --git a/libcxx/test/libcxx/utilities/expected/expected.expected/transform_error.mandates.verify.cpp b/libcxx/test/libcxx/utilities/expected/expected.expected/transform_error.mandates.verify.cpp
index 09ebd0069b3a9..3e9bdd98cd394 100644
--- a/libcxx/test/libcxx/utilities/expected/expected.expected/transform_error.mandates.verify.cpp
+++ b/libcxx/test/libcxx/utilities/expected/expected.expected/transform_error.mandates.verify.cpp
@@ -8,15 +8,6 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
 
-// With clang-cl, some warnings have a 'which is a Microsoft extension' suffix
-// which break the tests. But #102851 will turn it into an error, making the test pass.
-// However, upstream libcxx buildbots do not build clang from source while testing, so
-// this tests still expected to fail on these bots.
-//
-// TODO(LLVM 22): Remove '0-1' from 'expected-error-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}}}'
-// and remove 'expected-warning-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}, which is a Microsoft extension}}'
-// once LLVM 22 releases. See https://llvm.org/PR104885.
-
 // Test the mandates
 
 // template<class F> constexpr auto transform_error(F&& f) &;
@@ -55,41 +46,39 @@ void test() {
   {
     std::expected<int, int> e;
     e.transform_error(return_unexpected<int&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
     // expected-error-re@*:* {{static assertion failed {{.*}}[expected.object.general] A program that instantiates the definition of template expected<T, E> for {{.*}} is ill-formed.}}
-    // expected-error-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}}}
+    // expected-error-re@*:* {{union member {{.*}} has reference type {{.*}}}}
 
     e.transform_error(return_no_object<int&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
     // expected-error-re@*:* {{static assertion failed {{.*}}[expected.object.general] A program that instantiates the definition of template expected<T, E> for {{.*}} is ill-formed.}}
-    // expected-warning-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}, which is a Microsoft extension}}
   }
 
   // Test const& overload
   {
     const std::expected<int, int> e;
     e.transform_error(return_unexpected<const int &>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-2 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
     e.transform_error(return_no_object<const int &>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-2 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
   }
 
   // Test && overload
   {
     std::expected<int, int> e;
     std::move(e).transform_error(return_unexpected<int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-2 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
     std::move(e).transform_error(return_no_object<int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-2 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
   }
 
   // Test const&& overload
   {
     const std::expected<int, int> e;
     std::move(e).transform_error(return_unexpected<const int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-2 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
     std::move(e).transform_error(return_no_object<const int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-2 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
   }
 }
 // clang-format on
diff --git a/libcxx/test/libcxx/utilities/expected/expected.void/transform_error.mandates.verify.cpp b/libcxx/test/libcxx/utilities/expected/expected.void/transform_error.mandates.verify.cpp
index 9fd7452af64fb..c5acc27af03ea 100644
--- a/libcxx/test/libcxx/utilities/expected/expected.void/transform_error.mandates.verify.cpp
+++ b/libcxx/test/libcxx/utilities/expected/expected.void/transform_error.mandates.verify.cpp
@@ -8,16 +8,6 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
 
-// With clang-cl, some warnings have a 'which is a Microsoft extension' suffix
-// which break the tests. But #102851 will turn it into an error, making the test pass.
-// However, upstream libcxx buildbots do not build clang from source while testing, so
-// this tests still expected to fail on these bots.
-//
-// TODO(LLVM 22): Remove '0-1' from 'expected-error-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}}}'
-// and remove 'expected-warning-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}, which is a Microsoft extension}}'
-// and remove 'expected-error-re@*:* 0-1 {{call to deleted constructor of {{.*}}}}'
-// once LLVM 22 releases. See See https://llvm.org/PR104885.
-
 // Test the mandates
 
 // template<class F> constexpr auto transform_error(F&& f) &;
@@ -56,43 +46,36 @@ void test() {
   {
     std::expected<void, int> e;
     e.transform_error(return_unexpected<int&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
     // expected-error-re@*:* {{static assertion failed {{.*}}A program that instantiates expected<T, E> with a E that is not a valid argument for unexpected<E> is ill-formed}}
-    // expected-error-re@*:* 0-1 {{call to deleted constructor of {{.*}}}}
-    // expected-error-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}}}
+    // expected-error-re@*:* {{union member {{.*}} has reference type {{.*}}}}
 
     e.transform_error(return_no_object<int&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
     // expected-error-re@*:* {{static assertion failed {{.*}}A program that instantiates expected<T, E> with a E that is not a valid argument for unexpected<E> is ill-formed}}
-    // expected-warning-re@*:* 0-1 {{union member {{.*}} has reference type {{.*}}, which is a Microsoft extension}}
   }
 
   // Test const& overload
   {
     const std::expected<void, int> e;
     e.transform_error(return_unexpected<const int &>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
     e.transform_error(return_no_object<const int &>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
-    // expected-error-re@*:* 0-1 {{call to deleted constructor of {{.*}}}}
+    // expected-error-re@*:* {{no matching constructor for initialization of{{.*}}}}
   }
 
   // Test && overload
   {
     std::expected<void, int> e;
     std::move(e).transform_error(return_unexpected<int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
     std::move(e).transform_error(return_no_object<int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
   }
 
   // Test const&& overload
   {
     const std::expected<void, int> e;
     std::move(e).transform_error(return_unexpected<const int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
     std::move(e).transform_error(return_no_object<const int&&>); // expected-error-re@*:* {{static assertion failed {{.*}}The result of {{.*}} must be a valid template argument for unexpected}}
-    // expected-error-re@*:* 0-1 {{no matching constructor for initialization of{{.*}}}}
   }
 }
 // clang-format on
diff --git a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.pass.cpp b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.pass.cpp
index 20e0a5ed66bd0..68fe8b6de41d6 100644
--- a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.pass.cpp
+++ b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.pass.cpp
@@ -8,8 +8,6 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20, c++23
 // UNSUPPORTED: GCC-ALWAYS_INLINE-FIXME
-// The tested functionality needs deducing this.
-// XFAIL: apple-clang
 
 // <format>
 
diff --git a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.return_type.pass.cpp b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.return_type.pass.cpp
index 8a79dd4d50f20..4ae63e896caed 100644
--- a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.return_type.pass.cpp
+++ b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit.return_type.pass.cpp
@@ -8,8 +8,6 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20, c++23
 // UNSUPPORTED: GCC-ALWAYS_INLINE-FIXME
-// The tested functionality needs deducing this.
-// XFAIL: apple-clang
 
 // <format>
 
diff --git a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.deprecated.verify.cpp b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.deprecated.verify.cpp
index 146ceba58872e..77df72d3c4c6c 100644
--- a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.deprecated.verify.cpp
+++ b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.deprecated.verify.cpp
@@ -8,7 +8,6 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20, c++23
 // UNSUPPORTED: GCC-ALWAYS_INLINE-FIXME
-// XFAIL: apple-clang
 
 // <format>
 
diff --git a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.pass.cpp b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.pass.cpp
index d99675a71f321..9b7c8a7f4f8b4 100644
--- a/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.pass.cpp
+++ b/libcxx/test/std/utilities/format/format.arguments/format.arg/visit_format_arg.pass.cpp
@@ -9,6 +9,8 @@
 // UNSUPPORTED: c++03, c++11, c++14, c++17
 // UNSUPPORTED: GCC-ALWAYS_INLINE-FIXME
 
+// ADDITIONAL_COMPILE_FLAGS: -D_LIBCPP_DISABLE_DEPRECATION_WARNINGS
+
 // <format>
 
 // template<class Visitor, class Context>
@@ -25,10 +27,6 @@
 #include "make_string.h"
 #include "min_allocator.h"
 
-#if TEST_STD_VER >= 26 && defined(TEST_HAS_EXPLICIT_THIS_PARAMETER)
-TEST_CLANG_DIAGNOSTIC_IGNORED("-Wdeprecated-declarations")
-#endif
-
 template <class Context, class To, class From>
 void test(From value) {
   auto store = std::make_format_args<Context>(value);
diff --git a/libcxx/test/std/utilities/format/format.arguments/format.args/get.pass.cpp b/libcxx/test/std/utilities/format/format.arguments/format.args/get.pass.cpp
index c7dd82d726b3a..cbddc4f437a53 100644
--- a/libcxx/test/std/utilities/format/format.arguments/format.args/get.pass.cpp
+++ b/libcxx/test/std/utilities/format/format.arguments/format.args/get.pass.cpp
@@ -32,7 +32,7 @@ void test(From value) {
     else
       assert(false);
   };
-#if TEST_STD_VER >= 26 && defined(TEST_HAS_EXPLICIT_THIS_PARAMETER)
+#if TEST_STD_VER >= 26
   format_args.get(0).visit(visitor);
 #else
   std::visit_format_arg(visitor, format_args.get(0));
@@ -47,7 +47,7 @@ void test_handle(T value) {
   std::basic_format_args<Context> format_args{store};
 
   auto visitor = [](auto a) { assert((std::is_same_v<decltype(a), typename std::basic_format_arg<Context>::handle>)); };
-#if TEST_STD_VER >= 26 && defined(TEST_HAS_EXPLICIT_THIS_PARAMETER)
+#if TEST_STD_VER >= 26
   format_args.get(0).visit(visitor);
 #else
   std::visit_format_arg(visitor, format_args.get(0));
@@ -73,7 +73,7 @@ void test_string_view(From value) {
     else
       assert(false);
   };
-#if TEST_STD_VER >= 26 && defined(TEST_HAS_EXPLICIT_THIS_PARAMETER)
+#if TEST_STD_VER >= 26
   format_args.get(0).visit(visitor);
 #else
   std::visit_format_arg(visitor, format_args.get(0));
diff --git a/libcxx/test/std/utilities/tuple/tuple.tuple/tuple.cnstr/PR20855_tuple_ref_binding_diagnostics.pass.cpp b/libcxx/test/std/utilities/tuple/tuple.tuple/tuple.cnstr/PR20855_tuple_ref_binding_diagnostics.pass.cpp
index d78de0eec8e53..0f6a6734264c3 100644
--- a/libcxx/test/std/utilities/tuple/tuple.tuple/tuple.cnstr/PR20855_tuple_ref_binding_diagnostics.pass.cpp
+++ b/libcxx/test/std/utilities/tuple/tuple.tuple/tuple.cnstr/PR20855_tuple_ref_binding_diagnostics.pass.cpp
@@ -16,17 +16,6 @@
 #include <tuple>
 #include <string>
 #include <cassert>
-#include "test_macros.h"
-
-#if TEST_HAS_BUILTIN(__reference_constructs_from_temporary)
-#  define ASSERT_REFERENCE_BINDS_TEMPORARY(...) static_assert(__reference_constructs_from_temporary(__VA_ARGS__), "")
-#  define ASSERT_NOT_REFERENCE_BINDS_TEMPORARY(...)                                                                    \
-    static_assert(!__reference_constructs_from_temporary(__VA_ARGS__), "")
-#else
-// TODO(LLVM 22): Remove this as all support compilers should have __reference_constructs_from_temporary implemented.
-#  define ASSERT_REFERENCE_BINDS_TEMPORARY(...) static_assert(__reference_binds_to_temporary(__VA_ARGS__), "")
-#  define ASSERT_NOT_REFERENCE_BINDS_TEMPORARY(...) static_assert(!__reference_binds_to_temporary(__VA_ARGS__), "")
-#endif
 
 template <class Tp>
 struct ConvertsTo {
@@ -42,17 +31,6 @@ struct ConvertsTo {
 struct Base {};
 struct Derived : Base {};
 
-
-static_assert(std::is_same<decltype("abc"), decltype(("abc"))>::value, "");
-ASSERT_REFERENCE_BINDS_TEMPORARY(std::string const&, decltype("abc"));
-ASSERT_REFERENCE_BINDS_TEMPORARY(std::string const&, decltype(("abc")));
-ASSERT_REFERENCE_BINDS_TEMPORARY(std::string const&, const char*&&);
-
-ASSERT_NOT_REFERENCE_BINDS_TEMPORARY(int&, const ConvertsTo<int&>&);
-ASSERT_NOT_REFERENCE_BINDS_TEMPORARY(const int&, ConvertsTo<int&>&);
-ASSERT_NOT_REFERENCE_BINDS_TEMPORARY(Base&, Derived&);
-
-
 static_assert(std::is_constructible<int&, std::reference_wrapper<int>>::value, "");
 static_assert(std::is_constructible<int const&, std::reference_wrapper<int>>::value, "");
 
diff --git a/libcxx/test/std/utilities/variant/variant.visit.member/robust_against_adl.pass.cpp b/libcxx/test/std/utilities/variant/variant.visit.member/robust_against_adl.pass.cpp
index 7be7c7ff9122b..38cf34a9c699c 100644
--- a/libcxx/test/std/utilities/variant/variant.visit.member/robust_against_adl.pass.cpp
+++ b/libcxx/test/std/utilities/variant/variant.visit.member/robust_against_adl.pass.cpp
@@ -7,7 +7,6 @@
 //===----------------------------------------------------------------------===//
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20, c++23
-// XFAIL: apple-clang
 
 // <variant>
 
diff --git a/libcxx/test/std/utilities/variant/variant.visit.member/visit.pass.cpp b/libcxx/test/std/utilities/variant/variant.visit.member/visit.pass.cpp
index f68112d30fc35..aeb1297c136ae 100644
--- a/libcxx/test/std/utilities/variant/variant.visit.member/visit.pass.cpp
+++ b/libcxx/test/std/utilities/variant/variant.visit.member/visit.pass.cpp
@@ -7,8 +7,6 @@
 //===----------------------------------------------------------------------===//
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20, c++23
-// The tested functionality needs deducing this.
-// XFAIL: apple-clang
 
 // <variant>
 
diff --git a/libcxx/test/std/utilities/variant/variant.visit.member/visit_return_type.pass.cpp b/libcxx/test/std/utilities/variant/variant.visit.member/visit_return_type.pass.cpp
index 90320ae518c34..7ca05908ab340 100644
--- a/libcxx/test/std/utilities/variant/variant.visit.member/visit_return_type.pass.cpp
+++ b/libcxx/test/std/utilities/variant/variant.visit.member/visit_return_type.pass.cpp
@@ -7,8 +7,6 @@
 //===----------------------------------------------------------------------===//
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20, c++23
-// The tested functionality needs deducing this.
-// XFAIL: apple-clang
 
 // <variant>
 
diff --git a/libcxx/test/support/test_basic_format_arg.h b/libcxx/test/support/test_basic_format_arg.h
index f51f6e97cbed0..99cd558c3c5bf 100644
--- a/libcxx/test/support/test_basic_format_arg.h
+++ b/libcxx/test/support/test_basic_format_arg.h
@@ -21,7 +21,7 @@ bool test_basic_format_arg(std::basic_format_arg<Context> arg, T expected) {
     else
       return false;
   };
-#if TEST_STD_VER >= 26 && defined(TEST_HAS_EXPLICIT_THIS_PARAMETER)
+#if TEST_STD_VER >= 26
   return arg.visit(std::move(visitor));
 #else
   return std::visit_format_arg(std::move(visitor), arg);
diff --git a/libcxx/test/support/test_macros.h b/libcxx/test/support/test_macros.h
index c4e1600572456..8d88d6fad7d0b 100644
--- a/libcxx/test/support/test_macros.h
+++ b/libcxx/test/support/test_macros.h
@@ -531,13 +531,6 @@ inline Tp const& DoNotOptimize(Tp const& value) {
 #  define TEST_IF_AIX(arg_true, arg_false) arg_false
 #endif
 
-// Clang-18 has support for deducing this, but it does not set the FTM.
-#ifdef _LIBCPP_USE_FROZEN_CXX03_HEADERS
-// This is a C++20 featue, so we don't care whether the compiler could support it
-#elif defined(_LIBCPP_VERSION) && _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER
-#  define TEST_HAS_EXPLICIT_THIS_PARAMETER
-#endif
-
 // Placement `operator new`/`operator new[]` are not yet constexpr in C++26
 // when using MS ABI, because they are from <vcruntime_new.h>.
 #if defined(__cpp_lib_constexpr_new) && __cpp_lib_constexpr_new >= 202406L
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index 8f3652172dfdf..d892c01f0bc71 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -10,8 +10,8 @@
 import subprocess
 import signal
 import sys
+import threading
 import warnings
-import selectors
 import time
 from typing import (
     Any,
@@ -139,6 +139,35 @@ def dump_memory(base_addr, data, num_per_line, outfile):
         outfile.write("\n")
 
 
+def read_packet(
+    f: IO[bytes], trace_file: Optional[IO[str]] = None
+) -> Optional[ProtocolMessage]:
+    """Decode a JSON packet that starts with the content length and is
+    followed by the JSON bytes from a file 'f'. Returns None on EOF.
+    """
+    line = f.readline().decode("utf-8")
+    if len(line) == 0:
+        return None  # EOF.
+
+    # Watch for line that starts with the prefix
+    prefix = "Content-Length: "
+    if line.startswith(prefix):
+        # Decode length of JSON bytes
+        length = int(line[len(prefix) :])
+        # Skip empty line
+        separator = f.readline().decode()
+        if separator != "":
+            Exception("malformed DAP content header, unexpected line: " + separator)
+        # Read JSON bytes
+        json_str = f.read(length).decode()
+        if trace_file:
+            trace_file.write("from adapter:\n%s\n" % (json_str))
+        # Decode the JSON bytes into a python dictionary
+        return json.loads(json_str)
+
+    raise Exception("unexpected malformed message from lldb-dap: " + line)
+
+
 def packet_type_is(packet, packet_type):
     return "type" in packet and packet["type"] == packet_type
 
@@ -170,8 +199,16 @@ def __init__(
         self.log_file = log_file
         self.send = send
         self.recv = recv
-        self.selector = selectors.DefaultSelector()
-        self.selector.register(recv, selectors.EVENT_READ)
+
+        # Packets that have been received and processed but have not yet been
+        # requested by a test case.
+        self._pending_packets: List[Optional[ProtocolMessage]] = []
+        # Received packets that have not yet been processed.
+        self._recv_packets: List[Optional[ProtocolMessage]] = []
+        # Used as a mutex for _recv_packets and for notify when _recv_packets
+        # changes.
+        self._recv_condition = threading.Condition()
+        self._recv_thread = threading.Thread(target=self._read_packet_thread)
 
         # session state
         self.init_commands = init_commands
@@ -197,6 +234,9 @@ def __init__(
         # keyed by breakpoint id
         self.resolved_breakpoints: dict[str, Breakpoint] = {}
 
+        # trigger enqueue thread
+        self._recv_thread.start()
+
     @classmethod
     def encode_content(cls, s: str) -> bytes:
         return ("Content-Length: %u\r\n\r\n%s" % (len(s), s)).encode("utf-8")
@@ -212,46 +252,17 @@ def validate_response(cls, command, response):
                 f"seq mismatch in response {command['seq']} != {response['request_seq']}"
             )
 
-    def _read_packet(
-        self,
-        timeout: float = DEFAULT_TIMEOUT,
-    ) -> Optional[ProtocolMessage]:
-        """Decode a JSON packet that starts with the content length and is
-        followed by the JSON bytes from self.recv. Returns None on EOF.
-        """
-
-        ready = self.selector.select(timeout)
-        if not ready:
-            warnings.warn(
-                "timeout occurred waiting for a packet, check if the test has a"
-                " negative assertion and see if it can be inverted.",
-                stacklevel=4,
-            )
-            return None  # timeout
-
-        line = self.recv.readline().decode("utf-8")
-        if len(line) == 0:
-            return None  # EOF.
-
-        # Watch for line that starts with the prefix
-        prefix = "Content-Length: "
-        if line.startswith(prefix):
-            # Decode length of JSON bytes
-            length = int(line[len(prefix) :])
-            # Skip empty line
-            separator = self.recv.readline().decode()
-            if separator != "":
-                Exception("malformed DAP content header, unexpected line: " + separator)
-            # Read JSON bytes
-            json_str = self.recv.read(length).decode()
-            if self.trace_file:
-                self.trace_file.write(
-                    "%s from adapter:\n%s\n" % (time.time(), json_str)
-                )
-            # Decode the JSON bytes into a python dictionary
-            return json.loads(json_str)
-
-        raise Exception("unexpected malformed message from lldb-dap: " + line)
+    def _read_packet_thread(self):
+        try:
+            while True:
+                packet = read_packet(self.recv, trace_file=self.trace_file)
+                # `packet` will be `None` on EOF. We want to pass it down to
+                # handle_recv_packet anyway so the main thread can handle unexpected
+                # termination of lldb-dap and stop waiting for new packets.
+                if not self._handle_recv_packet(packet):
+                    break
+        finally:
+            dump_dap_log(self.log_file)
 
     def get_modules(
         self, start_module: Optional[int] = None, module_count: Optional[int] = None
@@ -299,6 +310,34 @@ def collect_output(
             output += self.get_output(category, clear=clear)
         return output
 
+    def _enqueue_recv_packet(self, packet: Optional[ProtocolMessage]):
+        with self.recv_condition:
+            self.recv_packets.append(packet)
+            self.recv_condition.notify()
+
+    def _handle_recv_packet(self, packet: Optional[ProtocolMessage]) -> bool:
+        """Handles an incoming packet.
+
+        Called by the read thread that is waiting for all incoming packets
+        to store the incoming packet in "self._recv_packets" in a thread safe
+        way. This function will then signal the "self._recv_condition" to
+        indicate a new packet is available.
+
+        Args:
+            packet: A new packet to store.
+
+        Returns:
+            True if the caller should keep calling this function for more
+            packets.
+        """
+        with self._recv_condition:
+            self._recv_packets.append(packet)
+            self._recv_condition.notify()
+            # packet is None on EOF
+            return packet is not None and not (
+                packet["type"] == "response" and packet["command"] == "disconnect"
+            )
+
     def _recv_packet(
         self,
         *,
@@ -322,34 +361,46 @@ def _recv_packet(
             The first matching packet for the given predicate, if specified,
             otherwise None.
         """
-        deadline = time.time() + timeout
-
-        while time.time() < deadline:
-            packet = self._read_packet(timeout=deadline - time.time())
-            if packet is None:
-                return None
-            self._process_recv_packet(packet)
-            if not predicate or predicate(packet):
-                return packet
-
-    def _process_recv_packet(self, packet) -> None:
+        assert (
+            threading.current_thread != self._recv_thread
+        ), "Must not be called from the _recv_thread"
+
+        def process_until_match():
+            self._process_recv_packets()
+            for i, packet in enumerate(self._pending_packets):
+                if packet is None:
+                    # We need to return a truthy value to break out of the
+                    # wait_for, use `EOFError` as an indicator of EOF.
+                    return EOFError()
+                if predicate and predicate(packet):
+                    self._pending_packets.pop(i)
+                    return packet
+
+        with self._recv_condition:
+            packet = self._recv_condition.wait_for(process_until_match, timeout)
+            return None if isinstance(packet, EOFError) else packet
+
+    def _process_recv_packets(self) -> None:
         """Process received packets, updating the session state."""
-        if packet and ("seq" not in packet or packet["seq"] == 0):
-            warnings.warn(
-                f"received a malformed packet, expected 'seq != 0' for {packet!r}"
-            )
-        # Handle events that may modify any stateful properties of
-        # the DAP session.
-        if packet and packet["type"] == "event":
-            self._handle_event(packet)
-        elif packet and packet["type"] == "request":
-            # Handle reverse requests and keep processing.
-            self._handle_reverse_request(packet)
+        with self._recv_condition:
+            for packet in self._recv_packets:
+                if packet and ("seq" not in packet or packet["seq"] == 0):
+                    warnings.warn(
+                        f"received a malformed packet, expected 'seq != 0' for {packet!r}"
+                    )
+                # Handle events that may modify any stateful properties of
+                # the DAP session.
+                if packet and packet["type"] == "event":
+                    self._handle_event(packet)
+                elif packet and packet["type"] == "request":
+                    # Handle reverse requests and keep processing.
+                    self._handle_reverse_request(packet)
+                # Move the packet to the pending queue.
+                self._pending_packets.append(packet)
+            self._recv_packets.clear()
 
     def _handle_event(self, packet: Event) -> None:
         """Handle any events that modify debug session state we track."""
-        self.events.append(packet)
-
         event = packet["event"]
         body: Optional[Dict] = packet.get("body", None)
 
@@ -402,8 +453,6 @@ def _handle_event(self, packet: Event) -> None:
             self.invalidated_event = packet
         elif event == "memory":
             self.memory_event = packet
-        elif event == "module":
-            self.module_events.append(packet)
 
     def _handle_reverse_request(self, request: Request) -> None:
         if request in self.reverse_requests:
@@ -472,14 +521,18 @@ def send_packet(self, packet: ProtocolMessage) -> int:
 
         Returns the seq number of the request.
         """
-        packet["seq"] = self.sequence
-        self.sequence += 1
+        # Set the seq for requests.
+        if packet["type"] == "request":
+            packet["seq"] = self.sequence
+            self.sequence += 1
+        else:
+            packet["seq"] = 0
 
         # Encode our command dictionary as a JSON string
         json_str = json.dumps(packet, separators=(",", ":"))
 
         if self.trace_file:
-            self.trace_file.write("%s to adapter:\n%s\n" % (time.time(), json_str))
+            self.trace_file.write("to adapter:\n%s\n" % (json_str))
 
         length = len(json_str)
         if length > 0:
@@ -860,8 +913,6 @@ def request_restart(self, restartArguments=None):
         if restartArguments:
             command_dict["arguments"] = restartArguments
 
-        # Clear state, the process is about to restart...
-        self._process_continued(True)
         response = self._send_recv(command_dict)
         # Caller must still call wait_for_stopped.
         return response
@@ -1428,10 +1479,8 @@ def request_testGetTargetBreakpoints(self):
 
     def terminate(self):
         self.send.close()
-        self.recv.close()
-        self.selector.close()
-        if self.log_file:
-            dump_dap_log(self.log_file)
+        if self._recv_thread.is_alive():
+            self._recv_thread.join()
 
     def request_setInstructionBreakpoints(self, memory_reference=[]):
         breakpoints = []
@@ -1528,7 +1577,6 @@ def launch(
             stdout=subprocess.PIPE,
             stderr=sys.stderr,
             env=adapter_env,
-            bufsize=0,
         )
 
         if connection is None:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index fd07324d2ddda..29935bb8046ff 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -416,7 +416,7 @@ def continue_to_next_stop(self):
         return self.dap_server.wait_for_stopped()
 
     def continue_to_breakpoint(self, breakpoint_id: str):
-        self.continue_to_breakpoints([breakpoint_id])
+        self.continue_to_breakpoints((breakpoint_id))
 
     def continue_to_breakpoints(self, breakpoint_ids):
         self.do_continue()
diff --git a/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py b/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
index 7b78541fb4f8e..beab4d6c1f5a6 100644
--- a/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
+++ b/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
@@ -81,20 +81,24 @@ def test_breakpoint_events(self):
                 breakpoint["verified"], "expect foo breakpoint to not be verified"
             )
 
+        # Flush the breakpoint events.
+        self.dap_server.wait_for_breakpoint_events()
+
         # Continue to the breakpoint
-        self.continue_to_breakpoint(foo_bp_id)
-        self.continue_to_next_stop()  # foo_bp2
-        self.continue_to_breakpoint(main_bp_id)
-        self.continue_to_exit()
+        self.continue_to_breakpoints(dap_breakpoint_ids)
 
-        bp_events = [e for e in self.dap_server.events if e["event"] == "breakpoint"]
+        verified_breakpoint_ids = []
+        unverified_breakpoint_ids = []
+        for breakpoint_event in self.dap_server.wait_for_breakpoint_events():
+            breakpoint = breakpoint_event["body"]["breakpoint"]
+            id = breakpoint["id"]
+            if breakpoint["verified"]:
+                verified_breakpoint_ids.append(id)
+            else:
+                unverified_breakpoint_ids.append(id)
 
-        main_bp_events = [
-            e for e in bp_events if e["body"]["breakpoint"]["id"] == main_bp_id
-        ]
-        foo_bp_events = [
-            e for e in bp_events if e["body"]["breakpoint"]["id"] == foo_bp_id
-        ]
+        self.assertIn(main_bp_id, unverified_breakpoint_ids)
+        self.assertIn(foo_bp_id, unverified_breakpoint_ids)
 
-        self.assertTrue(main_bp_events)
-        self.assertTrue(foo_bp_events)
+        self.assertIn(main_bp_id, verified_breakpoint_ids)
+        self.assertIn(foo_bp_id, verified_breakpoint_ids)
diff --git a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
index dc6bf38303204..ca881f1d817c5 100644
--- a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
+++ b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
@@ -156,7 +156,6 @@ def test_debuggerRoot(self):
         self.build_and_launch(
             program, debuggerRoot=program_parent_dir, initCommands=commands
         )
-        self.continue_to_exit()
         output = self.get_console()
         self.assertTrue(output and len(output) > 0, "expect console output")
         lines = output.splitlines()
@@ -172,6 +171,7 @@ def test_debuggerRoot(self):
                     % (program_parent_dir, line[len(prefix) :]),
                 )
         self.assertTrue(found, "verified lldb-dap working directory")
+        self.continue_to_exit()
 
     def test_sourcePath(self):
         """
@@ -642,6 +642,7 @@ def test_stdio_redirection(self):
     @skipIfAsan
     @skipIfWindows
     @skipIf(oslist=["linux"], archs=no_match(["x86_64"]))
+    @skipIfBuildType(["debug"])
     def test_stdio_redirection_and_console(self):
         """
         Test stdio redirection and console.
diff --git a/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py b/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
index 9d1d17b704f76..1f4afabbd161e 100644
--- a/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
+++ b/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
@@ -1,58 +1,58 @@
-"""
-Test 'module' events for dynamically loaded libraries.
-"""
-
+import dap_server
 from lldbsuite.test.decorators import *
 from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
 import lldbdap_testcase
+import re
 
 
 class TestDAP_module_event(lldbdap_testcase.DAPTestCaseBase):
-    def lookup_module_id(self, name):
-        """Returns the identifier for the first module event starting with the given name."""
-        for event in self.dap_server.module_events:
-            if self.get_dict_value(event, ["body", "module", "name"]).startswith(name):
-                return self.get_dict_value(event, ["body", "module", "id"])
-        self.fail(f"No module events matching name={name}")
-
-    def module_events(self, id):
-        """Finds all module events by identifier."""
-        return [
-            event
-            for event in self.dap_server.module_events
-            if self.get_dict_value(event, ["body", "module", "id"]) == id
-        ]
-
-    def module_reasons(self, events):
-        """Returns the list of 'reason' values from the given events."""
-        return [event["body"]["reason"] for event in events]
-
     @skipIfWindows
     def test_module_event(self):
-        """
-        Test that module events are fired on target load and when the list of
-        dynamic libraries updates while running.
-        """
         program = self.getBuildArtifact("a.out")
         self.build_and_launch(program)
-        # We can analyze the order of events after the process exits.
-        self.continue_to_exit()
 
-        a_out_id = self.lookup_module_id("a.out")
-        a_out_events = self.module_events(id=a_out_id)
+        source = "main.cpp"
+        breakpoint1_line = line_number(source, "// breakpoint 1")
+        breakpoint2_line = line_number(source, "// breakpoint 2")
+        breakpoint3_line = line_number(source, "// breakpoint 3")
 
-        self.assertIn(
-            "new",
-            self.module_reasons(a_out_events),
-            "Expected a.out to load during the debug session.",
+        breakpoint_ids = self.set_source_breakpoints(
+            source, [breakpoint1_line, breakpoint2_line, breakpoint3_line]
         )
+        self.continue_to_breakpoints(breakpoint_ids)
+
+        # We're now stopped at breakpoint 1 before the dlopen. Flush all the module events.
+        event = self.dap_server.wait_for_event(["module"])
+        while event is not None:
+            event = self.dap_server.wait_for_event(["module"])
+
+        # Continue to the second breakpoint, before the dlclose.
+        self.continue_to_breakpoints(breakpoint_ids)
+
+        # Make sure we got a module event for libother.
+        event = self.dap_server.wait_for_event(["module"])
+        self.assertIsNotNone(event, "didn't get a module event")
+        module_name = event["body"]["module"]["name"]
+        module_id = event["body"]["module"]["id"]
+        self.assertEqual(event["body"]["reason"], "new")
+        self.assertIn("libother", module_name)
+
+        # Continue to the third breakpoint, after the dlclose.
+        self.continue_to_breakpoints(breakpoint_ids)
+
+        # Make sure we got a module event for libother.
+        event = self.dap_server.wait_for_event(["module"])
+        self.assertIsNotNone(event, "didn't get a module event")
+        reason = event["body"]["reason"]
+        self.assertEqual(reason, "removed")
+        self.assertEqual(event["body"]["module"]["id"], module_id)
+
+        # The removed module event should omit everything but the module id and name
+        # as they are required fields.
+        module_data = event["body"]["module"]
+        required_keys = ["id", "name"]
+        self.assertListEqual(list(module_data.keys()), required_keys)
+        self.assertEqual(module_data["name"], "", "expects empty name.")
 
-        libother_id = self.lookup_module_id(
-            "libother."  # libother.so or libother.dylib based on OS.
-        )
-        libother_events = self.module_events(id=libother_id)
-        self.assertEqual(
-            self.module_reasons(libother_events),
-            ["new", "removed"],
-            "Expected libother to be loaded then unloaded during the debug session.",
-        )
+        self.continue_to_exit()
diff --git a/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py b/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
index 2d00c512721c6..0ed53dac5d869 100644
--- a/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
+++ b/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
@@ -64,18 +64,19 @@ def check_symbols_loaded_with_size():
         self.assertEqual(program, program_module["path"])
         self.assertIn("addressRange", program_module)
 
-        self.continue_to_exit()
-
         # Collect all the module names we saw as events.
         module_new_names = []
         module_changed_names = []
-        for module_event in self.dap_server.module_events:
+        module_event = self.dap_server.wait_for_event(["module"])
+        while module_event is not None:
             reason = module_event["body"]["reason"]
             if reason == "new":
                 module_new_names.append(module_event["body"]["module"]["name"])
             elif reason == "changed":
                 module_changed_names.append(module_event["body"]["module"]["name"])
 
+            module_event = self.dap_server.wait_for_event(["module"])
+
         # Make sure we got an event for every active module.
         self.assertNotEqual(len(module_new_names), 0)
         for module in active_modules:
@@ -85,6 +86,7 @@ def check_symbols_loaded_with_size():
         # symbols got added.
         self.assertNotEqual(len(module_changed_names), 0)
         self.assertIn(program_module["name"], module_changed_names)
+        self.continue_to_exit()
 
     @skipIfWindows
     def test_modules(self):
diff --git a/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_console.py b/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_console.py
index fa62ec243f5c5..e1ad1425a993d 100644
--- a/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_console.py
+++ b/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_console.py
@@ -30,11 +30,7 @@ def verify_stopped_on_entry(self, stopped_events: List[Dict[str, Any]]):
             if reason == "entry":
                 seen_stopped_event += 1
 
-        self.assertEqual(
-            seen_stopped_event,
-            1,
-            f"expect only one stopped entry event in {stopped_events}",
-        )
+        self.assertEqual(seen_stopped_event, 1, "expect only one stopped entry event.")
 
     @skipIfAsan
     @skipIfWindows
@@ -96,13 +92,11 @@ def test_stopOnEntry(self):
         self.build_and_launch(program, console="integratedTerminal", stopOnEntry=True)
         [bp_main] = self.set_function_breakpoints(["main"])
 
-        self.dap_server.request_configurationDone()
-        stopped_threads = list(self.dap_server.thread_stop_reasons.values())
+        self.dap_server.request_continue()  # sends configuration done
+        stopped_events = self.dap_server.wait_for_stopped()
         # We should be stopped at the entry point.
-        self.assertEqual(
-            len(stopped_threads), 1, "Expected the main thread to be stopped on entry."
-        )
-        self.assertEqual(stopped_threads[0]["reason"], "entry")
+        self.assertGreaterEqual(len(stopped_events), 0, "expect stopped events")
+        self.verify_stopped_on_entry(stopped_events)
 
         # Then, if we continue, we should hit the breakpoint at main.
         self.dap_server.request_continue()
@@ -111,12 +105,8 @@ def test_stopOnEntry(self):
         # Restart and check that we still get a stopped event before reaching
         # main.
         self.dap_server.request_restart()
-        stopped_threads = list(self.dap_server.thread_stop_reasons.values())
-        # We should be stopped at the entry point.
-        self.assertEqual(
-            len(stopped_threads), 1, "Expected the main thread to be stopped on entry."
-        )
-        self.assertEqual(stopped_threads[0]["reason"], "entry")
+        stopped_events = self.dap_server.wait_for_stopped()
+        self.verify_stopped_on_entry(stopped_events)
 
         # continue to main
         self.dap_server.request_continue()
diff --git a/lldb/test/API/tools/lldb-dap/send-event/TestDAP_sendEvent.py b/lldb/test/API/tools/lldb-dap/send-event/TestDAP_sendEvent.py
index 0184020589176..a01845669666f 100644
--- a/lldb/test/API/tools/lldb-dap/send-event/TestDAP_sendEvent.py
+++ b/lldb/test/API/tools/lldb-dap/send-event/TestDAP_sendEvent.py
@@ -32,7 +32,7 @@ def test_send_event(self):
             ],
         )
         self.set_source_breakpoints(source, [breakpoint_line])
-        self.do_continue()
+        self.continue_to_next_stop()
 
         custom_event = self.dap_server.wait_for_event(
             filter=["my-custom-event-no-body"]
diff --git a/llvm/docs/DeveloperPolicy.rst b/llvm/docs/DeveloperPolicy.rst
index 45f2df20984e6..9135406c2e2a1 100644
--- a/llvm/docs/DeveloperPolicy.rst
+++ b/llvm/docs/DeveloperPolicy.rst
@@ -1189,6 +1189,55 @@ Suggested disclaimer for the project README and the main project web page:
    necessarily a reflection of the completeness or stability of the code, it
    does indicate that the project is not yet endorsed as a component of LLVM.
 
+Adding or enabling a new LLVM pass
+----------------------------------
+
+The guidelines here are primarily targeted at the enablement of new major
+passes in the target-independent optimization pipeline. Small additions, or
+backend-specific passes, require a lesser degree of care. Before creating a new
+pass, consider whether the functionality can be integrated into an existing
+pass first. This is often both faster and more powerful.
+
+When adding a new pass, the goal should be to enable it as part of the default
+optimization pipeline as early as possible and then continue development
+incrementally. (This does not apply to passes that are only relevant for
+specific uses of LLVM, such as GC support passes.)
+
+The recommended workflow is:
+
+1. Implement a basic version of the pass and add it to the pass pipeline behind
+   a flag that is disabled by default. The initial version should focus on
+   handling simple cases correctly and efficiently.
+2. Enable the pass by default. Separating this step allows easily disabling the
+   pass if issues are encountered, without having to revert the entire
+   implementation.
+3. Incrementally extend the pass with new functionality. As the pass is already
+   enabled, it becomes easier to identify the specific change that has caused a
+   regression in correctness, optimization quality or compile-time.
+
+When enabling a pass, certain requirements must be met (in no particular order):
+
+ * **Maintenance:** The pass (and any analyses it depends on) must have at
+   least one maintainer.
+ * **Usefulness:** There should be evidence that the pass improves performance
+   (or whatever metric it optimizes for) on real-world workloads. Improvements
+   seen only on synthetic benchmarks may be insufficient.
+ * **Compile-Time:** The pass should not have a large impact on compile-time,
+   where the evaluation of what "large" means is up to reviewer discretion, and
+   may differ based on the value the pass provides. In any case, it is expected
+   that a concerted effort has been made to mitigate the compile-time impact,
+   both for the average case, and for pathological cases.
+ * **Correctness:** The pass should have no known correctness issues (except
+   global correctness issues that affect all of LLVM). If an old pass is being
+   enabled (rather than implementing a new one incrementally), additional due
+   diligence is required. The pass should be fully reviewed to ensure that it
+   still complies with current quality standards. Fuzzing with disabled
+   profitability checks may help gain additional confidence in the
+   implementation.
+
+If non-trivial issues are found in a newly enabled pass, it may be temporarily
+disabled again, until the issues have been resolved.
+
 .. _copyright-license-patents:
 
 Copyright, License, and Patents
diff --git a/llvm/include/llvm/ExecutionEngine/Orc/EPCGenericMemoryAccess.h b/llvm/include/llvm/ExecutionEngine/Orc/EPCGenericMemoryAccess.h
index c69b6f736651e..86207265021c5 100644
--- a/llvm/include/llvm/ExecutionEngine/Orc/EPCGenericMemoryAccess.h
+++ b/llvm/include/llvm/ExecutionEngine/Orc/EPCGenericMemoryAccess.h
@@ -6,7 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// Implements ExecutorProcessControl::MemoryAccess by making calls to
+// Implements the MemoryAccess interface by making calls to
 // ExecutorProcessControl::callWrapperAsync.
 //
 // This simplifies the implementaton of new ExecutorProcessControl instances,
@@ -19,6 +19,7 @@
 #define LLVM_EXECUTIONENGINE_ORC_EPCGENERICMEMORYACCESS_H
 
 #include "llvm/ExecutionEngine/Orc/Core.h"
+#include "llvm/ExecutionEngine/Orc/MemoryAccess.h"
 
 namespace llvm {
 namespace orc {
diff --git a/llvm/include/llvm/Frontend/OpenMP/ClauseT.h b/llvm/include/llvm/Frontend/OpenMP/ClauseT.h
index 87b95200b2459..d7f0e3a3d49da 100644
--- a/llvm/include/llvm/Frontend/OpenMP/ClauseT.h
+++ b/llvm/include/llvm/Frontend/OpenMP/ClauseT.h
@@ -1167,6 +1167,14 @@ struct ThreadsT {
   using EmptyTrait = std::true_type;
 };
 
+// V6.0: [14.8] `threadset` clause
+template <typename T, typename I, typename E> //
+struct ThreadsetT {
+  ENUM(ThreadsetPolicy, Omp_Pool, Omp_Team);
+  using WrapperTrait = std::true_type;
+  ThreadsetPolicy v;
+};
+
 // V5.2: [5.9.1] `to` clause
 template <typename T, typename I, typename E> //
 struct ToT {
@@ -1352,9 +1360,9 @@ using WrapperClausesT = std::variant<
     ProcBindT<T, I, E>, ReverseOffloadT<T, I, E>, SafelenT<T, I, E>,
     SelfMapsT<T, I, E>, SeverityT<T, I, E>, SharedT<T, I, E>, SimdlenT<T, I, E>,
     SizesT<T, I, E>, PermutationT<T, I, E>, ThreadLimitT<T, I, E>,
-    UnifiedAddressT<T, I, E>, UnifiedSharedMemoryT<T, I, E>, UniformT<T, I, E>,
-    UpdateT<T, I, E>, UseDeviceAddrT<T, I, E>, UseDevicePtrT<T, I, E>,
-    UsesAllocatorsT<T, I, E>>;
+    ThreadsetT<T, I, E>, UnifiedAddressT<T, I, E>,
+    UnifiedSharedMemoryT<T, I, E>, UniformT<T, I, E>, UpdateT<T, I, E>,
+    UseDeviceAddrT<T, I, E>, UseDevicePtrT<T, I, E>, UsesAllocatorsT<T, I, E>>;
 
 template <typename T, typename I, typename E>
 using UnionOfAllClausesT = typename type::Union< //
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td
index 61a1a05f6e904..208609f64f418 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMP.td
+++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td
@@ -539,6 +539,10 @@ def OMPC_GroupPrivate : Clause<[Spelling<"groupprivate">]> {
 def OMPC_Threads : Clause<[Spelling<"threads">]> {
   let clangClass = "OMPThreadsClause";
 }
+def OMPC_Threadset : Clause<[Spelling<"threadset">]> {
+  let clangClass = "OMPThreadsetClause";
+  let flangClass = "OmpThreadsetClause";
+}
 def OMPC_To : Clause<[Spelling<"to">]> {
   let clangClass = "OMPToClause";
   let flangClass = "OmpToClause";
@@ -1254,6 +1258,7 @@ def OMP_Task : Directive<[Spelling<"task">]> {
     VersionedClause<OMPC_Final>,
     VersionedClause<OMPC_If>,
     VersionedClause<OMPC_Priority>,
+    VersionedClause<OMPC_Threadset, 60>,
     VersionedClause<OMPC_Replayable, 60>,
     VersionedClause<OMPC_Transparent, 60>,
   ];
@@ -1297,6 +1302,7 @@ def OMP_TaskLoop : Directive<[Spelling<"taskloop">]> {
     VersionedClause<OMPC_Final>,
     VersionedClause<OMPC_If>,
     VersionedClause<OMPC_Priority>,
+    VersionedClause<OMPC_Threadset, 60>,
     VersionedClause<OMPC_Replayable, 60>,
     VersionedClause<OMPC_Transparent, 60>,
   ];
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index e09eea47525ee..6bfe229e0f441 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -1347,6 +1347,9 @@ void AMDGPUPassConfig::addIRPasses() {
       isPassEnabled(EnableImageIntrinsicOptimizer))
     addPass(createAMDGPUImageIntrinsicOptimizerPass(&TM));
 
+  if (EnableUniformIntrinsicCombine)
+    addPass(createAMDGPUUniformIntrinsicCombineLegacyPass());
+
   // This can be disabled by passing ::Disable here or on the command line
   // with --expand-variadics-override=disable.
   addPass(createExpandVariadicsPass(ExpandVariadicsMode::Lowering));
@@ -2098,6 +2101,8 @@ void AMDGPUCodeGenPassBuilder::addIRPasses(AddIRPass &addPass) const {
   if (isPassEnabled(EnableImageIntrinsicOptimizer))
     addPass(AMDGPUImageIntrinsicOptimizerPass(TM));
 
+  if (EnableUniformIntrinsicCombine)
+    addPass(AMDGPUUniformIntrinsicCombinePass());
   // This can be disabled by passing ::Disable here or on the command line
   // with --expand-variadics-override=disable.
   addPass(ExpandVariadicsPass(ExpandVariadicsMode::Lowering));
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index e043d072a7638..08be5df9872b7 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1534,8 +1534,8 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
   bool SrcNotDom = false;
 
   auto CaptureTrackingWithModRef =
-      [&](Instruction *AI,
-          function_ref<bool(Instruction *)> ModRefCallback) -> bool {
+      [&](Instruction *AI, function_ref<bool(Instruction *)> ModRefCallback,
+          bool &AddressCaptured) -> bool {
     SmallVector<Instruction *, 8> Worklist;
     Worklist.push_back(AI);
     unsigned MaxUsesToExplore = getDefaultMaxUsesToExploreForCaptureTracking();
@@ -1559,8 +1559,9 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
         if (!Visited.insert(&U).second)
           continue;
         UseCaptureInfo CI = DetermineUseCaptureKind(U, AI);
-        if (capturesAnything(CI.UseCC))
+        if (capturesAnyProvenance(CI.UseCC))
           return false;
+        AddressCaptured |= capturesAddress(CI.UseCC);
 
         if (UI->mayReadOrWriteMemory()) {
           if (UI->isLifetimeStartOrEnd()) {
@@ -1627,7 +1628,9 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
     return true;
   };
 
-  if (!CaptureTrackingWithModRef(DestAlloca, DestModRefCallback))
+  bool DestAddressCaptured = false;
+  if (!CaptureTrackingWithModRef(DestAlloca, DestModRefCallback,
+                                 DestAddressCaptured))
     return false;
   // Bailout if Dest may have any ModRef before Store.
   if (!ReachabilityWorklist.empty() &&
@@ -1653,7 +1656,14 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
     return true;
   };
 
-  if (!CaptureTrackingWithModRef(SrcAlloca, SrcModRefCallback))
+  bool SrcAddressCaptured = false;
+  if (!CaptureTrackingWithModRef(SrcAlloca, SrcModRefCallback,
+                                 SrcAddressCaptured))
+    return false;
+
+  // If both the source and destination address are captured, the fact that they
+  // are no longer two separate allocations may be observed.
+  if (DestAddressCaptured && SrcAddressCaptured)
     return false;
 
   // We can do the transformation. First, move the SrcAlloca to the start of the
diff --git a/llvm/test/Analysis/DependenceAnalysis/gcd-miv-overflow.ll b/llvm/test/Analysis/DependenceAnalysis/gcd-miv-overflow.ll
new file mode 100644
index 0000000000000..43f66dd7d0974
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/gcd-miv-overflow.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output "-passes=print<da>" 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-ALL
+; RUN: opt < %s -disable-output "-passes=print<da>" -da-enable-dependence-test=gcd-miv 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-GCD-MIV
+
+; offset0 = 4;
+; offset1 = 0;
+; for (i = 0; i < 100; i++) {
+;   A[offset0] = 1;
+;   A[offset1] = 2;
+;   offset0 += 3*m;
+;   offset1 += 3;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between the two
+; stores, but it does exist. E.g., consider `m` is 12297829382473034411, which
+; is a modular multiplicative inverse of 3 under modulo 2^64. Then `offset0` is
+; effectively `i + 4`, so accesses will be as follows:
+;
+;   - A[offset0] : A[4], A[5], A[6], ...
+;   - A[offset1] : A[0], A[3], A[6], ...
+;
+; The root cause is that DA interprets `3*m` in non-modular arithmetic, which
+; isn't necessarily true due to overflow.
+;
+define void @gcdmiv_coef_ovfl(ptr %A, i64 %m) {
+; CHECK-ALL-LABEL: 'gcdmiv_coef_ovfl'
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+;
+; CHECK-GCD-MIV-LABEL: 'gcdmiv_coef_ovfl'
+; CHECK-GCD-MIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-GCD-MIV-NEXT:    da analyze - consistent output [*]!
+; CHECK-GCD-MIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-GCD-MIV-NEXT:    da analyze - none!
+; CHECK-GCD-MIV-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-GCD-MIV-NEXT:    da analyze - consistent output [*]!
+;
+entry:
+  %step = mul i64 3, %m
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset.0 = phi i64 [ 4, %entry ] , [ %offset.0.next, %loop ]
+  %offset.1 = phi i64 [ 0, %entry ] , [ %offset.1.next, %loop ]
+  %gep.0 = getelementptr inbounds i8, ptr %A, i64 %offset.0
+  %gep.1 = getelementptr inbounds i8, ptr %A, i64 %offset.1
+  store i8 1, ptr %gep.0
+  store i8 2, ptr %gep.1
+  %i.inc = add nuw nsw i64 %i, 1
+  %offset.0.next = add nsw i64 %offset.0, %step
+  %offset.1.next = add nsw i64 %offset.1, 3
+  %ec = icmp eq i64 %i.inc, 100
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret void
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/Analysis/DependenceAnalysis/strong-siv-overflow.ll b/llvm/test/Analysis/DependenceAnalysis/strong-siv-overflow.ll
new file mode 100644
index 0000000000000..bf0fafcbfd6c9
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/strong-siv-overflow.ll
@@ -0,0 +1,68 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output "-passes=print<da>" 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-ALL
+; RUN: opt < %s -disable-output "-passes=print<da>" -da-enable-dependence-test=strong-siv 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-STRONG-SIV
+
+; for (i = 0; i < (1LL << 62); i++) {
+;   if (0 <= 2*i - 2)
+;     A[2*i - 2] = 1;
+;
+;   if (0 <= 2*i - 4)
+;     A[2*i - 4] = 2;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between the two
+; stores, but it does exist. For example, each store will access A[0] when i
+; is 1 and 2 respectively.
+; The root cause is that the product of the BTC and the coefficient
+; ((1LL << 62) - 1 and 2) overflows in a signed sense.
+define void @strongsiv_const_ovfl(ptr %A) {
+; CHECK-LABEL: 'strongsiv_const_ovfl'
+; CHECK-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-NEXT:    da analyze - none!
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %offset.0 = phi i64 [ -2, %entry ], [ %offset.0.next, %loop.latch ]
+  %offset.1 = phi i64 [ -4, %entry ], [ %offset.1.next, %loop.latch ]
+  %ec = icmp eq i64 %i, 4611686018427387904
+  br i1 %ec, label %exit, label %loop.body
+
+loop.body:
+  %cond.0 = icmp sge i64 %offset.0, 0
+  %cond.1 = icmp sge i64 %offset.1, 0
+  br i1 %cond.0, label %if.then.0, label %loop.middle
+
+if.then.0:
+  %gep.0 = getelementptr inbounds i8, ptr %A, i64 %offset.0
+  store i8 1, ptr %gep.0
+  br label %loop.middle
+
+loop.middle:
+  br i1 %cond.1, label %if.then.1, label %loop.latch
+
+if.then.1:
+  %gep.1 = getelementptr inbounds i8, ptr %A, i64 %offset.1
+  store i8 2, ptr %gep.1
+  br label %loop.latch
+
+loop.latch:
+  %i.inc = add nuw nsw i64 %i, 1
+  %offset.0.next = add nsw i64 %offset.0, 2
+  %offset.1.next = add nsw i64 %offset.1, 2
+  br label %loop.header
+
+exit:
+  ret void
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK-ALL: {{.*}}
+; CHECK-STRONG-SIV: {{.*}}
diff --git a/llvm/test/Analysis/DependenceAnalysis/symbolic-rdiv-overflow.ll b/llvm/test/Analysis/DependenceAnalysis/symbolic-rdiv-overflow.ll
new file mode 100644
index 0000000000000..c5ff9884a0c62
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/symbolic-rdiv-overflow.ll
@@ -0,0 +1,137 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output "-passes=print<da>" 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-ALL
+; RUN: opt < %s -disable-output "-passes=print<da>" -da-enable-dependence-test=symbolic-rdiv 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-SYMBOLIC-RDIV
+
+; for (i = 0; i < (1LL << 62); i++) {
+;   if (0 <= 2*i - 2)
+;     A[2*i - 2] = 1;
+;   A[i] = 2;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between the two
+; stores, but it does exist. For example, each store will access A[0] when i
+; is 1 and 0 respectively.
+; The root cause is that the product of the BTC and the coefficient 
+; ((1LL << 62) - 1 and 2) overflows in a signed sense.
+define void @symbolicrdiv_prod_ovfl(ptr %A) {
+; CHECK-ALL-LABEL: 'symbolicrdiv_prod_ovfl'
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+;
+; CHECK-SYMBOLIC-RDIV-LABEL: 'symbolicrdiv_prod_ovfl'
+; CHECK-SYMBOLIC-RDIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-SYMBOLIC-RDIV-NEXT:    da analyze - none!
+; CHECK-SYMBOLIC-RDIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-SYMBOLIC-RDIV-NEXT:    da analyze - none!
+; CHECK-SYMBOLIC-RDIV-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-SYMBOLIC-RDIV-NEXT:    da analyze - consistent output [*]!
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %offset = phi i64 [ -2, %entry ], [ %offset.next, %loop.latch ]
+  %ec = icmp eq i64 %i, 4611686018427387904
+  br i1 %ec, label %exit, label %loop.body
+
+loop.body:
+  %cond = icmp sge i64 %offset, 0
+  br i1 %cond, label %if.then, label %loop.latch
+
+if.then:
+  %gep.0 = getelementptr inbounds i8, ptr %A, i64 %offset
+  store i8 1, ptr %gep.0
+  br label %loop.latch
+
+loop.latch:
+  %gep.1 = getelementptr inbounds i8, ptr %A, i64 %i
+  store i8 2, ptr %gep.1
+  %i.inc = add nuw nsw i64 %i, 1
+  %offset.next = add nsw i64 %offset, 2
+  br label %loop.header
+
+exit:
+  ret void
+}
+
+; offset0 = -4611686018427387904;  // -2^62
+; offset1 =  4611686018427387904;  // 2^62
+; for (i = 0; i < (1LL << 62) - 100; i++) {
+;   if (0 <= offset0)
+;     A[offset0] = 1;
+;   if (0 <= offset1)
+;     A[offset1] = 2;
+;   offset0 += 2;
+;   offset1 -= 1;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between the two
+; stores, but it does exist. For example,
+;
+;  memory access           | i == 2^61 | i == 2^61 + 2^59 | i == 2^61 + 2^60  
+; -------------------------|-----------|------------------|-------------------
+;  A[2*i - 2^62] (offset0) |           | A[2^60]          | A[2^61]           
+;  A[-i + 2^62]  (offset1) | A[2^61]   |                  | A[2^60]           
+;
+; The root cause is that the calculation of the differenct between the two
+; constants (-2^62 and 2^62) overflows in a signed sense.
+define void @symbolicrdiv_delta_ovfl(ptr %A) {
+; CHECK-ALL-LABEL: 'symbolicrdiv_delta_ovfl'
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+;
+; CHECK-SYMBOLIC-RDIV-LABEL: 'symbolicrdiv_delta_ovfl'
+; CHECK-SYMBOLIC-RDIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-SYMBOLIC-RDIV-NEXT:    da analyze - consistent output [*]!
+; CHECK-SYMBOLIC-RDIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-SYMBOLIC-RDIV-NEXT:    da analyze - none!
+; CHECK-SYMBOLIC-RDIV-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-SYMBOLIC-RDIV-NEXT:    da analyze - consistent output [*]!
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %offset.0 = phi i64 [ -4611686018427387904, %entry ], [ %offset.0.next, %loop.latch ]
+  %offset.1 = phi i64 [ 4611686018427387904, %entry ], [ %offset.1.next, %loop.latch ]
+  %cond.0 = icmp sge i64 %offset.0, 0
+  %cond.1 = icmp sge i64 %offset.1, 0
+  br i1 %cond.0, label %if.then.0, label %loop.middle
+
+if.then.0:
+  %gep.0 = getelementptr inbounds i8, ptr %A, i64 %offset.0
+  store i8 1, ptr %gep.0
+  br label %loop.middle
+
+loop.middle:
+  br i1 %cond.1, label %if.then.1, label %loop.latch
+
+if.then.1:
+  %gep.1 = getelementptr inbounds i8, ptr %A, i64 %offset.1
+  store i8 2, ptr %gep.1
+  br label %loop.latch
+
+loop.latch:
+  %i.inc = add nuw nsw i64 %i, 1
+  %offset.0.next = add nsw i64 %offset.0, 2
+  %offset.1.next = sub nsw i64 %offset.1, 1
+  %ec = icmp eq i64 %i.inc, 4611686018427387804 ; 2^62 - 100
+  br i1 %ec, label %exit, label %loop.header
+
+exit:
+  ret void
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/Analysis/DependenceAnalysis/weak-crossing-siv-overflow.ll b/llvm/test/Analysis/DependenceAnalysis/weak-crossing-siv-overflow.ll
new file mode 100644
index 0000000000000..ba57c7bf5736a
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/weak-crossing-siv-overflow.ll
@@ -0,0 +1,125 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output "-passes=print<da>" 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-ALL
+; RUN: opt < %s -disable-output "-passes=print<da>" -da-enable-dependence-test=weak-crossing-siv 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-WEAK-CROSSING-SIV
+
+; max_i = INT64_MAX/3  // 3074457345618258602
+; for (long long i = 0; i <= max_i; i++) {
+;   A[-3*i + INT64_MAX] = 0;
+;   if (i)
+;     A[3*i - 2] = 1;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between
+; `A[-3*i + INT64_MAX]` and `A[3*i - 2]`, but it does exist. For example,
+;
+;  memory access       | i == 1           | i == max_i
+; ---------------------|------------------|------------------
+;  A[-3*i + INT64_MAX] | A[INT64_MAX - 3] | A[1]
+;  A[3*i - 2]          | A[1]             | A[INT64_MAX - 3]
+;
+; The root cause is that the calculation of the differenct between the two
+; constants (INT64_MAX and -2) triggers an overflow.
+
+define void @weakcorssing_delta_ovfl(ptr %A) {
+; CHECK-ALL-LABEL: 'weakcorssing_delta_ovfl'
+; CHECK-ALL-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+;
+; CHECK-WEAK-CROSSING-SIV-LABEL: 'weakcorssing_delta_ovfl'
+; CHECK-WEAK-CROSSING-SIV-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-WEAK-CROSSING-SIV-NEXT:    da analyze - consistent output [*]!
+; CHECK-WEAK-CROSSING-SIV-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-WEAK-CROSSING-SIV-NEXT:    da analyze - none!
+; CHECK-WEAK-CROSSING-SIV-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-WEAK-CROSSING-SIV-NEXT:    da analyze - consistent output [*]!
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %subscript.0 = phi i64 [ 9223372036854775807, %entry ], [ %subscript.0.next, %loop.latch ]
+  %subscript.1 = phi i64 [ -2, %entry ], [ %subscript.1.next, %loop.latch ]
+  %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0
+  store i8 0, ptr %idx.0
+  %cond.store = icmp ne i64 %i, 0
+  br i1 %cond.store, label %if.store, label %loop.latch
+
+if.store:
+  %idx.1 = getelementptr inbounds i8, ptr %A, i64 %subscript.1
+  store i8 1, ptr %idx.1
+  br label %loop.latch
+
+loop.latch:
+  %i.inc = add nuw nsw i64 %i, 1
+  %subscript.0.next = add nsw i64 %subscript.0, -3
+  %subscript.1.next = add nsw i64 %subscript.1, 3
+  %ec = icmp sgt i64 %i.inc, 3074457345618258602
+  br i1 %ec, label %exit, label %loop.header
+
+exit:
+  ret void
+}
+
+; max_i = INT64_MAX/3  // 3074457345618258602
+; for (long long i = 0; i <= max_i; i++) {
+;   A[-3*i + INT64_MAX] = 0;
+;   A[3*i + 1] = 1;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between
+; `A[-3*i + INT64_MAX]` and `A[3*i - 2]`, but it does exist. For example,
+;
+;  memory access       | i == 0 | i == 1           | i == max_i - 1 | i == max_i
+; ---------------------|--------|------------------|----------------|------------------
+;  A[-3*i + INT64_MAX] |        | A[INT64_MAX - 3] | A[1]           |
+;  A[3*i + 1]          | A[1]   |                  |                | A[INT64_MAX - 3]
+;
+; The root cause is that the product of the BTC, the coefficient, and 2
+; triggers an overflow.
+;
+define void @weakcorssing_prod_ovfl(ptr %A) {
+; CHECK-ALL-LABEL: 'weakcorssing_prod_ovfl'
+; CHECK-ALL-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+;
+; CHECK-WEAK-CROSSING-SIV-LABEL: 'weakcorssing_prod_ovfl'
+; CHECK-WEAK-CROSSING-SIV-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-WEAK-CROSSING-SIV-NEXT:    da analyze - consistent output [*]!
+; CHECK-WEAK-CROSSING-SIV-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-WEAK-CROSSING-SIV-NEXT:    da analyze - none!
+; CHECK-WEAK-CROSSING-SIV-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-WEAK-CROSSING-SIV-NEXT:    da analyze - consistent output [*]!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %subscript.0 = phi i64 [ 9223372036854775807, %entry ], [ %subscript.0.next, %loop ]
+  %subscript.1 = phi i64 [ 1, %entry ], [ %subscript.1.next, %loop ]
+  %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0
+  %idx.1 = getelementptr inbounds i8, ptr %A, i64 %subscript.1
+  store i8 0, ptr %idx.0
+  store i8 1, ptr %idx.1
+  %i.inc = add nuw nsw i64 %i, 1
+  %subscript.0.next = add nsw i64 %subscript.0, -3
+  %subscript.1.next = add nsw i64 %subscript.1, 3
+  %ec = icmp sgt i64 %i.inc, 3074457345618258602
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret void
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-overflow.ll b/llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-overflow.ll
new file mode 100644
index 0000000000000..6317c387858d3
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-overflow.ll
@@ -0,0 +1,122 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output "-passes=print<da>" 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-ALL
+; RUN: opt < %s -disable-output "-passes=print<da>" -da-enable-dependence-test=weak-zero-siv 2>&1 \
+; RUN:     | FileCheck %s --check-prefixes=CHECK,CHECK-WEAK-ZERO-SIV
+
+; for (i = 0; i < (1LL << 62); i++) {
+;   if (0 <= 2*i - 2)
+;     A[2*i - 2] = 1;
+;   A[2] = 2;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between the two
+; stores, but it does exist. The root cause is that the product of the BTC and
+; the coefficient ((1LL << 62) - 1 and 2) overflows in a signed sense.
+;
+define void @weakzero_dst_siv_prod_ovfl(ptr %A) {
+; CHECK-ALL-LABEL: 'weakzero_dst_siv_prod_ovfl'
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - consistent output [S]!
+;
+; CHECK-WEAK-ZERO-SIV-LABEL: 'weakzero_dst_siv_prod_ovfl'
+; CHECK-WEAK-ZERO-SIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-WEAK-ZERO-SIV-NEXT:    da analyze - consistent output [*]!
+; CHECK-WEAK-ZERO-SIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-WEAK-ZERO-SIV-NEXT:    da analyze - none!
+; CHECK-WEAK-ZERO-SIV-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-WEAK-ZERO-SIV-NEXT:    da analyze - consistent output [S]!
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %offset = phi i64 [ -2, %entry ], [ %offset.next, %loop.latch ]
+  %ec = icmp eq i64 %i, 4611686018427387904
+  br i1 %ec, label %exit, label %loop.body
+
+loop.body:
+  %cond = icmp sge i64 %offset, 0
+  br i1 %cond, label %if.then, label %loop.latch
+
+if.then:
+  %gep.0 = getelementptr inbounds i8, ptr %A, i64 %offset
+  store i8 1, ptr %gep.0
+  br label %loop.latch
+
+loop.latch:
+  %gep.1 = getelementptr inbounds i8, ptr %A, i64 2
+  store i8 2, ptr %gep.1
+  %i.inc = add nuw nsw i64 %i, 1
+  %offset.next = add nsw i64 %offset, 2
+  br label %loop.header
+
+exit:
+  ret void
+}
+
+; for (i = 0; i < n; i++) {
+;   if (0 <= 2*i - 1)
+;     A[2*i - 1] = 1;
+;   A[INT64_MAX] = 2;
+; }
+;
+; FIXME: DependenceAnalysis currently detects no dependency between the two
+; stores, but it does exist. When `%n` is 2^62, the value of `%offset` will be
+; the same as INT64_MAX at the last iteration.
+; The root cause is that the calculation of the difference between the two
+; constants (INT64_MAX and -1) overflows in a signed sense.
+;
+define void @weakzero_dst_siv_delta_ovfl(ptr %A, i64 %n) {
+; CHECK-ALL-LABEL: 'weakzero_dst_siv_delta_ovfl'
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - none!
+; CHECK-ALL-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-ALL-NEXT:    da analyze - consistent output [S]!
+;
+; CHECK-WEAK-ZERO-SIV-LABEL: 'weakzero_dst_siv_delta_ovfl'
+; CHECK-WEAK-ZERO-SIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 1, ptr %gep.0, align 1
+; CHECK-WEAK-ZERO-SIV-NEXT:    da analyze - consistent output [*]!
+; CHECK-WEAK-ZERO-SIV-NEXT:  Src: store i8 1, ptr %gep.0, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-WEAK-ZERO-SIV-NEXT:    da analyze - none!
+; CHECK-WEAK-ZERO-SIV-NEXT:  Src: store i8 2, ptr %gep.1, align 1 --> Dst: store i8 2, ptr %gep.1, align 1
+; CHECK-WEAK-ZERO-SIV-NEXT:    da analyze - consistent output [S]!
+;
+entry:
+  %guard = icmp sgt i64 %n, 0
+  br i1 %guard, label %loop.header, label %exit
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %offset = phi i64 [ -2, %entry ], [ %offset.next, %loop.latch ]
+  %ec = icmp eq i64 %i, %n
+  br i1 %ec, label %exit, label %loop.body
+
+loop.body:
+  %cond = icmp sge i64 %offset, 0
+  br i1 %cond, label %if.then, label %loop.latch
+
+if.then:
+  %gep.0 = getelementptr inbounds i8, ptr %A, i64 %offset
+  store i8 1, ptr %gep.0
+  br label %loop.latch
+
+loop.latch:
+  %gep.1 = getelementptr inbounds i8, ptr %A, i64 9223372036854775807
+  store i8 2, ptr %gep.1
+  %i.inc = add nuw nsw i64 %i, 1
+  %offset.next = add nsw i64 %offset, 2
+  br label %loop.header
+
+exit:
+  ret void
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/CodeGen/AArch64/arm64-srl-and.ll b/llvm/test/CodeGen/AArch64/arm64-srl-and.ll
index b58f6ba96a5b8..330f27bd6c0cd 100644
--- a/llvm/test/CodeGen/AArch64/arm64-srl-and.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-srl-and.ll
@@ -1,22 +1,38 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=aarch64-linux-gnu -O3 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -O3 < %s | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64-linux-gnu -O3 -global-isel < %s | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
 ; This used to miscompile:
 ; The 16-bit -1 should not become 32-bit -1 (sub w8, w8, #1).
 
 @g = global i16 0, align 4
 define i32 @srl_and()  {
-; CHECK-LABEL: srl_and:
-; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    adrp x8, :got:g
-; CHECK-NEXT:    mov w9, #50
-; CHECK-NEXT:    ldr x8, [x8, :got_lo12:g]
-; CHECK-NEXT:    ldrh w8, [x8]
-; CHECK-NEXT:    eor w8, w8, w9
-; CHECK-NEXT:    mov w9, #65535
-; CHECK-NEXT:    add w8, w8, w9
-; CHECK-NEXT:    and w0, w8, w8, lsr #16
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: srl_and:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    adrp x8, :got:g
+; CHECK-SD-NEXT:    mov w9, #50 // =0x32
+; CHECK-SD-NEXT:    ldr x8, [x8, :got_lo12:g]
+; CHECK-SD-NEXT:    ldrh w8, [x8]
+; CHECK-SD-NEXT:    eor w8, w8, w9
+; CHECK-SD-NEXT:    mov w9, #65535 // =0xffff
+; CHECK-SD-NEXT:    add w8, w8, w9
+; CHECK-SD-NEXT:    and w0, w8, w8, lsr #16
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: srl_and:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    adrp x8, :got:g
+; CHECK-GI-NEXT:    mov w9, #50 // =0x32
+; CHECK-GI-NEXT:    ldr x8, [x8, :got_lo12:g]
+; CHECK-GI-NEXT:    ldrh w8, [x8]
+; CHECK-GI-NEXT:    eor w8, w8, w9
+; CHECK-GI-NEXT:    mov w9, #65535 // =0xffff
+; CHECK-GI-NEXT:    add w8, w9, w8, uxth
+; CHECK-GI-NEXT:    and w9, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, w9
+; CHECK-GI-NEXT:    cset w8, ne
+; CHECK-GI-NEXT:    and w0, w9, w8
+; CHECK-GI-NEXT:    ret
 entry:
   %0 = load i16, ptr @g, align 4
   %1 = xor i16 %0, 50
@@ -29,3 +45,5 @@ entry:
 
   ret i32 %and
 }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll b/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
index c3fdc7db2abbe..8438f0b03179c 100644
--- a/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
+++ b/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s --check-prefix=CHECK
+; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64-unknown-unknown -global-isel < %s | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
 ; We are looking for the following pattern here:
 ;   (X & (C l>> Y)) ==/!= 0
@@ -13,12 +14,21 @@
 ; i8 scalar
 
 define i1 @scalar_i8_signbit_eq(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x80
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x80
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #128 // =0x80
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i8 128, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 0
@@ -26,12 +36,21 @@ define i1 @scalar_i8_signbit_eq(i8 %x, i8 %y) nounwind {
 }
 
 define i1 @scalar_i8_lowestbit_eq(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i8 1, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 0
@@ -39,12 +58,21 @@ define i1 @scalar_i8_lowestbit_eq(i8 %x, i8 %y) nounwind {
 }
 
 define i1 @scalar_i8_bitsinmiddle_eq(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x18
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x18
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #24 // =0x18
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i8 24, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 0
@@ -54,12 +82,21 @@ define i1 @scalar_i8_bitsinmiddle_eq(i8 %x, i8 %y) nounwind {
 ; i16 scalar
 
 define i1 @scalar_i16_signbit_eq(i16 %x, i16 %y) nounwind {
-; CHECK-LABEL: scalar_i16_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x8000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i16_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x8000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i16_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #32768 // =0x8000
+; CHECK-GI-NEXT:    and w9, w1, #0xffff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i16 32768, %y
   %t1 = and i16 %t0, %x
   %res = icmp eq i16 %t1, 0
@@ -67,12 +104,21 @@ define i1 @scalar_i16_signbit_eq(i16 %x, i16 %y) nounwind {
 }
 
 define i1 @scalar_i16_lowestbit_eq(i16 %x, i16 %y) nounwind {
-; CHECK-LABEL: scalar_i16_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i16_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i16_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    and w9, w1, #0xffff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i16 1, %y
   %t1 = and i16 %t0, %x
   %res = icmp eq i16 %t1, 0
@@ -80,12 +126,21 @@ define i1 @scalar_i16_lowestbit_eq(i16 %x, i16 %y) nounwind {
 }
 
 define i1 @scalar_i16_bitsinmiddle_eq(i16 %x, i16 %y) nounwind {
-; CHECK-LABEL: scalar_i16_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0xff0
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i16_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0xff0
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i16_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #4080 // =0xff0
+; CHECK-GI-NEXT:    and w9, w1, #0xffff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i16 4080, %y
   %t1 = and i16 %t0, %x
   %res = icmp eq i16 %t1, 0
@@ -95,12 +150,20 @@ define i1 @scalar_i16_bitsinmiddle_eq(i16 %x, i16 %y) nounwind {
 ; i32 scalar
 
 define i1 @scalar_i32_signbit_eq(i32 %x, i32 %y) nounwind {
-; CHECK-LABEL: scalar_i32_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x80000000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i32_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x80000000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i32_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-GI-NEXT:    lsr w8, w8, w1
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i32 2147483648, %y
   %t1 = and i32 %t0, %x
   %res = icmp eq i32 %t1, 0
@@ -108,12 +171,20 @@ define i1 @scalar_i32_signbit_eq(i32 %x, i32 %y) nounwind {
 }
 
 define i1 @scalar_i32_lowestbit_eq(i32 %x, i32 %y) nounwind {
-; CHECK-LABEL: scalar_i32_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i32_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i32_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    lsr w8, w8, w1
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i32 1, %y
   %t1 = and i32 %t0, %x
   %res = icmp eq i32 %t1, 0
@@ -121,12 +192,20 @@ define i1 @scalar_i32_lowestbit_eq(i32 %x, i32 %y) nounwind {
 }
 
 define i1 @scalar_i32_bitsinmiddle_eq(i32 %x, i32 %y) nounwind {
-; CHECK-LABEL: scalar_i32_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    tst w8, #0xffff00
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i32_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0xffff00
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i32_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #16776960 // =0xffff00
+; CHECK-GI-NEXT:    lsr w8, w8, w1
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i32 16776960, %y
   %t1 = and i32 %t0, %x
   %res = icmp eq i32 %t1, 0
@@ -136,12 +215,20 @@ define i1 @scalar_i32_bitsinmiddle_eq(i32 %x, i32 %y) nounwind {
 ; i64 scalar
 
 define i1 @scalar_i64_signbit_eq(i64 %x, i64 %y) nounwind {
-; CHECK-LABEL: scalar_i64_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl x8, x0, x1
-; CHECK-NEXT:    tst x8, #0x8000000000000000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i64_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl x8, x0, x1
+; CHECK-SD-NEXT:    tst x8, #0x8000000000000000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i64_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov x8, #-9223372036854775808 // =0x8000000000000000
+; CHECK-GI-NEXT:    lsr x8, x8, x1
+; CHECK-GI-NEXT:    tst x8, x0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i64 9223372036854775808, %y
   %t1 = and i64 %t0, %x
   %res = icmp eq i64 %t1, 0
@@ -149,12 +236,20 @@ define i1 @scalar_i64_signbit_eq(i64 %x, i64 %y) nounwind {
 }
 
 define i1 @scalar_i64_lowestbit_eq(i64 %x, i64 %y) nounwind {
-; CHECK-LABEL: scalar_i64_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl x8, x0, x1
-; CHECK-NEXT:    tst x8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i64_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl x8, x0, x1
+; CHECK-SD-NEXT:    tst x8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i64_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    lsr x8, x8, x1
+; CHECK-GI-NEXT:    tst x8, x0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i64 1, %y
   %t1 = and i64 %t0, %x
   %res = icmp eq i64 %t1, 0
@@ -162,12 +257,20 @@ define i1 @scalar_i64_lowestbit_eq(i64 %x, i64 %y) nounwind {
 }
 
 define i1 @scalar_i64_bitsinmiddle_eq(i64 %x, i64 %y) nounwind {
-; CHECK-LABEL: scalar_i64_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl x8, x0, x1
-; CHECK-NEXT:    tst x8, #0xffffffff0000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i64_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl x8, x0, x1
+; CHECK-SD-NEXT:    tst x8, #0xffffffff0000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i64_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov x8, #281474976645120 // =0xffffffff0000
+; CHECK-GI-NEXT:    lsr x8, x8, x1
+; CHECK-GI-NEXT:    tst x8, x0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i64 281474976645120, %y
   %t1 = and i64 %t0, %x
   %res = icmp eq i64 %t1, 0
@@ -179,14 +282,24 @@ define i1 @scalar_i64_bitsinmiddle_eq(i64 %x, i64 %y) nounwind {
 ;------------------------------------------------------------------------------;
 
 define <4 x i1> @vec_4xi32_splat_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
-; CHECK-LABEL: vec_4xi32_splat_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    movi v2.4s, #1
-; CHECK-NEXT:    ushl v0.4s, v0.4s, v1.4s
-; CHECK-NEXT:    and v0.16b, v0.16b, v2.16b
-; CHECK-NEXT:    cmeq v0.4s, v0.4s, #0
-; CHECK-NEXT:    xtn v0.4h, v0.4s
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: vec_4xi32_splat_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v2.4s, #1
+; CHECK-SD-NEXT:    ushl v0.4s, v0.4s, v1.4s
+; CHECK-SD-NEXT:    and v0.16b, v0.16b, v2.16b
+; CHECK-SD-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-SD-NEXT:    xtn v0.4h, v0.4s
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: vec_4xi32_splat_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    movi v2.4s, #1
+; CHECK-GI-NEXT:    neg v1.4s, v1.4s
+; CHECK-GI-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-GI-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
+; CHECK-GI-NEXT:    ret
   %t0 = lshr <4 x i32> <i32 1, i32 1, i32 1, i32 1>, %y
   %t1 = and <4 x i32> %t0, %x
   %res = icmp eq <4 x i32> %t1, <i32 0, i32 0, i32 0, i32 0>
@@ -211,44 +324,86 @@ define <4 x i1> @vec_4xi32_nonsplat_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
 }
 
 define <4 x i1> @vec_4xi32_nonsplat_undef0_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
-; CHECK-LABEL: vec_4xi32_nonsplat_undef0_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    movi v2.4s, #1
-; CHECK-NEXT:    ushl v0.4s, v0.4s, v1.4s
-; CHECK-NEXT:    and v0.16b, v0.16b, v2.16b
-; CHECK-NEXT:    cmeq v0.4s, v0.4s, #0
-; CHECK-NEXT:    xtn v0.4h, v0.4s
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: vec_4xi32_nonsplat_undef0_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v2.4s, #1
+; CHECK-SD-NEXT:    ushl v0.4s, v0.4s, v1.4s
+; CHECK-SD-NEXT:    and v0.16b, v0.16b, v2.16b
+; CHECK-SD-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-SD-NEXT:    xtn v0.4h, v0.4s
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: vec_4xi32_nonsplat_undef0_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    neg v1.4s, v1.4s
+; CHECK-GI-NEXT:    fmov s2, w8
+; CHECK-GI-NEXT:    mov v2.s[1], w8
+; CHECK-GI-NEXT:    mov v2.s[3], w8
+; CHECK-GI-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-GI-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
+; CHECK-GI-NEXT:    ret
   %t0 = lshr <4 x i32> <i32 1, i32 1, i32 undef, i32 1>, %y
   %t1 = and <4 x i32> %t0, %x
   %res = icmp eq <4 x i32> %t1, <i32 0, i32 0, i32 0, i32 0>
   ret <4 x i1> %res
 }
 define <4 x i1> @vec_4xi32_nonsplat_undef1_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
-; CHECK-LABEL: vec_4xi32_nonsplat_undef1_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    movi v2.4s, #1
-; CHECK-NEXT:    neg v1.4s, v1.4s
-; CHECK-NEXT:    ushl v1.4s, v2.4s, v1.4s
-; CHECK-NEXT:    and v0.16b, v1.16b, v0.16b
-; CHECK-NEXT:    cmeq v0.4s, v0.4s, #0
-; CHECK-NEXT:    xtn v0.4h, v0.4s
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: vec_4xi32_nonsplat_undef1_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v2.4s, #1
+; CHECK-SD-NEXT:    neg v1.4s, v1.4s
+; CHECK-SD-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-SD-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-SD-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-SD-NEXT:    xtn v0.4h, v0.4s
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: vec_4xi32_nonsplat_undef1_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    movi d2, #0000000000000000
+; CHECK-GI-NEXT:    movi v3.4s, #1
+; CHECK-GI-NEXT:    neg v1.4s, v1.4s
+; CHECK-GI-NEXT:    mov v2.s[1], wzr
+; CHECK-GI-NEXT:    ushl v1.4s, v3.4s, v1.4s
+; CHECK-GI-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    mov v2.s[3], wzr
+; CHECK-GI-NEXT:    cmeq v0.4s, v0.4s, v2.4s
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
+; CHECK-GI-NEXT:    ret
   %t0 = lshr <4 x i32> <i32 1, i32 1, i32 1, i32 1>, %y
   %t1 = and <4 x i32> %t0, %x
   %res = icmp eq <4 x i32> %t1, <i32 0, i32 0, i32 undef, i32 0>
   ret <4 x i1> %res
 }
 define <4 x i1> @vec_4xi32_nonsplat_undef2_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
-; CHECK-LABEL: vec_4xi32_nonsplat_undef2_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    movi v2.4s, #1
-; CHECK-NEXT:    neg v1.4s, v1.4s
-; CHECK-NEXT:    ushl v1.4s, v2.4s, v1.4s
-; CHECK-NEXT:    and v0.16b, v1.16b, v0.16b
-; CHECK-NEXT:    cmeq v0.4s, v0.4s, #0
-; CHECK-NEXT:    xtn v0.4h, v0.4s
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: vec_4xi32_nonsplat_undef2_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v2.4s, #1
+; CHECK-SD-NEXT:    neg v1.4s, v1.4s
+; CHECK-SD-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-SD-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-SD-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-SD-NEXT:    xtn v0.4h, v0.4s
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: vec_4xi32_nonsplat_undef2_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    movi d2, #0000000000000000
+; CHECK-GI-NEXT:    neg v1.4s, v1.4s
+; CHECK-GI-NEXT:    fmov s3, w8
+; CHECK-GI-NEXT:    mov v3.s[1], w8
+; CHECK-GI-NEXT:    mov v2.s[1], wzr
+; CHECK-GI-NEXT:    mov v3.s[3], w8
+; CHECK-GI-NEXT:    mov v2.s[3], wzr
+; CHECK-GI-NEXT:    ushl v1.4s, v3.4s, v1.4s
+; CHECK-GI-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    cmeq v0.4s, v0.4s, v2.4s
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
+; CHECK-GI-NEXT:    ret
   %t0 = lshr <4 x i32> <i32 1, i32 1, i32 undef, i32 1>, %y
   %t1 = and <4 x i32> %t0, %x
   %res = icmp eq <4 x i32> %t1, <i32 0, i32 0, i32 undef, i32 0>
@@ -260,11 +415,20 @@ define <4 x i1> @vec_4xi32_nonsplat_undef2_eq(<4 x i32> %x, <4 x i32> %y) nounwi
 ;------------------------------------------------------------------------------;
 
 define i1 @scalar_i8_signbit_ne(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_signbit_ne:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsl w8, w0, w1
-; CHECK-NEXT:    ubfx w0, w8, #7, #1
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_signbit_ne:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsl w8, w0, w1
+; CHECK-SD-NEXT:    ubfx w0, w8, #7, #1
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_signbit_ne:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #128 // =0x80
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, ne
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i8 128, %y
   %t1 = and i8 %t0, %x
   %res = icmp ne i8 %t1, 0 ;  we are perfectly happy with 'ne' predicate
@@ -315,14 +479,24 @@ define i1 @scalar_i8_bitsinmiddle_slt(i8 %x, i8 %y) nounwind {
 }
 
 define i1 @scalar_i8_signbit_eq_with_nonzero(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_signbit_eq_with_nonzero:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #128 // =0x80
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    and w8, w8, w0
-; CHECK-NEXT:    cmp w8, #1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_signbit_eq_with_nonzero:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    mov w8, #128 // =0x80
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    and w8, w8, w0
+; CHECK-SD-NEXT:    cmp w8, #1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_signbit_eq_with_nonzero:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #128 // =0x80
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsr w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    cmp w8, #1
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = lshr i8 128, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 1 ; should be comparing with 0
diff --git a/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-shl-in-eqcmp-zero.ll b/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
index 4a73b10811d29..cc1bf27b8d4b7 100644
--- a/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
+++ b/llvm/test/CodeGen/AArch64/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s --check-prefix=CHECK
+; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64-unknown-unknown -global-isel < %s | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
 ; We are looking for the following pattern here:
 ;   (X & (C << Y)) ==/!= 0
@@ -13,13 +14,23 @@
 ; i8 scalar
 
 define i1 @scalar_i8_signbit_eq(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xff
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    tst w8, #0x80
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xff
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    tst w8, #0x80
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-128 // =0xffffff80
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    tst w8, #0xff
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i8 128, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 0
@@ -27,13 +38,23 @@ define i1 @scalar_i8_signbit_eq(i8 %x, i8 %y) nounwind {
 }
 
 define i1 @scalar_i8_lowestbit_eq(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xff
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    tst w8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xff
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    tst w8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    tst w8, #0xff
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i8 1, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 0
@@ -41,13 +62,23 @@ define i1 @scalar_i8_lowestbit_eq(i8 %x, i8 %y) nounwind {
 }
 
 define i1 @scalar_i8_bitsinmiddle_eq(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xff
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    tst w8, #0x18
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xff
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    tst w8, #0x18
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #24 // =0x18
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    tst w8, #0xff
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i8 24, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 0
@@ -57,13 +88,23 @@ define i1 @scalar_i8_bitsinmiddle_eq(i8 %x, i8 %y) nounwind {
 ; i16 scalar
 
 define i1 @scalar_i16_signbit_eq(i16 %x, i16 %y) nounwind {
-; CHECK-LABEL: scalar_i16_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    tst w8, #0x8000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i16_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xffff
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    tst w8, #0x8000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i16_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-32768 // =0xffff8000
+; CHECK-GI-NEXT:    and w9, w1, #0xffff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    tst w8, #0xffff
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i16 32768, %y
   %t1 = and i16 %t0, %x
   %res = icmp eq i16 %t1, 0
@@ -71,13 +112,23 @@ define i1 @scalar_i16_signbit_eq(i16 %x, i16 %y) nounwind {
 }
 
 define i1 @scalar_i16_lowestbit_eq(i16 %x, i16 %y) nounwind {
-; CHECK-LABEL: scalar_i16_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    tst w8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i16_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xffff
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    tst w8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i16_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    and w9, w1, #0xffff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    tst w8, #0xffff
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i16 1, %y
   %t1 = and i16 %t0, %x
   %res = icmp eq i16 %t1, 0
@@ -85,13 +136,23 @@ define i1 @scalar_i16_lowestbit_eq(i16 %x, i16 %y) nounwind {
 }
 
 define i1 @scalar_i16_bitsinmiddle_eq(i16 %x, i16 %y) nounwind {
-; CHECK-LABEL: scalar_i16_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    tst w8, #0xff0
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i16_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xffff
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    tst w8, #0xff0
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i16_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #4080 // =0xff0
+; CHECK-GI-NEXT:    and w9, w1, #0xffff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    tst w8, #0xffff
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i16 4080, %y
   %t1 = and i16 %t0, %x
   %res = icmp eq i16 %t1, 0
@@ -101,12 +162,20 @@ define i1 @scalar_i16_bitsinmiddle_eq(i16 %x, i16 %y) nounwind {
 ; i32 scalar
 
 define i1 @scalar_i32_signbit_eq(i32 %x, i32 %y) nounwind {
-; CHECK-LABEL: scalar_i32_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsr w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x80000000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i32_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsr w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x80000000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i32_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-GI-NEXT:    lsl w8, w8, w1
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i32 2147483648, %y
   %t1 = and i32 %t0, %x
   %res = icmp eq i32 %t1, 0
@@ -114,12 +183,20 @@ define i1 @scalar_i32_signbit_eq(i32 %x, i32 %y) nounwind {
 }
 
 define i1 @scalar_i32_lowestbit_eq(i32 %x, i32 %y) nounwind {
-; CHECK-LABEL: scalar_i32_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsr w8, w0, w1
-; CHECK-NEXT:    tst w8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i32_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsr w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i32_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    lsl w8, w8, w1
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i32 1, %y
   %t1 = and i32 %t0, %x
   %res = icmp eq i32 %t1, 0
@@ -127,12 +204,20 @@ define i1 @scalar_i32_lowestbit_eq(i32 %x, i32 %y) nounwind {
 }
 
 define i1 @scalar_i32_bitsinmiddle_eq(i32 %x, i32 %y) nounwind {
-; CHECK-LABEL: scalar_i32_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsr w8, w0, w1
-; CHECK-NEXT:    tst w8, #0xffff00
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i32_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsr w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0xffff00
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i32_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #16776960 // =0xffff00
+; CHECK-GI-NEXT:    lsl w8, w8, w1
+; CHECK-GI-NEXT:    tst w8, w0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i32 16776960, %y
   %t1 = and i32 %t0, %x
   %res = icmp eq i32 %t1, 0
@@ -142,12 +227,20 @@ define i1 @scalar_i32_bitsinmiddle_eq(i32 %x, i32 %y) nounwind {
 ; i64 scalar
 
 define i1 @scalar_i64_signbit_eq(i64 %x, i64 %y) nounwind {
-; CHECK-LABEL: scalar_i64_signbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsr x8, x0, x1
-; CHECK-NEXT:    tst x8, #0x8000000000000000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i64_signbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsr x8, x0, x1
+; CHECK-SD-NEXT:    tst x8, #0x8000000000000000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i64_signbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov x8, #-9223372036854775808 // =0x8000000000000000
+; CHECK-GI-NEXT:    lsl x8, x8, x1
+; CHECK-GI-NEXT:    tst x8, x0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i64 9223372036854775808, %y
   %t1 = and i64 %t0, %x
   %res = icmp eq i64 %t1, 0
@@ -155,12 +248,20 @@ define i1 @scalar_i64_signbit_eq(i64 %x, i64 %y) nounwind {
 }
 
 define i1 @scalar_i64_lowestbit_eq(i64 %x, i64 %y) nounwind {
-; CHECK-LABEL: scalar_i64_lowestbit_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsr x8, x0, x1
-; CHECK-NEXT:    tst x8, #0x1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i64_lowestbit_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsr x8, x0, x1
+; CHECK-SD-NEXT:    tst x8, #0x1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i64_lowestbit_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    lsl x8, x8, x1
+; CHECK-GI-NEXT:    tst x8, x0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i64 1, %y
   %t1 = and i64 %t0, %x
   %res = icmp eq i64 %t1, 0
@@ -168,12 +269,20 @@ define i1 @scalar_i64_lowestbit_eq(i64 %x, i64 %y) nounwind {
 }
 
 define i1 @scalar_i64_bitsinmiddle_eq(i64 %x, i64 %y) nounwind {
-; CHECK-LABEL: scalar_i64_bitsinmiddle_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    lsr x8, x0, x1
-; CHECK-NEXT:    tst x8, #0xffffffff0000
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i64_bitsinmiddle_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    lsr x8, x0, x1
+; CHECK-SD-NEXT:    tst x8, #0xffffffff0000
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i64_bitsinmiddle_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov x8, #281474976645120 // =0xffffffff0000
+; CHECK-GI-NEXT:    lsl x8, x8, x1
+; CHECK-GI-NEXT:    tst x8, x0
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %t0 = shl i64 281474976645120, %y
   %t1 = and i64 %t0, %x
   %res = icmp eq i64 %t1, 0
@@ -216,42 +325,81 @@ define <4 x i1> @vec_4xi32_nonsplat_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
 }
 
 define <4 x i1> @vec_4xi32_nonsplat_undef0_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
-; CHECK-LABEL: vec_4xi32_nonsplat_undef0_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    movi v2.4s, #1
-; CHECK-NEXT:    ushl v1.4s, v2.4s, v1.4s
-; CHECK-NEXT:    and v0.16b, v1.16b, v0.16b
-; CHECK-NEXT:    cmeq v0.4s, v0.4s, #0
-; CHECK-NEXT:    xtn v0.4h, v0.4s
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: vec_4xi32_nonsplat_undef0_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v2.4s, #1
+; CHECK-SD-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-SD-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-SD-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-SD-NEXT:    xtn v0.4h, v0.4s
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: vec_4xi32_nonsplat_undef0_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    fmov s2, w8
+; CHECK-GI-NEXT:    mov v2.s[1], w8
+; CHECK-GI-NEXT:    mov v2.s[3], w8
+; CHECK-GI-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-GI-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
+; CHECK-GI-NEXT:    ret
   %t0 = shl <4 x i32> <i32 1, i32 1, i32 undef, i32 1>, %y
   %t1 = and <4 x i32> %t0, %x
   %res = icmp eq <4 x i32> %t1, <i32 0, i32 0, i32 0, i32 0>
   ret <4 x i1> %res
 }
 define <4 x i1> @vec_4xi32_nonsplat_undef1_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
-; CHECK-LABEL: vec_4xi32_nonsplat_undef1_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    movi v2.4s, #1
-; CHECK-NEXT:    ushl v1.4s, v2.4s, v1.4s
-; CHECK-NEXT:    and v0.16b, v1.16b, v0.16b
-; CHECK-NEXT:    cmeq v0.4s, v0.4s, #0
-; CHECK-NEXT:    xtn v0.4h, v0.4s
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: vec_4xi32_nonsplat_undef1_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v2.4s, #1
+; CHECK-SD-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-SD-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-SD-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-SD-NEXT:    xtn v0.4h, v0.4s
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: vec_4xi32_nonsplat_undef1_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    movi d3, #0000000000000000
+; CHECK-GI-NEXT:    movi v2.4s, #1
+; CHECK-GI-NEXT:    mov v3.s[1], wzr
+; CHECK-GI-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-GI-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    mov v3.s[3], wzr
+; CHECK-GI-NEXT:    cmeq v0.4s, v0.4s, v3.4s
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
+; CHECK-GI-NEXT:    ret
   %t0 = shl <4 x i32> <i32 1, i32 1, i32 1, i32 1>, %y
   %t1 = and <4 x i32> %t0, %x
   %res = icmp eq <4 x i32> %t1, <i32 0, i32 0, i32 undef, i32 0>
   ret <4 x i1> %res
 }
 define <4 x i1> @vec_4xi32_nonsplat_undef2_eq(<4 x i32> %x, <4 x i32> %y) nounwind {
-; CHECK-LABEL: vec_4xi32_nonsplat_undef2_eq:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    movi v2.4s, #1
-; CHECK-NEXT:    ushl v1.4s, v2.4s, v1.4s
-; CHECK-NEXT:    and v0.16b, v1.16b, v0.16b
-; CHECK-NEXT:    cmeq v0.4s, v0.4s, #0
-; CHECK-NEXT:    xtn v0.4h, v0.4s
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: vec_4xi32_nonsplat_undef2_eq:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v2.4s, #1
+; CHECK-SD-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-SD-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-SD-NEXT:    cmeq v0.4s, v0.4s, #0
+; CHECK-SD-NEXT:    xtn v0.4h, v0.4s
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: vec_4xi32_nonsplat_undef2_eq:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #1 // =0x1
+; CHECK-GI-NEXT:    movi d3, #0000000000000000
+; CHECK-GI-NEXT:    fmov s2, w8
+; CHECK-GI-NEXT:    mov v2.s[1], w8
+; CHECK-GI-NEXT:    mov v3.s[1], wzr
+; CHECK-GI-NEXT:    mov v2.s[3], w8
+; CHECK-GI-NEXT:    mov v3.s[3], wzr
+; CHECK-GI-NEXT:    ushl v1.4s, v2.4s, v1.4s
+; CHECK-GI-NEXT:    and v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    cmeq v0.4s, v0.4s, v3.4s
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
+; CHECK-GI-NEXT:    ret
   %t0 = shl <4 x i32> <i32 1, i32 1, i32 undef, i32 1>, %y
   %t1 = and <4 x i32> %t0, %x
   %res = icmp eq <4 x i32> %t1, <i32 0, i32 0, i32 undef, i32 0>
@@ -263,12 +411,22 @@ define <4 x i1> @vec_4xi32_nonsplat_undef2_eq(<4 x i32> %x, <4 x i32> %y) nounwi
 ;------------------------------------------------------------------------------;
 
 define i1 @scalar_i8_signbit_ne(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_signbit_ne:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xff
-; CHECK-NEXT:    lsr w8, w8, w1
-; CHECK-NEXT:    lsr w0, w8, #7
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_signbit_ne:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xff
+; CHECK-SD-NEXT:    lsr w8, w8, w1
+; CHECK-SD-NEXT:    lsr w0, w8, #7
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_signbit_ne:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-128 // =0xffffff80
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    tst w8, #0xff
+; CHECK-GI-NEXT:    cset w0, ne
+; CHECK-GI-NEXT:    ret
   %t0 = shl i8 128, %y
   %t1 = and i8 %t0, %x
   %res = icmp ne i8 %t1, 0 ;  we are perfectly happy with 'ne' predicate
@@ -310,13 +468,24 @@ define i1 @scalar_i32_x_is_const2_eq(i32 %y) nounwind {
 }
 
 define i1 @scalar_i8_bitsinmiddle_slt(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_bitsinmiddle_slt:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #24 // =0x18
-; CHECK-NEXT:    lsl w8, w8, w1
-; CHECK-NEXT:    and w8, w8, w0
-; CHECK-NEXT:    ubfx w0, w8, #7, #1
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_bitsinmiddle_slt:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    mov w8, #24 // =0x18
+; CHECK-SD-NEXT:    lsl w8, w8, w1
+; CHECK-SD-NEXT:    and w8, w8, w0
+; CHECK-SD-NEXT:    ubfx w0, w8, #7, #1
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_bitsinmiddle_slt:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #24 // =0x18
+; CHECK-GI-NEXT:    and w9, w1, #0xff
+; CHECK-GI-NEXT:    lsl w8, w8, w9
+; CHECK-GI-NEXT:    and w8, w8, w0
+; CHECK-GI-NEXT:    sxtb w8, w8
+; CHECK-GI-NEXT:    cmp w8, #0
+; CHECK-GI-NEXT:    cset w0, mi
+; CHECK-GI-NEXT:    ret
   %t0 = shl i8 24, %y
   %t1 = and i8 %t0, %x
   %res = icmp slt i8 %t1, 0
@@ -324,15 +493,20 @@ define i1 @scalar_i8_bitsinmiddle_slt(i8 %x, i8 %y) nounwind {
 }
 
 define i1 @scalar_i8_signbit_eq_with_nonzero(i8 %x, i8 %y) nounwind {
-; CHECK-LABEL: scalar_i8_signbit_eq_with_nonzero:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #-128 // =0xffffff80
-; CHECK-NEXT:    lsl w8, w8, w1
-; CHECK-NEXT:    and w8, w8, w0
-; CHECK-NEXT:    and w8, w8, #0x80
-; CHECK-NEXT:    cmp w8, #1
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: scalar_i8_signbit_eq_with_nonzero:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    mov w8, #-128 // =0xffffff80
+; CHECK-SD-NEXT:    lsl w8, w8, w1
+; CHECK-SD-NEXT:    and w8, w8, w0
+; CHECK-SD-NEXT:    and w8, w8, #0x80
+; CHECK-SD-NEXT:    cmp w8, #1
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: scalar_i8_signbit_eq_with_nonzero:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w0, wzr
+; CHECK-GI-NEXT:    ret
   %t0 = shl i8 128, %y
   %t1 = and i8 %t0, %x
   %res = icmp eq i8 %t1, 1 ; should be comparing with 0
diff --git a/llvm/test/CodeGen/AArch64/signbit-test.ll b/llvm/test/CodeGen/AArch64/signbit-test.ll
index c74a934ee09d8..298495bcf5a01 100644
--- a/llvm/test/CodeGen/AArch64/signbit-test.ll
+++ b/llvm/test/CodeGen/AArch64/signbit-test.ll
@@ -1,13 +1,21 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s
+; RUN: llc -mtriple=aarch64--  < %s | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64-- -global-isel < %s | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
 define i64 @test_clear_mask_i64_i32(i64 %x) nounwind {
-; CHECK-LABEL: test_clear_mask_i64_i32:
-; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    mov w8, #42 // =0x2a
-; CHECK-NEXT:    cmn w0, #1
-; CHECK-NEXT:    csel x0, x8, x0, gt
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: test_clear_mask_i64_i32:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    mov w8, #42 // =0x2a
+; CHECK-SD-NEXT:    cmn w0, #1
+; CHECK-SD-NEXT:    csel x0, x8, x0, gt
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: test_clear_mask_i64_i32:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    mov w8, #42 // =0x2a
+; CHECK-GI-NEXT:    tst x0, #0x80000000
+; CHECK-GI-NEXT:    csel x0, x8, x0, eq
+; CHECK-GI-NEXT:    ret
 entry:
   %a = and i64 %x, 2147483648
   %r = icmp eq i64 %a, 0
diff --git a/llvm/test/CodeGen/AArch64/signed-truncation-check.ll b/llvm/test/CodeGen/AArch64/signed-truncation-check.ll
index 7c80f9320faec..fc01c6b2c5471 100644
--- a/llvm/test/CodeGen/AArch64/signed-truncation-check.ll
+++ b/llvm/test/CodeGen/AArch64/signed-truncation-check.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64-unknown-linux-gnu -global-isel < %s | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
 ; https://bugs.llvm.org/show_bug.cgi?id=38149
 
@@ -19,13 +20,22 @@
 ; ---------------------------------------------------------------------------- ;
 
 define i1 @shifts_eqcmp_i16_i8(i16 %x) nounwind {
-; CHECK-LABEL: shifts_eqcmp_i16_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w8, w0
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, w0, uxth
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: shifts_eqcmp_i16_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    sxtb w8, w0
+; CHECK-SD-NEXT:    and w8, w8, #0xffff
+; CHECK-SD-NEXT:    cmp w8, w0, uxth
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: shifts_eqcmp_i16_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    lsl w8, w0, #8
+; CHECK-GI-NEXT:    sbfx w8, w8, #8, #8
+; CHECK-GI-NEXT:    and w8, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, w0, uxth
+; CHECK-GI-NEXT:    cset w0, eq
+; CHECK-GI-NEXT:    ret
   %tmp0 = shl i16 %x, 8 ; 16-8
   %tmp1 = ashr exact i16 %tmp0, 8 ; 16-8
   %tmp2 = icmp eq i16 %tmp1, %x
@@ -97,26 +107,43 @@ define i1 @shifts_eqcmp_i64_i8(i64 %x) nounwind {
 ; ---------------------------------------------------------------------------- ;
 
 define i1 @add_ugecmp_i16_i8(i16 %x) nounwind {
-; CHECK-LABEL: add_ugecmp_i16_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    sub w8, w8, #128
-; CHECK-NEXT:    lsr w8, w8, #8
-; CHECK-NEXT:    cmp w8, #254
-; CHECK-NEXT:    cset w0, hi
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugecmp_i16_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xffff
+; CHECK-SD-NEXT:    sub w8, w8, #128
+; CHECK-SD-NEXT:    lsr w8, w8, #8
+; CHECK-SD-NEXT:    cmp w8, #254
+; CHECK-SD-NEXT:    cset w0, hi
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugecmp_i16_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-128 // =0xffffff80
+; CHECK-GI-NEXT:    mov w9, #65280 // =0xff00
+; CHECK-GI-NEXT:    add w8, w8, w0, uxth
+; CHECK-GI-NEXT:    cmp w8, w9
+; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp uge i16 %tmp0, -256 ; ~0U << 8
   ret i1 %tmp1
 }
 
 define i1 @add_ugecmp_i32_i16_i8(i16 %xx) nounwind {
-; CHECK-LABEL: add_ugecmp_i32_i16_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    cmp w8, w8, sxtb
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugecmp_i32_i16_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xffff
+; CHECK-SD-NEXT:    cmp w8, w8, sxtb
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugecmp_i32_i16_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-128 // =0xffffff80
+; CHECK-GI-NEXT:    add w8, w8, w0, uxth
+; CHECK-GI-NEXT:    cmn w8, #256
+; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    ret
   %x = zext i16 %xx to i32
   %tmp0 = add i32 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp uge i32 %tmp0, -256 ; ~0U << 8
@@ -124,55 +151,92 @@ define i1 @add_ugecmp_i32_i16_i8(i16 %xx) nounwind {
 }
 
 define i1 @add_ugecmp_i32_i16(i32 %x) nounwind {
-; CHECK-LABEL: add_ugecmp_i32_i16:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp w0, w0, sxth
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugecmp_i32_i16:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp w0, w0, sxth
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugecmp_i32_i16:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    sub w8, w0, #8, lsl #12 // =32768
+; CHECK-GI-NEXT:    cmn w8, #16, lsl #12 // =65536
+; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i32 %x, -32768 ; ~0U << (16-1)
   %tmp1 = icmp uge i32 %tmp0, -65536 ; ~0U << 16
   ret i1 %tmp1
 }
 
 define i1 @add_ugecmp_i32_i8(i32 %x) nounwind {
-; CHECK-LABEL: add_ugecmp_i32_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp w0, w0, sxtb
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugecmp_i32_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp w0, w0, sxtb
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugecmp_i32_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    sub w8, w0, #128
+; CHECK-GI-NEXT:    cmn w8, #256
+; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i32 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp uge i32 %tmp0, -256 ; ~0U << 8
   ret i1 %tmp1
 }
 
 define i1 @add_ugecmp_i64_i32(i64 %x) nounwind {
-; CHECK-LABEL: add_ugecmp_i64_i32:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x0, w0, sxtw
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugecmp_i64_i32:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp x0, w0, sxtw
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugecmp_i64_i32:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov x8, #-2147483648 // =0xffffffff80000000
+; CHECK-GI-NEXT:    mov x9, #-4294967296 // =0xffffffff00000000
+; CHECK-GI-NEXT:    add x8, x0, x8
+; CHECK-GI-NEXT:    cmp x8, x9
+; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i64 %x, -2147483648 ; ~0U << (32-1)
   %tmp1 = icmp uge i64 %tmp0, -4294967296 ; ~0U << 32
   ret i1 %tmp1
 }
 
 define i1 @add_ugecmp_i64_i16(i64 %x) nounwind {
-; CHECK-LABEL: add_ugecmp_i64_i16:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x0, w0, sxth
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugecmp_i64_i16:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp x0, w0, sxth
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugecmp_i64_i16:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    sub x8, x0, #8, lsl #12 // =32768
+; CHECK-GI-NEXT:    cmn x8, #16, lsl #12 // =65536
+; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i64 %x, -32768 ; ~0U << (16-1)
   %tmp1 = icmp uge i64 %tmp0, -65536 ; ~0U << 16
   ret i1 %tmp1
 }
 
 define i1 @add_ugecmp_i64_i8(i64 %x) nounwind {
-; CHECK-LABEL: add_ugecmp_i64_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x0, w0, sxtb
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugecmp_i64_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp x0, w0, sxtb
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugecmp_i64_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    sub x8, x0, #128
+; CHECK-GI-NEXT:    cmn x8, #256
+; CHECK-GI-NEXT:    cset w0, hs
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i64 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp uge i64 %tmp0, -256 ; ~0U << 8
   ret i1 %tmp1
@@ -180,14 +244,23 @@ define i1 @add_ugecmp_i64_i8(i64 %x) nounwind {
 
 ; Slightly more canonical variant
 define i1 @add_ugtcmp_i16_i8(i16 %x) nounwind {
-; CHECK-LABEL: add_ugtcmp_i16_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    sub w8, w8, #128
-; CHECK-NEXT:    lsr w8, w8, #8
-; CHECK-NEXT:    cmp w8, #254
-; CHECK-NEXT:    cset w0, hi
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ugtcmp_i16_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xffff
+; CHECK-SD-NEXT:    sub w8, w8, #128
+; CHECK-SD-NEXT:    lsr w8, w8, #8
+; CHECK-SD-NEXT:    cmp w8, #254
+; CHECK-SD-NEXT:    cset w0, hi
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ugtcmp_i16_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-128 // =0xffffff80
+; CHECK-GI-NEXT:    mov w9, #65279 // =0xfeff
+; CHECK-GI-NEXT:    add w8, w8, w0, uxth
+; CHECK-GI-NEXT:    cmp w8, w9
+; CHECK-GI-NEXT:    cset w0, hi
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp ugt i16 %tmp0, -257 ; ~0U << 8 - 1
   ret i1 %tmp1
@@ -198,68 +271,113 @@ define i1 @add_ugtcmp_i16_i8(i16 %x) nounwind {
 ; ---------------------------------------------------------------------------- ;
 
 define i1 @add_ultcmp_i16_i8(i16 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_i16_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w8, w0
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, w0, uxth
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_i16_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    sxtb w8, w0
+; CHECK-SD-NEXT:    and w8, w8, #0xffff
+; CHECK-SD-NEXT:    cmp w8, w0, uxth
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_i16_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #128
+; CHECK-GI-NEXT:    and w8, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, #256
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ult i16 %tmp0, 256 ; 1U << 8
   ret i1 %tmp1
 }
 
 define i1 @add_ultcmp_i32_i16(i32 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_i32_i16:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp w0, w0, sxth
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_i32_i16:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp w0, w0, sxth
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_i32_i16:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #8, lsl #12 // =32768
+; CHECK-GI-NEXT:    cmp w8, #16, lsl #12 // =65536
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i32 %x, 32768 ; 1U << (16-1)
   %tmp1 = icmp ult i32 %tmp0, 65536 ; 1U << 16
   ret i1 %tmp1
 }
 
 define i1 @add_ultcmp_i32_i8(i32 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_i32_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp w0, w0, sxtb
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_i32_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp w0, w0, sxtb
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_i32_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #128
+; CHECK-GI-NEXT:    cmp w8, #256
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i32 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ult i32 %tmp0, 256 ; 1U << 8
   ret i1 %tmp1
 }
 
 define i1 @add_ultcmp_i64_i32(i64 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_i64_i32:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x0, w0, sxtw
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_i64_i32:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp x0, w0, sxtw
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_i64_i32:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-GI-NEXT:    mov x9, #4294967296 // =0x100000000
+; CHECK-GI-NEXT:    add x8, x0, x8
+; CHECK-GI-NEXT:    cmp x8, x9
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i64 %x, 2147483648 ; 1U << (32-1)
   %tmp1 = icmp ult i64 %tmp0, 4294967296 ; 1U << 32
   ret i1 %tmp1
 }
 
 define i1 @add_ultcmp_i64_i16(i64 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_i64_i16:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x0, w0, sxth
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_i64_i16:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp x0, w0, sxth
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_i64_i16:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add x8, x0, #8, lsl #12 // =32768
+; CHECK-GI-NEXT:    cmp x8, #16, lsl #12 // =65536
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i64 %x, 32768 ; 1U << (16-1)
   %tmp1 = icmp ult i64 %tmp0, 65536 ; 1U << 16
   ret i1 %tmp1
 }
 
 define i1 @add_ultcmp_i64_i8(i64 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_i64_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x0, w0, sxtb
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_i64_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    cmp x0, w0, sxtb
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_i64_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add x8, x0, #128
+; CHECK-GI-NEXT:    cmp x8, #256
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i64 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ult i64 %tmp0, 256 ; 1U << 8
   ret i1 %tmp1
@@ -267,13 +385,21 @@ define i1 @add_ultcmp_i64_i8(i64 %x) nounwind {
 
 ; Slightly more canonical variant
 define i1 @add_ulecmp_i16_i8(i16 %x) nounwind {
-; CHECK-LABEL: add_ulecmp_i16_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w8, w0
-; CHECK-NEXT:    and w8, w8, #0xffff
-; CHECK-NEXT:    cmp w8, w0, uxth
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ulecmp_i16_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    sxtb w8, w0
+; CHECK-SD-NEXT:    and w8, w8, #0xffff
+; CHECK-SD-NEXT:    cmp w8, w0, uxth
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ulecmp_i16_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #128
+; CHECK-GI-NEXT:    and w8, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, #255
+; CHECK-GI-NEXT:    cset w0, ls
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ule i16 %tmp0, 255 ; (1U << 8) - 1
   ret i1 %tmp1
@@ -284,12 +410,20 @@ define i1 @add_ulecmp_i16_i8(i16 %x) nounwind {
 
 ; Adding not a constant
 define i1 @add_ultcmp_bad_i16_i8_add(i16 %x, i16 %y) nounwind {
-; CHECK-LABEL: add_ultcmp_bad_i16_i8_add:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, w1
-; CHECK-NEXT:    tst w8, #0xff00
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_bad_i16_i8_add:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    add w8, w0, w1
+; CHECK-SD-NEXT:    tst w8, #0xff00
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_bad_i16_i8_add:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, w1
+; CHECK-GI-NEXT:    and w8, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, #256
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, %y
   %tmp1 = icmp ult i16 %tmp0, 256 ; 1U << 8
   ret i1 %tmp1
@@ -311,12 +445,20 @@ define i1 @add_ultcmp_bad_i16_i8_cmp(i16 %x, i16 %y) nounwind {
 
 ; Second constant is not larger than the first one
 define i1 @add_ultcmp_bad_i8_i16(i16 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_bad_i8_i16:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w8, w0, #0xffff
-; CHECK-NEXT:    add w8, w8, #128
-; CHECK-NEXT:    lsr w0, w8, #16
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_bad_i8_i16:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    and w8, w0, #0xffff
+; CHECK-SD-NEXT:    add w8, w8, #128
+; CHECK-SD-NEXT:    lsr w0, w8, #16
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_bad_i8_i16:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    and w8, w0, #0xffff
+; CHECK-GI-NEXT:    add w8, w8, #128
+; CHECK-GI-NEXT:    cmp w8, w8, uxth
+; CHECK-GI-NEXT:    cset w0, ne
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ult i16 %tmp0, 128 ; 1U << (8-1)
   ret i1 %tmp1
@@ -324,12 +466,20 @@ define i1 @add_ultcmp_bad_i8_i16(i16 %x) nounwind {
 
 ; First constant is not power of two
 define i1 @add_ultcmp_bad_i16_i8_c0notpoweroftwo(i16 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_bad_i16_i8_c0notpoweroftwo:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #192
-; CHECK-NEXT:    tst w8, #0xff00
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_bad_i16_i8_c0notpoweroftwo:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    add w8, w0, #192
+; CHECK-SD-NEXT:    tst w8, #0xff00
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_bad_i16_i8_c0notpoweroftwo:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #192
+; CHECK-GI-NEXT:    and w8, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, #256
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, 192 ; (1U << (8-1)) + (1U << (8-1-1))
   %tmp1 = icmp ult i16 %tmp0, 256 ; 1U << 8
   ret i1 %tmp1
@@ -351,12 +501,20 @@ define i1 @add_ultcmp_bad_i16_i8_c1notpoweroftwo(i16 %x) nounwind {
 
 ; Magic check fails, 64 << 1 != 256
 define i1 @add_ultcmp_bad_i16_i8_magic(i16 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_bad_i16_i8_magic:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #64
-; CHECK-NEXT:    tst w8, #0xff00
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_bad_i16_i8_magic:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    add w8, w0, #64
+; CHECK-SD-NEXT:    tst w8, #0xff00
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_bad_i16_i8_magic:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #64
+; CHECK-GI-NEXT:    and w8, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, #256
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, 64 ; 1U << (8-1-1)
   %tmp1 = icmp ult i16 %tmp0, 256 ; 1U << 8
   ret i1 %tmp1
@@ -364,12 +522,20 @@ define i1 @add_ultcmp_bad_i16_i8_magic(i16 %x) nounwind {
 
 ; Bad 'destination type'
 define i1 @add_ultcmp_bad_i16_i4(i16 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_bad_i16_i4:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #8
-; CHECK-NEXT:    tst w8, #0xfff0
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_bad_i16_i4:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    add w8, w0, #8
+; CHECK-SD-NEXT:    tst w8, #0xfff0
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_bad_i16_i4:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #8
+; CHECK-GI-NEXT:    and w8, w8, #0xffff
+; CHECK-GI-NEXT:    cmp w8, #16
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i16 %x, 8 ; 1U << (4-1)
   %tmp1 = icmp ult i16 %tmp0, 16 ; 1U << 4
   ret i1 %tmp1
@@ -377,12 +543,20 @@ define i1 @add_ultcmp_bad_i16_i4(i16 %x) nounwind {
 
 ; Bad storage type
 define i1 @add_ultcmp_bad_i24_i8(i24 %x) nounwind {
-; CHECK-LABEL: add_ultcmp_bad_i24_i8:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    add w8, w0, #128
-; CHECK-NEXT:    tst w8, #0xffff00
-; CHECK-NEXT:    cset w0, eq
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: add_ultcmp_bad_i24_i8:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    add w8, w0, #128
+; CHECK-SD-NEXT:    tst w8, #0xffff00
+; CHECK-SD-NEXT:    cset w0, eq
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: add_ultcmp_bad_i24_i8:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add w8, w0, #128
+; CHECK-GI-NEXT:    and w8, w8, #0xffffff
+; CHECK-GI-NEXT:    cmp w8, #256
+; CHECK-GI-NEXT:    cset w0, lo
+; CHECK-GI-NEXT:    ret
   %tmp0 = add i24 %x, 128 ; 1U << (8-1)
   %tmp1 = icmp ult i24 %tmp0, 256 ; 1U << 8
   ret i1 %tmp1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll
index 7714c032d1737..d3e211855d7ed 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll
@@ -113,9 +113,9 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, s0
-; CHECK-NEXT:    s_cmp_eq_u32 s0, 0
+; CHECK-NEXT:    s_xor_b32 s0, s0, 1
+; CHECK-NEXT:    s_and_b32 s0, s0, 1
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
 ; CHECK-NEXT:    s_cbranch_scc1 .LBB8_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
@@ -161,16 +161,17 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, s0
+; CHECK-NEXT:    s_xor_b32 s0, s0, 1
+; CHECK-NEXT:    s_xor_b32 s0, s0, 1
+; CHECK-NEXT:    s_and_b32 s0, s0, 1
 ; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB10_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB10_3
-; CHECK-NEXT:  .LBB10_2: ; %true
+; CHECK-NEXT:    s_cbranch_scc1 .LBB10_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB10_3
+; CHECK-NEXT:  .LBB10_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB10_3
 ; CHECK-NEXT:  .LBB10_3:
   %c = trunc i32 %v to i1
   %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %c)
@@ -208,11 +209,7 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
-; CHECK-NEXT:    s_cselect_b32 s0, 1, 0
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, s0
-; CHECK-NEXT:    s_cmp_eq_u32 s0, 0
+; CHECK-NEXT:    s_cmp_ge_u32 s0, 12
 ; CHECK-NEXT:    s_cbranch_scc1 .LBB12_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
@@ -258,17 +255,13 @@ define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_compare:
 ; CHECK:       ; %bb.0:
 ; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
-; CHECK-NEXT:    s_cselect_b32 s0, 1, 0
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, s0
-; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB14_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB14_3
-; CHECK-NEXT:  .LBB14_2: ; %true
+; CHECK-NEXT:    s_cbranch_scc1 .LBB14_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB14_3
+; CHECK-NEXT:  .LBB14_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB14_3
 ; CHECK-NEXT:  .LBB14_3:
   %c = icmp ult i32 %v, 12
   %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %c)
@@ -310,14 +303,12 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_and(i32 inreg %v1, i32 inreg %v2) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_and:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
+; CHECK-NEXT:    s_cmp_ge_u32 s0, 12
 ; CHECK-NEXT:    s_cselect_b32 s0, 1, 0
-; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
+; CHECK-NEXT:    s_cmp_le_u32 s1, 34
 ; CHECK-NEXT:    s_cselect_b32 s1, 1, 0
-; CHECK-NEXT:    s_and_b32 s0, s0, s1
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, s0
-; CHECK-NEXT:    s_cmp_eq_u32 s0, 0
+; CHECK-NEXT:    s_or_b32 s0, s0, s1
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
 ; CHECK-NEXT:    s_cbranch_scc1 .LBB16_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
@@ -372,16 +363,14 @@ define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_and(i32 inreg %v1, i32 inreg
 ; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
 ; CHECK-NEXT:    s_cselect_b32 s1, 1, 0
 ; CHECK-NEXT:    s_and_b32 s0, s0, s1
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, s0
 ; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB18_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB18_3
-; CHECK-NEXT:  .LBB18_2: ; %true
+; CHECK-NEXT:    s_cbranch_scc1 .LBB18_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB18_3
+; CHECK-NEXT:  .LBB18_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB18_3
 ; CHECK-NEXT:  .LBB18_3:
   %v1c = icmp ult i32 %v1, 12
   %v2c = icmp ugt i32 %v2, 34
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
index 7b8166948610b..250fbc7c0f147 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
@@ -116,9 +116,9 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, s0
-; CHECK-NEXT:    s_cmp_eq_u64 s[0:1], 0
+; CHECK-NEXT:    s_xor_b32 s0, s0, 1
+; CHECK-NEXT:    s_and_b32 s0, s0, 1
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
 ; CHECK-NEXT:    s_cbranch_scc1 .LBB8_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
@@ -164,16 +164,17 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, s0
-; CHECK-NEXT:    s_cmp_lg_u64 s[0:1], 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB10_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB10_3
-; CHECK-NEXT:  .LBB10_2: ; %true
+; CHECK-NEXT:    s_xor_b32 s0, s0, 1
+; CHECK-NEXT:    s_xor_b32 s0, s0, 1
+; CHECK-NEXT:    s_and_b32 s0, s0, 1
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
+; CHECK-NEXT:    s_cbranch_scc1 .LBB10_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB10_3
+; CHECK-NEXT:  .LBB10_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB10_3
 ; CHECK-NEXT:  .LBB10_3:
   %c = trunc i32 %v to i1
   %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %c)
@@ -211,11 +212,7 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
-; CHECK-NEXT:    s_cselect_b32 s0, 1, 0
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, s0
-; CHECK-NEXT:    s_cmp_eq_u64 s[0:1], 0
+; CHECK-NEXT:    s_cmp_ge_u32 s0, 12
 ; CHECK-NEXT:    s_cbranch_scc1 .LBB12_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
@@ -261,17 +258,13 @@ define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_compare:
 ; CHECK:       ; %bb.0:
 ; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
-; CHECK-NEXT:    s_cselect_b32 s0, 1, 0
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, s0
-; CHECK-NEXT:    s_cmp_lg_u64 s[0:1], 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB14_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB14_3
-; CHECK-NEXT:  .LBB14_2: ; %true
+; CHECK-NEXT:    s_cbranch_scc1 .LBB14_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB14_3
+; CHECK-NEXT:  .LBB14_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB14_3
 ; CHECK-NEXT:  .LBB14_3:
   %c = icmp ult i32 %v, 12
   %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %c)
@@ -313,14 +306,12 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_and(i32 inreg %v1, i32 inreg %v2) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_and:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
+; CHECK-NEXT:    s_cmp_ge_u32 s0, 12
 ; CHECK-NEXT:    s_cselect_b32 s0, 1, 0
-; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
+; CHECK-NEXT:    s_cmp_le_u32 s1, 34
 ; CHECK-NEXT:    s_cselect_b32 s1, 1, 0
-; CHECK-NEXT:    s_and_b32 s0, s0, s1
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, s0
-; CHECK-NEXT:    s_cmp_eq_u64 s[0:1], 0
+; CHECK-NEXT:    s_or_b32 s0, s0, s1
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
 ; CHECK-NEXT:    s_cbranch_scc1 .LBB16_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
@@ -375,16 +366,14 @@ define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_and(i32 inreg %v1, i32 inreg
 ; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
 ; CHECK-NEXT:    s_cselect_b32 s1, 1, 0
 ; CHECK-NEXT:    s_and_b32 s0, s0, s1
-; CHECK-NEXT:    s_and_b32 s0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, s0
-; CHECK-NEXT:    s_cmp_lg_u64 s[0:1], 0
-; CHECK-NEXT:    s_cbranch_scc0 .LBB18_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB18_3
-; CHECK-NEXT:  .LBB18_2: ; %true
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
+; CHECK-NEXT:    s_cbranch_scc1 .LBB18_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB18_3
+; CHECK-NEXT:  .LBB18_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB18_3
 ; CHECK-NEXT:  .LBB18_3:
   %v1c = icmp ult i32 %v1, 12
   %v2c = icmp ugt i32 %v2, 34
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-miscellaneous-uniform-intrinsic.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-miscellaneous-uniform-intrinsic.ll
new file mode 100644
index 0000000000000..34d4c519851d4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-miscellaneous-uniform-intrinsic.ll
@@ -0,0 +1,173 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100  -o - %s | FileCheck %s
+define amdgpu_kernel void @readfirstlane_with_readfirstlane(ptr addrspace(1) %out) {
+; CHECK-LABEL: readfirstlane_with_readfirstlane:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 5
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_endpgm
+  %v1 = call i32 @llvm.amdgcn.readfirstlane(i32 5)
+  %v2 = call i32 @llvm.amdgcn.readfirstlane(i32 %v1)
+  store i32 %v2, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @readfirstlane_with_readlane(ptr addrspace(1) %out) {
+; CHECK-LABEL: readfirstlane_with_readlane:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    v_bfe_u32 v1, v0, 10, 10
+; CHECK-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; CHECK-NEXT:    v_readfirstlane_b32 s2, v1
+; CHECK-NEXT:    v_readlane_b32 s2, v0, s2
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_endpgm
+  %tidx = call i32 @llvm.amdgcn.workitem.id.x()
+  %tidy = call i32 @llvm.amdgcn.workitem.id.y()
+  %v1 = call i32 @llvm.amdgcn.readlane(i32 %tidx, i32 %tidy)
+  %v2 = call i32 @llvm.amdgcn.readfirstlane(i32 %v1)
+  store i32 %v2, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @readlane_with_firstlane(ptr addrspace(1) %out) {
+; CHECK-LABEL: readlane_with_firstlane:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; CHECK-NEXT:    v_readfirstlane_b32 s2, v0
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_endpgm
+  %tidx = call i32 @llvm.amdgcn.workitem.id.x()
+  %v1 = call i32 @llvm.amdgcn.readfirstlane(i32 %tidx)
+  %v2 = call i32 @llvm.amdgcn.readlane(i32 %v1, i32 3)
+  store i32 %v2, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @readlane_readlane(ptr addrspace(1) %out) {
+; CHECK-LABEL: readlane_readlane:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    v_bfe_u32 v1, v0, 10, 10
+; CHECK-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; CHECK-NEXT:    v_readfirstlane_b32 s2, v1
+; CHECK-NEXT:    v_readlane_b32 s2, v0, s2
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_endpgm
+  %tidx = call i32 @llvm.amdgcn.workitem.id.x()
+  %tidy = call i32 @llvm.amdgcn.workitem.id.y()
+  %v1 = call i32 @llvm.amdgcn.readlane(i32 %tidx, i32 %tidy)
+  %v2 = call i32 @llvm.amdgcn.readlane(i32 %v1, i32 2)
+  store i32 %v2, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @permlane64_uniform(ptr addrspace(1) %out, i32 %src) {
+; CHECK-LABEL: permlane64_uniform:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_clause 0x1
+; CHECK-NEXT:    s_load_b32 s2, s[4:5], 0x8
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_endpgm
+  %v = call i32 @llvm.amdgcn.permlane64(i32 %src)
+  store i32 %v, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @permlane64_nonuniform(i32 addrspace(1)* %out) {
+; CHECK-LABEL: permlane64_nonuniform:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; CHECK-NEXT:    v_permlane64_b32 v1, v0
+; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_endpgm
+  %tid = call i32 @llvm.amdgcn.workitem.id.x()
+  %v = call i32 @llvm.amdgcn.permlane64(i32 %tid)
+  %out_ptr = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
+  store i32 %v, i32 addrspace(1)* %out_ptr
+  ret void
+}
+
+define amdgpu_kernel void @permlane64_nonuniform_expression(i32 addrspace(1)* %out) {
+; CHECK-LABEL: permlane64_nonuniform_expression:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
+; CHECK-NEXT:    v_add_nc_u32_e32 v1, 1, v0
+; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
+; CHECK-NEXT:    v_permlane64_b32 v1, v1
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_endpgm
+  %tid = call i32 @llvm.amdgcn.workitem.id.x()
+  %tid2 = add i32 %tid, 1
+  %v = call i32 @llvm.amdgcn.permlane64(i32 %tid2)
+  %out_ptr = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
+  store i32 %v, i32 addrspace(1)* %out_ptr
+  ret void
+}
+
+define protected amdgpu_kernel void @trivial_waterfall_eq_zero(ptr addrspace(1) %out) {
+; CHECK-LABEL: trivial_waterfall_eq_zero:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    s_load_b64 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 5
+; CHECK-NEXT:    s_mov_b32 s2, 0
+; CHECK-NEXT:    s_branch .LBB7_2
+; CHECK-NEXT:  .LBB7_1: ; %Flow
+; CHECK-NEXT:    ; in Loop: Header=BB7_2 Depth=1
+; CHECK-NEXT:    s_and_not1_b32 vcc_lo, exec_lo, s2
+; CHECK-NEXT:    s_mov_b32 s2, -1
+; CHECK-NEXT:    s_cbranch_vccz .LBB7_4
+; CHECK-NEXT:  .LBB7_2: ; %while
+; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    s_and_b32 vcc_lo, exec_lo, s2
+; CHECK-NEXT:    s_mov_b32 s2, -1
+; CHECK-NEXT:    s_cbranch_vccnz .LBB7_1
+; CHECK-NEXT:  ; %bb.3: ; %if
+; CHECK-NEXT:    ; in Loop: Header=BB7_2 Depth=1
+; CHECK-NEXT:    s_mov_b32 s2, 0
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    global_store_b32 v0, v1, s[0:1]
+; CHECK-NEXT:    s_branch .LBB7_1
+; CHECK-NEXT:  .LBB7_4: ; %exit
+; CHECK-NEXT:    s_endpgm
+entry:
+  br label %while
+
+while:
+  %done = phi i1 [ 0, %entry ], [ 1, %if ]
+  %not_done = xor i1 %done, true
+  %ballot = tail call i64 @llvm.amdgcn.ballot.i64(i1 %not_done)
+  %is_done = icmp eq i64 %ballot, 0 ; in this case is_done = !not_done
+  br i1 %is_done, label %exit, label %if
+
+if:
+  store i32 5, ptr addrspace(1) %out
+  br label %while
+
+exit:
+  ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-uniform-waterfall.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-uniform-waterfall.ll
index 33ce278028bba..c962c05d24ad0 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-uniform-waterfall.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-uniform-waterfall.ll
@@ -1,6 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-enable-uniform-intrinsic-combine=0 -O3 -S < %s | FileCheck %s -check-prefix=CURRENT-CHECK
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -passes=amdgpu-uniform-intrinsic-combine -S < %s | FileCheck %s -check-prefix=PASS-CHECK
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-uniform-intrinsic-combine -S < %s | FileCheck %s -check-prefix=PASS-CHECK
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -O3 -S < %s | FileCheck %s -check-prefix=O3-CHECK
 
 define protected amdgpu_kernel void @trivial_waterfall_eq_zero(ptr addrspace(1) %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-intrinsic-combine.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-intrinsic-combine.ll
index a3e42e564376c..a7e828c95d69f 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-intrinsic-combine.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-intrinsic-combine.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-enable-uniform-intrinsic-combine=0 -O3 -S < %s | FileCheck %s -check-prefix=CURRENT-CHECK
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-uniform-intrinsic-combine -S < %s | FileCheck %s -check-prefix=PASS-CHECK
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -passes=amdgpu-uniform-intrinsic-combine -S < %s | FileCheck %s -check-prefix=PASS-CHECK
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -passes=amdgpu-uniform-intrinsic-combine,dce -S < %s | FileCheck %s -check-prefix=DCE-CHECK
 
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-temporal-divergence.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-temporal-divergence.ll
index 2fde3e3759f47..792926154f7a8 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-temporal-divergence.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-uniform-temporal-divergence.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -passes=amdgpu-uniform-intrinsic-combine -S < %s | FileCheck %s -check-prefix=PASS-CHECK
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-uniform-intrinsic-combine -S < %s | FileCheck %s -check-prefix=PASS-CHECK
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -passes=amdgpu-uniform-intrinsic-combine,instcombine,early-cse,simplifycfg -S < %s | FileCheck %s -check-prefix=COMB-CHECK
 
 ; This should not be optimized
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-wwm.ll b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-wwm.ll
index db32135939a5d..b8f084d5f82ad 100644
--- a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-wwm.ll
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-wwm.ll
@@ -4,24 +4,14 @@
 define amdgpu_gs i32 @main() {
 ; CHECK-LABEL: main:
 ; CHECK:       ; %bb.0: ; %bb
-; CHECK-NEXT:    s_bitcmp1_b32 0, 0
 ; CHECK-NEXT:    s_mov_b32 s0, 0
-; CHECK-NEXT:    s_cselect_b32 s1, -1, 0
-; CHECK-NEXT:    s_or_saveexec_b32 s2, -1
-; CHECK-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s1
-; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; CHECK-NEXT:    v_readfirstlane_b32 s1, v0
-; CHECK-NEXT:    s_mov_b32 exec_lo, s2
-; CHECK-NEXT:    s_or_b32 s0, s0, s1
-; CHECK-NEXT:    s_wait_alu 0xfffe
+; CHECK-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
 ; CHECK-NEXT:    s_bitcmp1_b32 s0, 0
 ; CHECK-NEXT:    s_cselect_b32 s0, -1, 0
-; CHECK-NEXT:    s_wait_alu 0xfffe
 ; CHECK-NEXT:    s_xor_b32 s0, s0, -1
-; CHECK-NEXT:    s_wait_alu 0xfffe
-; CHECK-NEXT:    v_cndmask_b32_e64 v1, 0, 1, s0
-; CHECK-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; CHECK-NEXT:    v_readfirstlane_b32 s0, v1
+; CHECK-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s0
+; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
 ; CHECK-NEXT:    s_wait_alu 0xf1ff
 ; CHECK-NEXT:    ; return to shader part epilog
 bb:
diff --git a/llvm/test/CodeGen/AMDGPU/fold-redundant-sgpr-vgpr-rw-across-bb.ll b/llvm/test/CodeGen/AMDGPU/fold-redundant-sgpr-vgpr-rw-across-bb.ll
index 2ad9c0e71c5f3..7adf218f5210e 100644
--- a/llvm/test/CodeGen/AMDGPU/fold-redundant-sgpr-vgpr-rw-across-bb.ll
+++ b/llvm/test/CodeGen/AMDGPU/fold-redundant-sgpr-vgpr-rw-across-bb.ll
@@ -1,6 +1,6 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck --check-prefix=CHECK1 %s
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx942 -amdgpu-si-fold-operands-preheader-threshold=10 < %s | FileCheck --check-prefix=CHECK2 %s
-
+; XFAIL: *
 define protected amdgpu_kernel void @main(ptr addrspace(1) noundef %args.coerce, ptr addrspace(1) noundef %args.coerce2, ptr addrspace(1) noundef %args.coerce4, i32 noundef %args12) {
 ; CHECK1-LABEL: main:
 ; check that non-redundant readfirstlanes are not removed
@@ -76,4 +76,4 @@ for.body.i:                                       ; preds = %for.body.i, %for.bo
 
     add.exit: ; preds = %for.body.i, %entry
     ret void
-}
\ No newline at end of file
+}
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
index 3aa36635a0ab6..704ea37117f32 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
@@ -9,11 +9,11 @@
 ; RUN:   | FileCheck -check-prefix=GCN-O3 %s
 
 
-; GCN-O0: require<MachineModuleAnalysis>,require<profile-summary>,require<collector-metadata>,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp<O0>),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument<post-inline>,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require<uniformity>,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require<reg-usage>,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O0: require<MachineModuleAnalysis>,require<profile-summary>,require<collector-metadata>,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp<O0>),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument<post-inline>,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require<uniformity>,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require<reg-usage>,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
 
-; GCN-O2: require<MachineModuleAnalysis>,require<profile-summary>,require<collector-metadata>,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp<O2>),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-from-gep<>,slsr,early-cse<>,nary-reassociate,early-cse<>,amdgpu-codegenprepare,loop-mssa(licm<allowspeculation>),verify,loop-mssa(canon-freeze,loop-reduce),mergeicmps,expand-memcmp,gc-lowering,lower-constant-intrinsics,unreachableblockelim,consthoist,replace-with-veclib,partially-inline-libcalls,ee-instrument<post-inline>,scalarize-masked-mem-intrin,expand-reductions,early-cse<>),amdgpu-preload-kernel-arguments,function(amdgpu-lower-kernel-arguments,codegenprepare,load-store-vectorizer),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,flatten-cfg,sink,amdgpu-late-codegenprepare,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa)),amdgpu-perf-hint,cgscc(function(require<uniformity>,objc-arc-contract,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,early-tailduplication,opt-phis,stack-coloring,localstackalloc,dead-mi-elimination,early-machinelicm,machine-cse,machine-sink,peephole-opt,dead-mi-elimination,si-fold-operands,gcn-dpp-combine,si-load-store-opt,si-peephole-sdwa,early-machinelicm,machine-cse,si-fold-operands,dead-mi-elimination,si-shrink-instructions))),require<reg-usage>,cgscc(function(machine-function(reg-usage-propagation,amdgpu-prepare-agpr-alloc,detect-dead-lanes,dead-mi-elimination,init-undef,process-imp-defs,unreachable-mbb-elimination,require<live-vars>,si-opt-vgpr-liverange,require<machine-loops>,phi-node-elimination,si-lower-control-flow,two-address-instruction,register-coalescer,rename-independent-subregs,amdgpu-rewrite-partial-reg-uses,machine-scheduler,amdgpu-pre-ra-optimizations,si-wqm,si-optimize-exec-masking-pre-ra,si-form-memory-clauses,amdgpu-pre-ra-long-branch-reg,greedy<sgpr>,virt-reg-rewriter<no-clear-vregs>,stack-slot-coloring,si-lower-sgpr-spills,si-pre-allocate-wwm-regs,greedy<wwm>,si-lower-wwm-copies,virt-reg-rewriter<no-clear-vregs>,amdgpu-reserve-wwm-regs,greedy<vgpr>,amdgpu-nsa-reassign,virt-reg-rewriter,amdgpu-mark-last-scratch-load,machine-cp,machinelicm,si-fix-vgpr-copies,si-optimize-exec-masking,remove-redundant-debug-values,fixup-statepoint-caller-saved,postra-machine-sink,shrink-wrap,prolog-epilog,branch-folder,tailduplication,machine-latecleanup,machine-cp,post-ra-pseudos,si-shrink-instructions,si-post-ra-bundler,postmisched,block-placement,fentry-insert,xray-instrumentation,patchable-function,gcn-create-vopd,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,si-pre-emit-peephole,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,amdgpu-insert-delay-alu,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O2: require<MachineModuleAnalysis>,require<profile-summary>,require<collector-metadata>,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp<O2>),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt,amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-from-gep<>,slsr,early-cse<>,nary-reassociate,early-cse<>,amdgpu-codegenprepare,loop-mssa(licm<allowspeculation>),verify,loop-mssa(canon-freeze,loop-reduce),mergeicmps,expand-memcmp,gc-lowering,lower-constant-intrinsics,unreachableblockelim,consthoist,replace-with-veclib,partially-inline-libcalls,ee-instrument<post-inline>,scalarize-masked-mem-intrin,expand-reductions,early-cse<>),amdgpu-preload-kernel-arguments,function(amdgpu-lower-kernel-arguments,codegenprepare,load-store-vectorizer),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,flatten-cfg,sink,amdgpu-late-codegenprepare,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa)),amdgpu-perf-hint,cgscc(function(require<uniformity>,objc-arc-contract,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,early-tailduplication,opt-phis,stack-coloring,localstackalloc,dead-mi-elimination,early-machinelicm,machine-cse,machine-sink,peephole-opt,dead-mi-elimination,si-fold-operands,gcn-dpp-combine,si-load-store-opt,si-peephole-sdwa,early-machinelicm,machine-cse,si-fold-operands,dead-mi-elimination,si-shrink-instructions))),require<reg-usage>,cgscc(function(machine-function(reg-usage-propagation,amdgpu-prepare-agpr-alloc,detect-dead-lanes,dead-mi-elimination,init-undef,process-imp-defs,unreachable-mbb-elimination,require<live-vars>,si-opt-vgpr-liverange,require<machine-loops>,phi-node-elimination,si-lower-control-flow,two-address-instruction,register-coalescer,rename-independent-subregs,amdgpu-rewrite-partial-reg-uses,machine-scheduler,amdgpu-pre-ra-optimizations,si-wqm,si-optimize-exec-masking-pre-ra,si-form-memory-clauses,amdgpu-pre-ra-long-branch-reg,greedy<sgpr>,virt-reg-rewriter<no-clear-vregs>,stack-slot-coloring,si-lower-sgpr-spills,si-pre-allocate-wwm-regs,greedy<wwm>,si-lower-wwm-copies,virt-reg-rewriter<no-clear-vregs>,amdgpu-reserve-wwm-regs,greedy<vgpr>,amdgpu-nsa-reassign,virt-reg-rewriter,amdgpu-mark-last-scratch-load,machine-cp,machinelicm,si-fix-vgpr-copies,si-optimize-exec-masking,remove-redundant-debug-values,fixup-statepoint-caller-saved,postra-machine-sink,shrink-wrap,prolog-epilog,branch-folder,tailduplication,machine-latecleanup,machine-cp,post-ra-pseudos,si-shrink-instructions,si-post-ra-bundler,postmisched,block-placement,fentry-insert,xray-instrumentation,patchable-function,gcn-create-vopd,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,si-pre-emit-peephole,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,amdgpu-insert-delay-alu,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
 
-; GCN-O3: require<MachineModuleAnalysis>,require<profile-summary>,require<collector-metadata>,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp<O3>),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-from-gep<>,slsr,gvn<>,nary-reassociate,early-cse<>,amdgpu-codegenprepare,loop-mssa(licm<allowspeculation>),verify,loop-mssa(canon-freeze,loop-reduce),mergeicmps,expand-memcmp,gc-lowering,lower-constant-intrinsics,unreachableblockelim,consthoist,replace-with-veclib,partially-inline-libcalls,ee-instrument<post-inline>,scalarize-masked-mem-intrin,expand-reductions,gvn<>),amdgpu-preload-kernel-arguments,function(amdgpu-lower-kernel-arguments,codegenprepare,load-store-vectorizer),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,flatten-cfg,sink,amdgpu-late-codegenprepare,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa)),amdgpu-perf-hint,cgscc(function(require<uniformity>,objc-arc-contract,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,early-tailduplication,opt-phis,stack-coloring,localstackalloc,dead-mi-elimination,early-machinelicm,machine-cse,machine-sink,peephole-opt,dead-mi-elimination,si-fold-operands,gcn-dpp-combine,si-load-store-opt,si-peephole-sdwa,early-machinelicm,machine-cse,si-fold-operands,dead-mi-elimination,si-shrink-instructions))),require<reg-usage>,cgscc(function(machine-function(reg-usage-propagation,amdgpu-prepare-agpr-alloc,detect-dead-lanes,dead-mi-elimination,init-undef,process-imp-defs,unreachable-mbb-elimination,require<live-vars>,si-opt-vgpr-liverange,require<machine-loops>,phi-node-elimination,si-lower-control-flow,two-address-instruction,register-coalescer,rename-independent-subregs,amdgpu-rewrite-partial-reg-uses,machine-scheduler,amdgpu-pre-ra-optimizations,si-wqm,si-optimize-exec-masking-pre-ra,si-form-memory-clauses,amdgpu-pre-ra-long-branch-reg,greedy<sgpr>,virt-reg-rewriter<no-clear-vregs>,stack-slot-coloring,si-lower-sgpr-spills,si-pre-allocate-wwm-regs,greedy<wwm>,si-lower-wwm-copies,virt-reg-rewriter<no-clear-vregs>,amdgpu-reserve-wwm-regs,greedy<vgpr>,amdgpu-nsa-reassign,virt-reg-rewriter,amdgpu-mark-last-scratch-load,machine-cp,machinelicm,si-fix-vgpr-copies,si-optimize-exec-masking,remove-redundant-debug-values,fixup-statepoint-caller-saved,postra-machine-sink,shrink-wrap,prolog-epilog,branch-folder,tailduplication,machine-latecleanup,machine-cp,post-ra-pseudos,si-shrink-instructions,si-post-ra-bundler,postmisched,block-placement,fentry-insert,xray-instrumentation,patchable-function,gcn-create-vopd,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,si-pre-emit-peephole,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,amdgpu-insert-delay-alu,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O3: require<MachineModuleAnalysis>,require<profile-summary>,require<collector-metadata>,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp<O3>),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-image-intrinsic-opt,amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(amdgpu-atomic-optimizer,atomic-expand,amdgpu-promote-alloca,separate-const-offset-from-gep<>,slsr,gvn<>,nary-reassociate,early-cse<>,amdgpu-codegenprepare,loop-mssa(licm<allowspeculation>),verify,loop-mssa(canon-freeze,loop-reduce),mergeicmps,expand-memcmp,gc-lowering,lower-constant-intrinsics,unreachableblockelim,consthoist,replace-with-veclib,partially-inline-libcalls,ee-instrument<post-inline>,scalarize-masked-mem-intrin,expand-reductions,gvn<>),amdgpu-preload-kernel-arguments,function(amdgpu-lower-kernel-arguments,codegenprepare,load-store-vectorizer),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,flatten-cfg,sink,amdgpu-late-codegenprepare,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa)),amdgpu-perf-hint,cgscc(function(require<uniformity>,objc-arc-contract,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,early-tailduplication,opt-phis,stack-coloring,localstackalloc,dead-mi-elimination,early-machinelicm,machine-cse,machine-sink,peephole-opt,dead-mi-elimination,si-fold-operands,gcn-dpp-combine,si-load-store-opt,si-peephole-sdwa,early-machinelicm,machine-cse,si-fold-operands,dead-mi-elimination,si-shrink-instructions))),require<reg-usage>,cgscc(function(machine-function(reg-usage-propagation,amdgpu-prepare-agpr-alloc,detect-dead-lanes,dead-mi-elimination,init-undef,process-imp-defs,unreachable-mbb-elimination,require<live-vars>,si-opt-vgpr-liverange,require<machine-loops>,phi-node-elimination,si-lower-control-flow,two-address-instruction,register-coalescer,rename-independent-subregs,amdgpu-rewrite-partial-reg-uses,machine-scheduler,amdgpu-pre-ra-optimizations,si-wqm,si-optimize-exec-masking-pre-ra,si-form-memory-clauses,amdgpu-pre-ra-long-branch-reg,greedy<sgpr>,virt-reg-rewriter<no-clear-vregs>,stack-slot-coloring,si-lower-sgpr-spills,si-pre-allocate-wwm-regs,greedy<wwm>,si-lower-wwm-copies,virt-reg-rewriter<no-clear-vregs>,amdgpu-reserve-wwm-regs,greedy<vgpr>,amdgpu-nsa-reassign,virt-reg-rewriter,amdgpu-mark-last-scratch-load,machine-cp,machinelicm,si-fix-vgpr-copies,si-optimize-exec-masking,remove-redundant-debug-values,fixup-statepoint-caller-saved,postra-machine-sink,shrink-wrap,prolog-epilog,branch-folder,tailduplication,machine-latecleanup,machine-cp,post-ra-pseudos,si-shrink-instructions,si-post-ra-bundler,postmisched,block-placement,fentry-insert,xray-instrumentation,patchable-function,gcn-create-vopd,si-memory-legalizer,si-insert-waitcnts,si-late-branch-lowering,si-pre-emit-peephole,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,amdgpu-insert-delay-alu,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
 
 define void @empty() {
   ret void
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index 6e5212580ba2e..ee6caab6f25cd 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -31,6 +31,11 @@
 ; GCN-O0-NEXT:    AMDGPU Remove Incompatible Functions
 ; GCN-O0-NEXT:    AMDGPU Printf lowering
 ; GCN-O0-NEXT:    Lower ctors and dtors for AMDGPU
+; GCN-O0-NEXT:    FunctionPass Manager
+; GCN-O0-NEXT:      Dominator Tree Construction
+; GCN-O0-NEXT:      Cycle Info Analysis
+; GCN-O0-NEXT:      Uniformity Analysis
+; GCN-O0-NEXT:      AMDGPU Uniform Intrinsic Combine
 ; GCN-O0-NEXT:    Expand variadic functions
 ; GCN-O0-NEXT:    AMDGPU Inline All Functions
 ; GCN-O0-NEXT:    Inliner for always_inline functions
@@ -179,6 +184,11 @@
 ; GCN-O1-NEXT:    AMDGPU Remove Incompatible Functions
 ; GCN-O1-NEXT:    AMDGPU Printf lowering
 ; GCN-O1-NEXT:    Lower ctors and dtors for AMDGPU
+; GCN-O1-NEXT:    FunctionPass Manager
+; GCN-O1-NEXT:      Dominator Tree Construction
+; GCN-O1-NEXT:      Cycle Info Analysis
+; GCN-O1-NEXT:      Uniformity Analysis
+; GCN-O1-NEXT:      AMDGPU Uniform Intrinsic Combine
 ; GCN-O1-NEXT:    Expand variadic functions
 ; GCN-O1-NEXT:    AMDGPU Inline All Functions
 ; GCN-O1-NEXT:    Inliner for always_inline functions
@@ -466,6 +476,11 @@
 ; GCN-O1-OPTS-NEXT:    AMDGPU Remove Incompatible Functions
 ; GCN-O1-OPTS-NEXT:    AMDGPU Printf lowering
 ; GCN-O1-OPTS-NEXT:    Lower ctors and dtors for AMDGPU
+; GCN-O1-OPTS-NEXT:    FunctionPass Manager
+; GCN-O1-OPTS-NEXT:      Dominator Tree Construction
+; GCN-O1-OPTS-NEXT:      Cycle Info Analysis
+; GCN-O1-OPTS-NEXT:      Uniformity Analysis
+; GCN-O1-OPTS-NEXT:      AMDGPU Uniform Intrinsic Combine
 ; GCN-O1-OPTS-NEXT:    Expand variadic functions
 ; GCN-O1-OPTS-NEXT:    AMDGPU Inline All Functions
 ; GCN-O1-OPTS-NEXT:    Inliner for always_inline functions
@@ -783,6 +798,10 @@
 ; GCN-O2-NEXT:    Lower ctors and dtors for AMDGPU
 ; GCN-O2-NEXT:    FunctionPass Manager
 ; GCN-O2-NEXT:      AMDGPU Image Intrinsic Optimizer
+; GCN-O2-NEXT:      Dominator Tree Construction
+; GCN-O2-NEXT:      Cycle Info Analysis
+; GCN-O2-NEXT:      Uniformity Analysis
+; GCN-O2-NEXT:      AMDGPU Uniform Intrinsic Combine
 ; GCN-O2-NEXT:    Expand variadic functions
 ; GCN-O2-NEXT:    AMDGPU Inline All Functions
 ; GCN-O2-NEXT:    Inliner for always_inline functions
@@ -1104,6 +1123,10 @@
 ; GCN-O3-NEXT:    Lower ctors and dtors for AMDGPU
 ; GCN-O3-NEXT:    FunctionPass Manager
 ; GCN-O3-NEXT:      AMDGPU Image Intrinsic Optimizer
+; GCN-O3-NEXT:      Dominator Tree Construction
+; GCN-O3-NEXT:      Cycle Info Analysis
+; GCN-O3-NEXT:      Uniformity Analysis
+; GCN-O3-NEXT:      AMDGPU Uniform Intrinsic Combine
 ; GCN-O3-NEXT:    Expand variadic functions
 ; GCN-O3-NEXT:    AMDGPU Inline All Functions
 ; GCN-O3-NEXT:    Inliner for always_inline functions
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
index e00e1f13b2b77..aa591d28eb346 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
@@ -110,9 +110,8 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, s0, 1
-; CHECK-NEXT:    v_cmp_ne_u32_e64 vcc_lo, s0, 0
-; CHECK-NEXT:    s_cbranch_vccz .LBB8_2
+; CHECK-NEXT:    s_bitcmp0_b32 s0, 0
+; CHECK-NEXT:    s_cbranch_scc1 .LBB8_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB8_3
@@ -156,15 +155,16 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, s0, 1
-; CHECK-NEXT:    v_cmp_ne_u32_e64 vcc_lo, s0, 0
-; CHECK-NEXT:    s_cbranch_vccz .LBB10_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB10_3
-; CHECK-NEXT:  .LBB10_2: ; %true
+; CHECK-NEXT:    s_bitcmp1_b32 s0, 0
+; CHECK-NEXT:    s_cselect_b32 s0, -1, 0
+; CHECK-NEXT:    s_and_b32 vcc_lo, exec_lo, s0
+; CHECK-NEXT:    s_cbranch_vccnz .LBB10_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB10_3
+; CHECK-NEXT:  .LBB10_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB10_3
 ; CHECK-NEXT:  .LBB10_3:
   %c = trunc i32 %v to i1
   %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %c)
@@ -201,8 +201,8 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    v_cmp_lt_u32_e64 vcc_lo, s0, 12
-; CHECK-NEXT:    s_cbranch_vccz .LBB12_2
+; CHECK-NEXT:    s_cmp_gt_u32 s0, 11
+; CHECK-NEXT:    s_cbranch_scc1 .LBB12_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB12_3
@@ -245,14 +245,14 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    v_cmp_lt_u32_e64 vcc_lo, s0, 12
-; CHECK-NEXT:    s_cbranch_vccz .LBB14_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB14_3
-; CHECK-NEXT:  .LBB14_2: ; %true
+; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
+; CHECK-NEXT:    s_cbranch_scc1 .LBB14_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB14_3
+; CHECK-NEXT:  .LBB14_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB14_3
 ; CHECK-NEXT:  .LBB14_3:
   %c = icmp ult i32 %v, 12
   %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %c)
@@ -293,13 +293,13 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_and(i32 inreg %v1, i32 inreg %v2) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_and:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
+; CHECK-NEXT:    s_cmp_gt_u32 s0, 11
 ; CHECK-NEXT:    s_cselect_b32 s0, -1, 0
-; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
+; CHECK-NEXT:    s_cmp_lt_u32 s1, 35
 ; CHECK-NEXT:    s_cselect_b32 s1, -1, 0
-; CHECK-NEXT:    s_and_b32 s0, s0, s1
-; CHECK-NEXT:    s_and_b32 s0, s0, exec_lo
-; CHECK-NEXT:    s_cbranch_scc0 .LBB16_2
+; CHECK-NEXT:    s_or_b32 s0, s0, s1
+; CHECK-NEXT:    s_and_b32 vcc_lo, exec_lo, s0
+; CHECK-NEXT:    s_cbranch_vccnz .LBB16_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB16_3
@@ -353,14 +353,14 @@ define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_and(i32 inreg %v1, i32 inreg
 ; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
 ; CHECK-NEXT:    s_cselect_b32 s1, -1, 0
 ; CHECK-NEXT:    s_and_b32 s0, s0, s1
-; CHECK-NEXT:    s_and_b32 s0, s0, exec_lo
-; CHECK-NEXT:    s_cbranch_scc0 .LBB18_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB18_3
-; CHECK-NEXT:  .LBB18_2: ; %true
+; CHECK-NEXT:    s_and_b32 vcc_lo, exec_lo, s0
+; CHECK-NEXT:    s_cbranch_vccnz .LBB18_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB18_3
+; CHECK-NEXT:  .LBB18_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB18_3
 ; CHECK-NEXT:  .LBB18_3:
   %v1c = icmp ult i32 %v1, 12
   %v2c = icmp ugt i32 %v2, 34
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll
index b4adf7f641550..30c2c260a3274 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll
@@ -113,9 +113,8 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, s0, 1
-; CHECK-NEXT:    v_cmp_ne_u32_e64 vcc, s0, 0
-; CHECK-NEXT:    s_cbranch_vccz .LBB8_2
+; CHECK-NEXT:    s_bitcmp0_b32 s0, 0
+; CHECK-NEXT:    s_cbranch_scc1 .LBB8_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB8_3
@@ -159,15 +158,16 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_non_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_non_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_and_b32 s0, s0, 1
-; CHECK-NEXT:    v_cmp_ne_u32_e64 vcc, s0, 0
-; CHECK-NEXT:    s_cbranch_vccz .LBB10_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB10_3
-; CHECK-NEXT:  .LBB10_2: ; %true
+; CHECK-NEXT:    s_bitcmp1_b32 s0, 0
+; CHECK-NEXT:    s_cselect_b64 s[0:1], -1, 0
+; CHECK-NEXT:    s_and_b64 vcc, exec, s[0:1]
+; CHECK-NEXT:    s_cbranch_vccnz .LBB10_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB10_3
+; CHECK-NEXT:  .LBB10_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB10_3
 ; CHECK-NEXT:  .LBB10_3:
   %c = trunc i32 %v to i1
   %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %c)
@@ -204,8 +204,8 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    v_cmp_lt_u32_e64 vcc, s0, 12
-; CHECK-NEXT:    s_cbranch_vccz .LBB12_2
+; CHECK-NEXT:    s_cmp_gt_u32 s0, 11
+; CHECK-NEXT:    s_cbranch_scc1 .LBB12_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB12_3
@@ -248,14 +248,14 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_compare(i32 inreg %v) {
 ; CHECK-LABEL: branch_uniform_ballot_eq_zero_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    v_cmp_lt_u32_e64 vcc, s0, 12
-; CHECK-NEXT:    s_cbranch_vccz .LBB14_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB14_3
-; CHECK-NEXT:  .LBB14_2: ; %true
+; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
+; CHECK-NEXT:    s_cbranch_scc1 .LBB14_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB14_3
+; CHECK-NEXT:  .LBB14_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB14_3
 ; CHECK-NEXT:  .LBB14_3:
   %c = icmp ult i32 %v, 12
   %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %c)
@@ -296,13 +296,13 @@ false:
 define amdgpu_cs i32 @branch_uniform_ballot_ne_zero_and(i32 inreg %v1, i32 inreg %v2) {
 ; CHECK-LABEL: branch_uniform_ballot_ne_zero_and:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_cmp_lt_u32 s0, 12
+; CHECK-NEXT:    s_cmp_gt_u32 s0, 11
 ; CHECK-NEXT:    s_cselect_b64 s[2:3], -1, 0
-; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
+; CHECK-NEXT:    s_cmp_lt_u32 s1, 35
 ; CHECK-NEXT:    s_cselect_b64 s[0:1], -1, 0
-; CHECK-NEXT:    s_and_b64 s[0:1], s[2:3], s[0:1]
-; CHECK-NEXT:    s_and_b64 s[0:1], s[0:1], exec
-; CHECK-NEXT:    s_cbranch_scc0 .LBB16_2
+; CHECK-NEXT:    s_or_b64 s[0:1], s[2:3], s[0:1]
+; CHECK-NEXT:    s_and_b64 vcc, exec, s[0:1]
+; CHECK-NEXT:    s_cbranch_vccnz .LBB16_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB16_3
@@ -356,14 +356,14 @@ define amdgpu_cs i32 @branch_uniform_ballot_eq_zero_and(i32 inreg %v1, i32 inreg
 ; CHECK-NEXT:    s_cmp_gt_u32 s1, 34
 ; CHECK-NEXT:    s_cselect_b64 s[0:1], -1, 0
 ; CHECK-NEXT:    s_and_b64 s[0:1], s[2:3], s[0:1]
-; CHECK-NEXT:    s_and_b64 s[0:1], s[0:1], exec
-; CHECK-NEXT:    s_cbranch_scc0 .LBB18_2
-; CHECK-NEXT:  ; %bb.1: ; %false
-; CHECK-NEXT:    s_mov_b32 s0, 33
-; CHECK-NEXT:    s_branch .LBB18_3
-; CHECK-NEXT:  .LBB18_2: ; %true
+; CHECK-NEXT:    s_and_b64 vcc, exec, s[0:1]
+; CHECK-NEXT:    s_cbranch_vccnz .LBB18_2
+; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB18_3
+; CHECK-NEXT:  .LBB18_2: ; %false
+; CHECK-NEXT:    s_mov_b32 s0, 33
+; CHECK-NEXT:    s_branch .LBB18_3
 ; CHECK-NEXT:  .LBB18_3:
   %v1c = icmp ult i32 %v1, 12
   %v2c = icmp ugt i32 %v2, 34
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ll
index 6dd2258420998..39191d242574f 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ll
@@ -23,10 +23,8 @@ define amdgpu_kernel void @test_s_i32(ptr addrspace(1) %out, i32 %src0) {
 ; GFX11-SDAG-NEXT:    s_load_b32 s2, s[4:5], 0x2c
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v0, s2
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v0
-; GFX11-SDAG-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; GFX11-SDAG-NEXT:    global_store_b32 v0, v1, s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
 ;
 ; GFX11-GISEL-LABEL: test_s_i32:
@@ -36,8 +34,6 @@ define amdgpu_kernel void @test_s_i32(ptr addrspace(1) %out, i32 %src0) {
 ; GFX11-GISEL-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v0, s2
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v0, v0
 ; GFX11-GISEL-NEXT:    global_store_b32 v1, v0, s[0:1]
 ; GFX11-GISEL-NEXT:    s_endpgm
   %v = call i32 @llvm.amdgcn.permlane64.i32(i32 %src0)
@@ -50,12 +46,9 @@ define amdgpu_kernel void @test_s_i64(ptr addrspace(1) %out, i64 %src0) {
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v3, 0 :: v_dual_mov_b32 v0, s3
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v2, s2
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v2
-; GFX11-SDAG-NEXT:    global_store_b64 v3, v[0:1], s[0:1]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v1, s3
+; GFX11-SDAG-NEXT:    v_mov_b32_e32 v0, s2
+; GFX11-SDAG-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
 ;
 ; GFX11-GISEL-LABEL: test_s_i64:
@@ -64,9 +57,6 @@ define amdgpu_kernel void @test_s_i64(ptr addrspace(1) %out, i64 %src0) {
 ; GFX11-GISEL-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v0, v0
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v1, v1
 ; GFX11-GISEL-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-GISEL-NEXT:    s_endpgm
   %v = call i64 @llvm.amdgcn.permlane64.i64(i64 %src0)
@@ -79,12 +69,9 @@ define amdgpu_kernel void @test_s_f64(ptr addrspace(1) %out, double %src0) {
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v3, 0 :: v_dual_mov_b32 v0, s3
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v2, s2
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v2
-; GFX11-SDAG-NEXT:    global_store_b64 v3, v[0:1], s[0:1]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v1, s3
+; GFX11-SDAG-NEXT:    v_mov_b32_e32 v0, s2
+; GFX11-SDAG-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
 ;
 ; GFX11-GISEL-LABEL: test_s_f64:
@@ -93,9 +80,6 @@ define amdgpu_kernel void @test_s_f64(ptr addrspace(1) %out, double %src0) {
 ; GFX11-GISEL-NEXT:    v_mov_b32_e32 v2, 0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v0, v0
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v1, v1
 ; GFX11-GISEL-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-GISEL-NEXT:    s_endpgm
   %v = call double @llvm.amdgcn.permlane64.f64(double %src0)
@@ -116,19 +100,15 @@ define amdgpu_kernel void @test_i_i32(ptr addrspace(1) %out) {
 ; GFX11-SDAG-LABEL: test_i_i32:
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0x63 :: v_dual_mov_b32 v1, 0
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v0
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0x63
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GFX11-SDAG-NEXT:    global_store_b32 v0, v1, s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
 ;
 ; GFX11-GISEL-LABEL: test_i_i32:
 ; GFX11-GISEL:       ; %bb.0:
 ; GFX11-GISEL-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v0, 0x63 :: v_dual_mov_b32 v1, 0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v0, v0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    global_store_b32 v1, v0, s[0:1]
 ; GFX11-GISEL-NEXT:    s_endpgm
@@ -141,19 +121,15 @@ define amdgpu_kernel void @test_i_f32(ptr addrspace(1) %out) {
 ; GFX11-SDAG-LABEL: test_i_f32:
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0x449a5000 :: v_dual_mov_b32 v1, 0
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v0
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0x449a5000
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GFX11-SDAG-NEXT:    global_store_b32 v0, v1, s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
 ;
 ; GFX11-GISEL-LABEL: test_i_f32:
 ; GFX11-GISEL:       ; %bb.0:
 ; GFX11-GISEL-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_dual_mov_b32 v0, 0x449a5000 :: v_dual_mov_b32 v1, 0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v0, v0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    global_store_b32 v1, v0, s[0:1]
 ; GFX11-GISEL-NEXT:    s_endpgm
@@ -166,23 +142,16 @@ define amdgpu_kernel void @test_i_i64(ptr addrspace(1) %out) {
 ; GFX11-SDAG-LABEL: test_i_i64:
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v2, 0
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v0, 0x63
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v2
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v0
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v0, 0x63
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
+; GFX11-SDAG-NEXT:    global_store_b64 v1, v[0:1], s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
 ;
 ; GFX11-GISEL-LABEL: test_i_i64:
 ; GFX11-GISEL:       ; %bb.0:
 ; GFX11-GISEL-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-GISEL-NEXT:    v_mov_b32_e32 v0, 0x63
-; GFX11-GISEL-NEXT:    v_mov_b32_e32 v2, 0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v0, v0
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v1, v2
+; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v2, 0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-GISEL-NEXT:    s_endpgm
@@ -195,22 +164,16 @@ define amdgpu_kernel void @test_i_f64(ptr addrspace(1) %out) {
 ; GFX11-SDAG-LABEL: test_i_f64:
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v0, 0x40934a00
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v2, 0
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v2
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0x40934a00
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
+; GFX11-SDAG-NEXT:    global_store_b64 v0, v[0:1], s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
 ;
 ; GFX11-GISEL-LABEL: test_i_f64:
 ; GFX11-GISEL:       ; %bb.0:
 ; GFX11-GISEL-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX11-GISEL-NEXT:    v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v1, 0x40934a00
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v0, v2
-; GFX11-GISEL-NEXT:    v_permlane64_b32 v1, v1
+; GFX11-GISEL-NEXT:    v_mov_b32_e32 v0, 0
+; GFX11-GISEL-NEXT:    v_dual_mov_b32 v1, 0x40934a00 :: v_dual_mov_b32 v2, 0
 ; GFX11-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX11-GISEL-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-GISEL-NEXT:    s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ptr.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ptr.ll
index b0149f7de5e85..672b658659824 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ptr.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ptr.ll
@@ -6,12 +6,9 @@ define amdgpu_kernel void @test_p0(ptr addrspace(1) %out, ptr %src0) {
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v3, 0 :: v_dual_mov_b32 v0, s3
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v2, s2
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v2
-; GFX11-SDAG-NEXT:    global_store_b64 v3, v[0:1], s[0:1]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v1, s3
+; GFX11-SDAG-NEXT:    v_mov_b32_e32 v0, s2
+; GFX11-SDAG-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
   %v = call ptr @llvm.amdgcn.permlane64.p0(ptr %src0)
   store ptr %v, ptr addrspace(1) %out
@@ -22,21 +19,14 @@ define amdgpu_kernel void @test_v3p0(ptr addrspace(1) %out, <3 x ptr> %src0) {
 ; GFX11-SDAG-LABEL: test_v3p0:
 ; GFX11-SDAG:       ; %bb.0:
 ; GFX11-SDAG-NEXT:    s_clause 0x2
-; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x44
 ; GFX11-SDAG-NEXT:    s_load_b64 s[6:7], s[4:5], 0x54
+; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x44
 ; GFX11-SDAG-NEXT:    s_load_b64 s[4:5], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v6, 0 :: v_dual_mov_b32 v1, s2
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v4, s1 :: v_dual_mov_b32 v5, s7
-; GFX11-SDAG-NEXT:    v_mov_b32_e32 v8, s6
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, s3 :: v_dual_mov_b32 v7, s0
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v2, v1
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v4
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v5, v5
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v4, v8
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v3, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v7
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v6, 0 :: v_dual_mov_b32 v5, s7
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v4, s6 :: v_dual_mov_b32 v1, s1
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v3, s3
+; GFX11-SDAG-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX11-SDAG-NEXT:    s_clause 0x1
 ; GFX11-SDAG-NEXT:    global_store_b64 v6, v[4:5], s[4:5] offset:16
 ; GFX11-SDAG-NEXT:    global_store_b128 v6, v[0:3], s[4:5]
@@ -53,10 +43,8 @@ define amdgpu_kernel void @test_p3(ptr addrspace(1) %out, ptr addrspace(3) %src0
 ; GFX11-SDAG-NEXT:    s_load_b32 s2, s[4:5], 0x2c
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v0, s2
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v0
-; GFX11-SDAG-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; GFX11-SDAG-NEXT:    global_store_b32 v0, v1, s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
   %v = call ptr addrspace(3) @llvm.amdgcn.permlane64.v3p0(ptr addrspace(3) %src0)
   store ptr addrspace(3) %v, ptr addrspace(1) %out
@@ -70,14 +58,9 @@ define amdgpu_kernel void @test_v3p3(ptr addrspace(1) %out, <3 x ptr addrspace(3
 ; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x34
 ; GFX11-SDAG-NEXT:    s_load_b64 s[4:5], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v3, s0
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s1
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v2, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v1
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_4)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v3
-; GFX11-SDAG-NEXT:    global_store_b96 v4, v[0:2], s[4:5]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v3, 0 :: v_dual_mov_b32 v0, s0
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, s1 :: v_dual_mov_b32 v2, s2
+; GFX11-SDAG-NEXT:    global_store_b96 v3, v[0:2], s[4:5]
 ; GFX11-SDAG-NEXT:    s_endpgm
   %v = call <3 x ptr addrspace(3)> @llvm.amdgcn.permlane64.v3p3(<3 x ptr addrspace(3)> %src0)
   store <3 x ptr addrspace(3)> %v, ptr addrspace(1) %out
@@ -91,10 +74,8 @@ define amdgpu_kernel void @test_p5(ptr addrspace(1) %out, ptr addrspace(5) %src0
 ; GFX11-SDAG-NEXT:    s_load_b32 s2, s[4:5], 0x2c
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v0, s2
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v0
-; GFX11-SDAG-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; GFX11-SDAG-NEXT:    global_store_b32 v0, v1, s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
   %v = call ptr addrspace(5) @llvm.amdgcn.permlane64.p5(ptr addrspace(5) %src0)
   store ptr addrspace(5) %v, ptr addrspace(1) %out
@@ -108,14 +89,9 @@ define amdgpu_kernel void @test_v3p5(ptr addrspace(1) %out, <3 x ptr addrspace(5
 ; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x34
 ; GFX11-SDAG-NEXT:    s_load_b64 s[4:5], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v3, s0
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s1
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v2, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v1
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_4)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v3
-; GFX11-SDAG-NEXT:    global_store_b96 v4, v[0:2], s[4:5]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v3, 0 :: v_dual_mov_b32 v0, s0
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, s1 :: v_dual_mov_b32 v2, s2
+; GFX11-SDAG-NEXT:    global_store_b96 v3, v[0:2], s[4:5]
 ; GFX11-SDAG-NEXT:    s_endpgm
   %v = call <3 x ptr addrspace(5)> @llvm.amdgcn.permlane64.v3p5(<3 x ptr addrspace(5)> %src0)
   store <3 x ptr addrspace(5)> %v, ptr addrspace(1) %out
@@ -129,10 +105,8 @@ define amdgpu_kernel void @test_p6(ptr addrspace(1) %out, ptr addrspace(6) %src0
 ; GFX11-SDAG-NEXT:    s_load_b32 s2, s[4:5], 0x2c
 ; GFX11-SDAG-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v0, s2
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v0
-; GFX11-SDAG-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; GFX11-SDAG-NEXT:    global_store_b32 v0, v1, s[0:1]
 ; GFX11-SDAG-NEXT:    s_endpgm
   %v = call ptr addrspace(6) @llvm.amdgcn.permlane64.p6(ptr addrspace(6) %src0)
   store ptr addrspace(6) %v, ptr addrspace(1) %out
@@ -146,14 +120,9 @@ define amdgpu_kernel void @test_v3p6(ptr addrspace(1) %out, <3 x ptr addrspace(6
 ; GFX11-SDAG-NEXT:    s_load_b128 s[0:3], s[4:5], 0x34
 ; GFX11-SDAG-NEXT:    s_load_b64 s[4:5], s[4:5], 0x24
 ; GFX11-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v3, s0
-; GFX11-SDAG-NEXT:    v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s1
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v2, v0
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v1, v1
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_4)
-; GFX11-SDAG-NEXT:    v_permlane64_b32 v0, v3
-; GFX11-SDAG-NEXT:    global_store_b96 v4, v[0:2], s[4:5]
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v3, 0 :: v_dual_mov_b32 v0, s0
+; GFX11-SDAG-NEXT:    v_dual_mov_b32 v1, s1 :: v_dual_mov_b32 v2, s2
+; GFX11-SDAG-NEXT:    global_store_b96 v3, v[0:2], s[4:5]
 ; GFX11-SDAG-NEXT:    s_endpgm
   %v = call <3 x ptr addrspace(6)> @llvm.amdgcn.permlane64.v3p6(<3 x ptr addrspace(6)> %src0)
   store <3 x ptr addrspace(6)> %v, ptr addrspace(1) %out
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll
index 2f4d5ee3cbce5..c0b5195f699b9 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll
@@ -396,8 +396,7 @@ define amdgpu_kernel void @test_readfirstlane_imm_f64(ptr addrspace(1) %out) {
 ;
 ; CHECK-GISEL-LABEL: test_readfirstlane_imm_f64:
 ; CHECK-GISEL:       ; %bb.0:
-; CHECK-GISEL-NEXT:    s_mov_b32 s0, 0
-; CHECK-GISEL-NEXT:    s_mov_b32 s1, 0x40400000
+; CHECK-GISEL-NEXT:    s_mov_b64 s[0:1], 0x4040000000000000
 ; CHECK-GISEL-NEXT:    ;;#ASMSTART
 ; CHECK-GISEL-NEXT:    ; use s[0:1]
 ; CHECK-GISEL-NEXT:    ;;#ASMEND
@@ -456,14 +455,13 @@ define amdgpu_kernel void @test_readfirstlane_imm_fold_i64(ptr addrspace(1) %out
 ; CHECK-GISEL-LABEL: test_readfirstlane_imm_fold_i64:
 ; CHECK-GISEL:       ; %bb.0:
 ; CHECK-GISEL-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
-; CHECK-GISEL-NEXT:    s_mov_b64 s[2:3], 32
 ; CHECK-GISEL-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, s2
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, 32
 ; CHECK-GISEL-NEXT:    s_mov_b32 flat_scratch_lo, s13
+; CHECK-GISEL-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
 ; CHECK-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-GISEL-NEXT:    v_mov_b32_e32 v3, s1
-; CHECK-GISEL-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, s3
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, 0
 ; CHECK-GISEL-NEXT:    v_mov_b32_e32 v2, s0
 ; CHECK-GISEL-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
 ; CHECK-GISEL-NEXT:    s_endpgm
@@ -490,15 +488,13 @@ define amdgpu_kernel void @test_readfirstlane_imm_fold_f64(ptr addrspace(1) %out
 ; CHECK-GISEL-LABEL: test_readfirstlane_imm_fold_f64:
 ; CHECK-GISEL:       ; %bb.0:
 ; CHECK-GISEL-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
-; CHECK-GISEL-NEXT:    s_mov_b32 s2, 0
 ; CHECK-GISEL-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-GISEL-NEXT:    s_mov_b32 s3, 0x40400000
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, s2
-; CHECK-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v3, s1
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, 0
 ; CHECK-GISEL-NEXT:    s_mov_b32 flat_scratch_lo, s13
 ; CHECK-GISEL-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, s3
+; CHECK-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v3, s1
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, 0x40400000
 ; CHECK-GISEL-NEXT:    v_mov_b32_e32 v2, s0
 ; CHECK-GISEL-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
 ; CHECK-GISEL-NEXT:    s_endpgm
@@ -588,17 +584,17 @@ define amdgpu_kernel void @test_readfirstlane_copy_from_sgpr_i64(ptr addrspace(1
 ; CHECK-SDAG:       ; %bb.0:
 ; CHECK-SDAG-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
 ; CHECK-SDAG-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
-; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
 ; CHECK-SDAG-NEXT:    ;;#ASMSTART
 ; CHECK-SDAG-NEXT:    s_mov_b64 s[2:3], 0
 ; CHECK-SDAG-NEXT:    ;;#ASMEND
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s2
+; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
 ; CHECK-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s1
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s2
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s3
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s0
-; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s0
+; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s1
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s3
+; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CHECK-SDAG-NEXT:    s_endpgm
 ;
 ; CHECK-GISEL-LABEL: test_readfirstlane_copy_from_sgpr_i64:
@@ -628,17 +624,17 @@ define amdgpu_kernel void @test_readfirstlane_copy_from_sgpr_f64(ptr addrspace(1
 ; CHECK-SDAG:       ; %bb.0:
 ; CHECK-SDAG-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
 ; CHECK-SDAG-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
-; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
 ; CHECK-SDAG-NEXT:    ;;#ASMSTART
 ; CHECK-SDAG-NEXT:    s_mov_b64 s[2:3], 0
 ; CHECK-SDAG-NEXT:    ;;#ASMEND
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s2
+; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
 ; CHECK-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s1
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s2
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s3
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s0
-; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s0
+; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s1
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s3
+; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CHECK-SDAG-NEXT:    s_endpgm
 ;
 ; CHECK-GISEL-LABEL: test_readfirstlane_copy_from_sgpr_f64:
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
index 7ff5eb46def38..0795f4050b622 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
@@ -9,7 +9,7 @@ declare double @llvm.amdgcn.readlane.f64(double, i32) #0
 define amdgpu_kernel void @test_readlane_sreg_sreg_i32(i32 %src0, i32 %src1) #1 {
 ; CHECK-SDAG-LABEL: test_readlane_sreg_sreg_i32:
 ; CHECK-SDAG:       ; %bb.0:
-; CHECK-SDAG-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
+; CHECK-SDAG-NEXT:    s_load_dword s0, s[8:9], 0x0
 ; CHECK-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-SDAG-NEXT:    ;;#ASMSTART
 ; CHECK-SDAG-NEXT:    ; use s0
@@ -18,7 +18,7 @@ define amdgpu_kernel void @test_readlane_sreg_sreg_i32(i32 %src0, i32 %src1) #1
 ;
 ; CHECK-GISEL-LABEL: test_readlane_sreg_sreg_i32:
 ; CHECK-GISEL:       ; %bb.0:
-; CHECK-GISEL-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
+; CHECK-GISEL-NEXT:    s_load_dword s0, s[8:9], 0x0
 ; CHECK-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-GISEL-NEXT:    ;;#ASMSTART
 ; CHECK-GISEL-NEXT:    ; use s0
@@ -224,14 +224,13 @@ define amdgpu_kernel void @test_readlane_imm_sreg_i64(ptr addrspace(1) %out, i32
 ; CHECK-GISEL-LABEL: test_readlane_imm_sreg_i64:
 ; CHECK-GISEL:       ; %bb.0:
 ; CHECK-GISEL-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
-; CHECK-GISEL-NEXT:    s_mov_b64 s[2:3], 32
 ; CHECK-GISEL-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, s2
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, 32
 ; CHECK-GISEL-NEXT:    s_mov_b32 flat_scratch_lo, s13
+; CHECK-GISEL-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
 ; CHECK-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-GISEL-NEXT:    v_mov_b32_e32 v3, s1
-; CHECK-GISEL-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, s3
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, 0
 ; CHECK-GISEL-NEXT:    v_mov_b32_e32 v2, s0
 ; CHECK-GISEL-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
 ; CHECK-GISEL-NEXT:    s_endpgm
@@ -258,15 +257,13 @@ define amdgpu_kernel void @test_readlane_imm_sreg_f64(ptr addrspace(1) %out, i32
 ; CHECK-GISEL-LABEL: test_readlane_imm_sreg_f64:
 ; CHECK-GISEL:       ; %bb.0:
 ; CHECK-GISEL-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
-; CHECK-GISEL-NEXT:    s_mov_b32 s2, 0
 ; CHECK-GISEL-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-GISEL-NEXT:    s_mov_b32 s3, 0x40400000
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, s2
-; CHECK-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v3, s1
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v0, 0
 ; CHECK-GISEL-NEXT:    s_mov_b32 flat_scratch_lo, s13
 ; CHECK-GISEL-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
-; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, s3
+; CHECK-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v3, s1
+; CHECK-GISEL-NEXT:    v_mov_b32_e32 v1, 0x40400000
 ; CHECK-GISEL-NEXT:    v_mov_b32_e32 v2, s0
 ; CHECK-GISEL-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
 ; CHECK-GISEL-NEXT:    s_endpgm
@@ -660,17 +657,17 @@ define amdgpu_kernel void @test_readlane_copy_from_sgpr_i64(ptr addrspace(1) %ou
 ; CHECK-SDAG:       ; %bb.0:
 ; CHECK-SDAG-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
 ; CHECK-SDAG-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
-; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
 ; CHECK-SDAG-NEXT:    ;;#ASMSTART
 ; CHECK-SDAG-NEXT:    s_mov_b64 s[2:3], 0
 ; CHECK-SDAG-NEXT:    ;;#ASMEND
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s2
+; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
 ; CHECK-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s1
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s2
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s3
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s0
-; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s0
+; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s1
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s3
+; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CHECK-SDAG-NEXT:    s_endpgm
 ;
 ; CHECK-GISEL-LABEL: test_readlane_copy_from_sgpr_i64:
@@ -700,17 +697,17 @@ define amdgpu_kernel void @test_readlane_copy_from_sgpr_f64(ptr addrspace(1) %ou
 ; CHECK-SDAG:       ; %bb.0:
 ; CHECK-SDAG-NEXT:    s_load_dwordx2 s[0:1], s[8:9], 0x0
 ; CHECK-SDAG-NEXT:    s_add_i32 s12, s12, s17
-; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
-; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
 ; CHECK-SDAG-NEXT:    ;;#ASMSTART
 ; CHECK-SDAG-NEXT:    s_mov_b64 s[2:3], 0
 ; CHECK-SDAG-NEXT:    ;;#ASMEND
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s2
+; CHECK-SDAG-NEXT:    s_mov_b32 flat_scratch_lo, s13
 ; CHECK-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s1
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s2
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s3
-; CHECK-SDAG-NEXT:    v_mov_b32_e32 v2, s0
-; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[2:3], v[0:1]
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v0, s0
+; CHECK-SDAG-NEXT:    s_lshr_b32 flat_scratch_hi, s12, 8
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v1, s1
+; CHECK-SDAG-NEXT:    v_mov_b32_e32 v3, s3
+; CHECK-SDAG-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CHECK-SDAG-NEXT:    s_endpgm
 ;
 ; CHECK-GISEL-LABEL: test_readlane_copy_from_sgpr_f64:
diff --git a/llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll b/llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll
index 586579fcaeb93..ef96944abef0e 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll
+++ b/llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll
@@ -20,38 +20,33 @@ define void @test() {
 ; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
 ; CHECK-NEXT:  .LBB0_3: ; %bb.3
 ; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
-; CHECK-NEXT:    ; implicit-def: $sgpr4
-; CHECK-NEXT:    v_mov_b32_e32 v0, s4
-; CHECK-NEXT:    v_readfirstlane_b32 s6, v0
 ; CHECK-NEXT:    s_mov_b64 s[4:5], -1
-; CHECK-NEXT:    s_mov_b32 s7, 0
-; CHECK-NEXT:    s_cmp_eq_u32 s6, s7
 ; CHECK-NEXT:    ; implicit-def: $vgpr1 : SGPR spill to VGPR lane
 ; CHECK-NEXT:    v_writelane_b32 v1, s4, 0
 ; CHECK-NEXT:    v_writelane_b32 v1, s5, 1
-; CHECK-NEXT:    s_mov_b64 s[10:11], exec
-; CHECK-NEXT:    s_mov_b64 exec, -1
+; CHECK-NEXT:    s_or_saveexec_b64 s[8:9], -1
+; CHECK-NEXT:    s_nop 0
 ; CHECK-NEXT:    v_accvgpr_write_b32 a0, v1 ; Reload Reuse
-; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    s_mov_b64 exec, s[8:9]
 ; CHECK-NEXT:    s_cbranch_scc1 .LBB0_5
 ; CHECK-NEXT:  ; %bb.4: ; %bb.4
 ; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
-; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    s_or_saveexec_b64 s[8:9], -1
 ; CHECK-NEXT:    v_accvgpr_read_b32 v1, a0 ; Reload Reuse
-; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    s_mov_b64 exec, s[8:9]
 ; CHECK-NEXT:    s_mov_b64 s[4:5], 0
 ; CHECK-NEXT:    v_writelane_b32 v1, s4, 0
 ; CHECK-NEXT:    v_writelane_b32 v1, s5, 1
-; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    s_or_saveexec_b64 s[8:9], -1
 ; CHECK-NEXT:    s_nop 0
 ; CHECK-NEXT:    v_accvgpr_write_b32 a0, v1 ; Reload Reuse
-; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    s_mov_b64 exec, s[8:9]
 ; CHECK-NEXT:  .LBB0_5: ; %Flow
 ; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
-; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    s_or_saveexec_b64 s[8:9], -1
 ; CHECK-NEXT:    s_nop 0
 ; CHECK-NEXT:    v_accvgpr_read_b32 v1, a0 ; Reload Reuse
-; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    s_mov_b64 exec, s[8:9]
 ; CHECK-NEXT:    v_readlane_b32 s4, v1, 0
 ; CHECK-NEXT:    v_readlane_b32 s5, v1, 1
 ; CHECK-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[4:5]
diff --git a/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll b/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll
index 5aafb0f576fb4..364598f7cf6c0 100644
--- a/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll
+++ b/llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll
@@ -31,8 +31,8 @@ define amdgpu_gs void @_amdgpu_gs_main(i32 inreg %primShaderTableAddrLow, <31 x
   ; CHECK-NEXT:   [[COPY13:%[0-9]+]]:sgpr_32 = COPY $sgpr10
   ; CHECK-NEXT:   [[COPY14:%[0-9]+]]:sgpr_32 = COPY $sgpr8
   ; CHECK-NEXT:   undef [[S_LOAD_DWORDX2_IMM:%[0-9]+]].sub0_sub1:sgpr_128 = S_LOAD_DWORDX2_IMM [[COPY]], 232, 0 :: (invariant load (s64) from %ir.39, addrspace 4)
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM undef %125:sgpr_128, 0, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   KILL undef %125:sgpr_128
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM undef %117:sgpr_128, 0, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   KILL undef %117:sgpr_128
   ; CHECK-NEXT:   [[S_LSHL_B32_:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY5]], 4, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_LSHL_B32_1:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY4]], 4, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_LSHL_B32_2:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY3]], 4, implicit-def dead $scc
@@ -44,87 +44,85 @@ define amdgpu_gs void @_amdgpu_gs_main(i32 inreg %primShaderTableAddrLow, <31 x
   ; CHECK-NEXT:   [[S_SUB_I32_1:%[0-9]+]]:sreg_32 = S_SUB_I32 [[S_BUFFER_LOAD_DWORD_IMM]], 30, implicit-def dead $scc
   ; CHECK-NEXT:   undef [[S_ADD_U32_:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_2]], implicit-def $scc
   ; CHECK-NEXT:   [[S_ADD_U32_:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_]], 16, 0 :: (invariant load (s128) from %ir.81, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_]], 16, 0 :: (invariant load (s128) from %ir.71, addrspace 4)
   ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM1:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM undef %74:sreg_64, 0, 0 :: (invariant load (s128) from `ptr addrspace(4) poison`, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM2:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_]], 64, 0 :: (invariant load (s128) from %ir.88, addrspace 4)
   ; CHECK-NEXT:   KILL undef %74:sreg_64
   ; CHECK-NEXT:   KILL [[S_ADD_U32_]].sub0, [[S_ADD_U32_]].sub1
   ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM [[S_LOAD_DWORDX4_IMM]], 0, 0 :: (dereferenceable invariant load (s32))
   ; CHECK-NEXT:   [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
   ; CHECK-NEXT:   undef [[S_MOV_B32_:%[0-9]+]].sub1:sgpr_128 = S_MOV_B32 0
-  ; CHECK-NEXT:   [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET undef %118:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], undef %89:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+  ; CHECK-NEXT:   [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET undef %112:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], undef %87:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN1:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   KILL undef %89:sgpr_128
-  ; CHECK-NEXT:   KILL undef %118:sgpr_128
+  ; CHECK-NEXT:   KILL undef %112:sgpr_128
+  ; CHECK-NEXT:   KILL undef %87:sgpr_128
   ; CHECK-NEXT:   [[S_SUB_I32_2:%[0-9]+]]:sreg_32 = S_SUB_I32 [[S_BUFFER_LOAD_DWORD_IMM1]], 31, implicit-def dead $scc
   ; CHECK-NEXT:   undef [[S_ADD_U32_1:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_]], implicit-def $scc
   ; CHECK-NEXT:   [[S_ADD_U32_1:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
   ; CHECK-NEXT:   undef [[S_ADD_U32_2:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_1]], implicit-def $scc
   ; CHECK-NEXT:   [[S_ADD_U32_2:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_3:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM2:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_1]], 64, 0 :: (invariant load (s128) from %ir.87, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM3:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_2]], 64, 0 :: (invariant load (s128) from %ir.93, addrspace 4)
-  ; CHECK-NEXT:   KILL [[S_ADD_U32_1]].sub0, [[S_ADD_U32_1]].sub1
+  ; CHECK-NEXT:   [[S_ASHR_I32_3:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 undef %148:sreg_32, 31, implicit-def dead $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_3:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], undef %148:sreg_32, implicit-def $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM3:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_1]], 64, 0 :: (invariant load (s128) from %ir.77, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM4:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_2]], 64, 0 :: (invariant load (s128) from %ir.83, addrspace 4)
   ; CHECK-NEXT:   KILL [[S_ADD_U32_2]].sub0, [[S_ADD_U32_2]].sub1
-  ; CHECK-NEXT:   [[S_ADD_U32_3:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_ASHR_I32_3:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 undef %169:sreg_32, 31, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_4:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY6]], undef %169:sreg_32, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_4:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_5:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_5:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_6:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_6:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_7:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, undef %169:sreg_32, implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_7:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_8:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_8:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_9:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY8]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_9:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %48:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_10:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY9]], [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_10:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %45:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_11:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY9]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_11:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %45:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   KILL [[S_ADD_U32_1]].sub0, [[S_ADD_U32_1]].sub1
+  ; CHECK-NEXT:   [[S_ADD_U32_3:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %54:sreg_32, [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_4:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_4:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_5:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_1]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_5:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_6:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, undef %148:sreg_32, implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_6:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_3]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_7:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY7]].sub0, [[S_LSHL_B32_2]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_7:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %51:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_8:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY8]], [[S_LSHL_B32_]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_8:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %48:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_9:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY9]], [[S_LSHL_B32_1]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_9:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %45:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_10:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY9]], [[S_LSHL_B32_2]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_10:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %45:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
   ; CHECK-NEXT:   [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_LSHL_B32_]], 16, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_1:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_LSHL_B32_2]], 16, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM [[S_MOV_B32_]], [[S_ADD_I32_]], 0, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM [[S_MOV_B32_]], undef %302:sreg_32, 0, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM [[S_MOV_B32_]], undef %279:sreg_32, 0, 0 :: (dereferenceable invariant load (s32))
   ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM2:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM [[S_MOV_B32_]], [[S_ADD_I32_1]], 0, 0 :: (dereferenceable invariant load (s32))
   ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM2:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM [[S_MOV_B32_]], 16, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM3:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %357:sgpr_128, undef %358:sreg_32, 0, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM3:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM undef %368:sgpr_128, 16, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM4:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_3]], 64, 0 :: (invariant load (s128) from %ir.99, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM5:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_4]], 64, 0 :: (invariant load (s128) from %ir.107, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM6:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_5]], 0, 0 :: (invariant load (s128) from %ir.112, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM7:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_6]], 0, 0 :: (invariant load (s128) from %ir.117, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM8:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_7]], 0, 0 :: (invariant load (s128) from %ir.124, addrspace 4)
-  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN2:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM2]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM4:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %352:sgpr_128, [[S_ADD_I32_]], 0, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM5:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %363:sgpr_128, [[S_ADD_I32_1]], 0, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN3:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM3]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM3:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %334:sgpr_128, undef %335:sreg_32, 0, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM3:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM undef %345:sgpr_128, 16, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM5:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_3]], 64, 0 :: (invariant load (s128) from %ir.95, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM6:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_4]], 0, 0 :: (invariant load (s128) from %ir.100, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM7:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_5]], 0, 0 :: (invariant load (s128) from %ir.105, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM8:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_6]], 0, 0 :: (invariant load (s128) from %ir.112, addrspace 4)
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM4:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %329:sgpr_128, [[S_ADD_I32_]], 0, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM5:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %340:sgpr_128, [[S_ADD_I32_1]], 0, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN2:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM3]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN3:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM4]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_ADD_I32_2:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM]], -98, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_3:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM1]], -114, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_4:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM2]], -130, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_5:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_IMM2]], -178, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_12:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY10]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_12:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %42:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_13:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY11]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_13:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %39:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_14:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY11]], [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_14:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %39:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_15:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY11]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_15:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %39:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_11:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY10]], [[S_LSHL_B32_]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_11:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %42:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_12:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY11]], [[S_LSHL_B32_]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_12:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %39:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_13:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY11]], [[S_LSHL_B32_1]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_13:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %39:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_14:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY11]], [[S_LSHL_B32_2]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_14:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %39:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
   ; CHECK-NEXT:   [[S_LSHL_B32_3:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY12]], 4, implicit-def dead $scc
-  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN4:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM4]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+  ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN4:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM2]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_ADD_I32_6:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_LSHL_B32_3]], 16, implicit-def dead $scc
-  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM6:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %384:sgpr_128, [[S_ADD_I32_6]], 0, 0 :: (dereferenceable invariant load (s32))
+  ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_SGPR_IMM6:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR_IMM undef %361:sgpr_128, [[S_ADD_I32_6]], 0, 0 :: (dereferenceable invariant load (s32))
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN5:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM5]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM9:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_5]], 224, 0 :: (invariant load (s128) from %ir.129, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM10:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY7]], 224, 0 :: (invariant load (s128) from %ir.145, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM11:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_5]], 576, 0 :: (invariant load (s128) from %ir.150, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM9:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_4]], 224, 0 :: (invariant load (s128) from %ir.117, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM10:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY7]], 224, 0 :: (invariant load (s128) from %ir.133, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM11:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_4]], 576, 0 :: (invariant load (s128) from %ir.138, addrspace 4)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN6:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM6]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM12:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_6]], 224, 0 :: (invariant load (s128) from %ir.134, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM13:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_7]], 576, 0 :: (invariant load (s128) from %ir.162, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM14:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_8]], 224, 0 :: (invariant load (s128) from %ir.140, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM12:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_5]], 224, 0 :: (invariant load (s128) from %ir.122, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM13:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_6]], 576, 0 :: (invariant load (s128) from %ir.150, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM14:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_7]], 224, 0 :: (invariant load (s128) from %ir.128, addrspace 4)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN7:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM7]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN8:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM8]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_ADD_I32_7:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM4]], -217, implicit-def dead $scc
@@ -135,49 +133,49 @@ define amdgpu_gs void @_amdgpu_gs_main(i32 inreg %primShaderTableAddrLow, <31 x
   ; CHECK-NEXT:   [[S_ADD_I32_12:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM3]], -329, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_13:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM3]], -345, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_14:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM6]], -441, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_16:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY2]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_16:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %36:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_15:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY2]], [[S_LSHL_B32_2]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_15:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %36:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
   ; CHECK-NEXT:   [[S_LSHL_B32_4:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY13]], 4, implicit-def dead $scc
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN9:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM9]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_ASHR_I32_4:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_4]], 31, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_17:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY2]], [[S_LSHL_B32_4]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_17:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %36:sreg_32, [[S_ASHR_I32_4]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   undef [[S_ADD_U32_16:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY2]], [[S_LSHL_B32_4]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_16:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %36:sreg_32, [[S_ASHR_I32_4]], implicit-def dead $scc, implicit $scc
   ; CHECK-NEXT:   [[S_LSHL_B32_5:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY5]], 3, implicit-def dead $scc
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN10:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM12]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_ASHR_I32_5:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_5]], 31, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_18:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_5]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_18:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_5]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[S_ADD_U32_18]], 168, 0 :: (invariant load (s32) from %ir.273, align 8, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM15:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_8]], 576, 0 :: (invariant load (s128) from %ir.157, addrspace 4)
+  ; CHECK-NEXT:   undef [[S_ADD_U32_17:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_5]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_17:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_5]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[S_ADD_U32_17]], 168, 0 :: (invariant load (s32) from %ir.260, align 8, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM15:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_7]], 576, 0 :: (invariant load (s128) from %ir.145, addrspace 4)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN11:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM14]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN12:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM10]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN13:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM11]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM:%[0-9]+]].sub3:sgpr_128 = S_MOV_B32 553734060
   ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM:%[0-9]+]].sub2:sgpr_128 = S_MOV_B32 -1
   ; CHECK-NEXT:   [[COPY15:%[0-9]+]]:sgpr_128 = COPY [[S_LOAD_DWORDX2_IMM]]
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM16:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_9]], 0, 0 :: (invariant load (s128) from %ir.170, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM16:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_8]], 0, 0 :: (invariant load (s128) from %ir.158, addrspace 4)
   ; CHECK-NEXT:   [[COPY15:%[0-9]+]].sub1:sgpr_128 = COPY [[S_MOV_B32_]].sub1
   ; CHECK-NEXT:   [[COPY15:%[0-9]+]].sub0:sgpr_128 = COPY [[S_LOAD_DWORD_IMM]]
   ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM4:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM [[COPY15]], 0, 0 :: (dereferenceable invariant load (s32))
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN14:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM15]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN15:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM13]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM17:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_10]], 0, 0 :: (invariant load (s128) from %ir.178, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM18:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_11]], 0, 0 :: (invariant load (s128) from %ir.183, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM17:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_9]], 0, 0 :: (invariant load (s128) from %ir.166, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM18:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_10]], 0, 0 :: (invariant load (s128) from %ir.171, addrspace 4)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN16:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM16]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_LSHL_B32_6:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY4]], 3, implicit-def dead $scc
   ; CHECK-NEXT:   [[BUFFER_LOAD_DWORD_OFFSET1:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET [[S_LOAD_DWORDX4_IMM1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_ASHR_I32_6:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_6]], 31, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_15:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_IMM4]], -467, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_19:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_6]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_19:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_6]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM1:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[S_ADD_U32_19]], 168, 0 :: (invariant load (s64) from %ir.282, addrspace 4)
+  ; CHECK-NEXT:   undef [[S_ADD_U32_18:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_6]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_18:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_6]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM1:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[S_ADD_U32_18]], 168, 0 :: (invariant load (s64) from %ir.269, addrspace 4)
   ; CHECK-NEXT:   [[BUFFER_LOAD_DWORD_OFFSET2:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET [[S_LOAD_DWORDX4_IMM17]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_DWORD_OFFSET3:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET [[S_LOAD_DWORDX4_IMM18]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM19:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_12]], 0, 0 :: (invariant load (s128) from %ir.205, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM20:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_13]], 0, 0 :: (invariant load (s128) from %ir.211, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM19:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_11]], 0, 0 :: (invariant load (s128) from %ir.193, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM20:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_12]], 0, 0 :: (invariant load (s128) from %ir.199, addrspace 4)
   ; CHECK-NEXT:   [[COPY16:%[0-9]+]]:sgpr_128 = COPY [[S_LOAD_DWORDX2_IMM]]
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM21:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_14]], 0, 0 :: (invariant load (s128) from %ir.216, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM22:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_15]], 0, 0 :: (invariant load (s128) from %ir.221, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM21:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_13]], 0, 0 :: (invariant load (s128) from %ir.204, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM22:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_14]], 0, 0 :: (invariant load (s128) from %ir.209, addrspace 4)
   ; CHECK-NEXT:   [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_LOAD_DWORDX2_IMM1]].sub1, 65535, implicit-def dead $scc
   ; CHECK-NEXT:   [[COPY16:%[0-9]+]].sub0:sgpr_128 = COPY [[S_LOAD_DWORDX2_IMM1]].sub0
   ; CHECK-NEXT:   [[COPY16:%[0-9]+]].sub1:sgpr_128 = COPY [[S_AND_B32_]]
@@ -189,30 +187,30 @@ define amdgpu_gs void @_amdgpu_gs_main(i32 inreg %primShaderTableAddrLow, <31 x
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN20:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM22]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[S_ASHR_I32_7:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_7]], 31, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_16:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_IMM5]], -468, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_20:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_7]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_20:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_7]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM2:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[S_ADD_U32_20]], 168, 0 :: (invariant load (s64) from %ir.293, addrspace 4)
+  ; CHECK-NEXT:   undef [[S_ADD_U32_19:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_7]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_19:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_7]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM2:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[S_ADD_U32_19]], 168, 0 :: (invariant load (s64) from %ir.280, addrspace 4)
   ; CHECK-NEXT:   [[COPY17:%[0-9]+]]:sgpr_128 = COPY [[S_LOAD_DWORDX2_IMM]]
   ; CHECK-NEXT:   [[S_AND_B32_1:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_LOAD_DWORDX2_IMM2]].sub1, 65535, implicit-def dead $scc
   ; CHECK-NEXT:   [[COPY17:%[0-9]+]].sub0:sgpr_128 = COPY [[S_LOAD_DWORDX2_IMM2]].sub0
   ; CHECK-NEXT:   [[COPY17:%[0-9]+]].sub1:sgpr_128 = COPY [[S_AND_B32_1]]
   ; CHECK-NEXT:   [[S_BUFFER_LOAD_DWORD_IMM6:%[0-9]+]]:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM [[COPY17]], 0, 0 :: (dereferenceable invariant load (s32))
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM23:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_16]], 160, 0 :: (invariant load (s128) from %ir.256, addrspace 4)
-  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef %470:sreg_64, 0, 0 :: (invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
-  ; CHECK-NEXT:   KILL [[S_ADD_U32_16]].sub0, [[S_ADD_U32_16]].sub1
-  ; CHECK-NEXT:   KILL undef %470:sreg_64
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM23:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_15]], 160, 0 :: (invariant load (s128) from %ir.244, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef %443:sreg_64, 0, 0 :: (invariant load (s32) from `ptr addrspace(4) poison`, addrspace 4)
+  ; CHECK-NEXT:   KILL [[S_ADD_U32_15]].sub0, [[S_ADD_U32_15]].sub1
   ; CHECK-NEXT:   KILL [[COPY17]].sub0_sub1_sub2, [[COPY17]].sub3
+  ; CHECK-NEXT:   KILL undef %443:sreg_64
   ; CHECK-NEXT:   [[S_LSHL_B32_8:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY14]], 3, implicit-def dead $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM24:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_17]], 160, 0 :: (invariant load (s128) from %ir.265, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM24:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_16]], 160, 0 :: (invariant load (s128) from %ir.252, addrspace 4)
   ; CHECK-NEXT:   [[S_ASHR_I32_8:%[0-9]+]]:sreg_32_xm0 = S_ASHR_I32 [[S_LSHL_B32_8]], 31, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_17:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_IMM6]], -469, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_21:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_8]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_21:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_8]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM2:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[S_ADD_U32_21]], 168, 0 :: (invariant load (s32) from %ir.305, align 8, addrspace 4)
+  ; CHECK-NEXT:   undef [[S_ADD_U32_20:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY]].sub0, [[S_LSHL_B32_8]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_20:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %57:sreg_32, [[S_ASHR_I32_8]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM2:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[S_ADD_U32_20]], 168, 0 :: (invariant load (s32) from %ir.291, align 8, addrspace 4)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN21:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM23]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN22:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM24]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   KILL [[S_LOAD_DWORDX4_IMM24]]
   ; CHECK-NEXT:   KILL [[S_LOAD_DWORDX4_IMM23]]
+  ; CHECK-NEXT:   KILL [[S_LOAD_DWORDX4_IMM24]]
   ; CHECK-NEXT:   [[S_AND_B32_2:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_LOAD_DWORD_IMM1]], 65535, implicit-def dead $scc
   ; CHECK-NEXT:   [[COPY18:%[0-9]+]]:sgpr_128 = COPY [[S_LOAD_DWORDX2_IMM]]
   ; CHECK-NEXT:   [[COPY18:%[0-9]+]].sub1:sgpr_128 = COPY [[S_AND_B32_2]]
@@ -224,22 +222,22 @@ define amdgpu_gs void @_amdgpu_gs_main(i32 inreg %primShaderTableAddrLow, <31 x
   ; CHECK-NEXT:   [[S_ADD_I32_21:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM3]], -507, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_22:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_SGPR_IMM3]], -539, implicit-def dead $scc
   ; CHECK-NEXT:   [[S_ADD_I32_23:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_IMM7]], -473, implicit-def dead $scc
-  ; CHECK-NEXT:   undef [[S_ADD_U32_22:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY1]], [[S_LSHL_B32_]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_22:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %33:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM25:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_22]], 96, 0 :: (invariant load (s128) from %ir.323, addrspace 4)
-  ; CHECK-NEXT:   undef [[S_ADD_U32_23:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY1]], [[S_LSHL_B32_1]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_23:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %33:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM26:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_23]], 96, 0 :: (invariant load (s128) from %ir.329, addrspace 4)
-  ; CHECK-NEXT:   undef [[S_ADD_U32_24:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY1]], [[S_LSHL_B32_2]], implicit-def $scc
-  ; CHECK-NEXT:   [[S_ADD_U32_24:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %33:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM27:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_24]], 96, 0 :: (invariant load (s128) from %ir.335, addrspace 4)
+  ; CHECK-NEXT:   undef [[S_ADD_U32_21:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY1]], [[S_LSHL_B32_]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_21:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %33:sreg_32, [[S_ASHR_I32_]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM25:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_21]], 96, 0 :: (invariant load (s128) from %ir.309, addrspace 4)
+  ; CHECK-NEXT:   undef [[S_ADD_U32_22:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY1]], [[S_LSHL_B32_1]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_22:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %33:sreg_32, [[S_ASHR_I32_1]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM26:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_22]], 96, 0 :: (invariant load (s128) from %ir.315, addrspace 4)
+  ; CHECK-NEXT:   undef [[S_ADD_U32_23:%[0-9]+]].sub0:sreg_64 = S_ADD_U32 [[COPY1]], [[S_LSHL_B32_2]], implicit-def $scc
+  ; CHECK-NEXT:   [[S_ADD_U32_23:%[0-9]+]].sub1:sreg_64 = S_ADDC_U32 undef %33:sreg_32, [[S_ASHR_I32_2]], implicit-def dead $scc, implicit $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX4_IMM27:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[S_ADD_U32_23]], 96, 0 :: (invariant load (s128) from %ir.321, addrspace 4)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN23:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM25]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN24:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM26]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
   ; CHECK-NEXT:   [[BUFFER_LOAD_FORMAT_X_IDXEN25:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_FORMAT_X_IDXEN [[V_MOV_B32_e32_]], [[S_LOAD_DWORDX4_IMM27]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
-  ; CHECK-NEXT:   KILL [[S_LOAD_DWORDX4_IMM27]]
   ; CHECK-NEXT:   KILL [[S_LOAD_DWORDX4_IMM25]]
-  ; CHECK-NEXT:   KILL [[V_MOV_B32_e32_]]
   ; CHECK-NEXT:   KILL [[S_LOAD_DWORDX4_IMM26]]
+  ; CHECK-NEXT:   KILL [[V_MOV_B32_e32_]]
+  ; CHECK-NEXT:   KILL [[S_LOAD_DWORDX4_IMM27]]
   ; CHECK-NEXT:   [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 -2, [[BUFFER_LOAD_FORMAT_X_IDXEN]], 0, implicit $exec
   ; CHECK-NEXT:   [[V_ADD_U32_e64_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 -1, [[BUFFER_LOAD_FORMAT_X_IDXEN1]], 0, implicit $exec
   ; CHECK-NEXT:   [[V_ADD_U32_e64_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 -3, [[BUFFER_LOAD_FORMAT_X_IDXEN]], 0, implicit $exec
@@ -351,13 +349,13 @@ define amdgpu_gs void @_amdgpu_gs_main(i32 inreg %primShaderTableAddrLow, <31 x
   ; CHECK-NEXT:   [[V_OR_B32_e64_64:%[0-9]+]]:vgpr_32 = V_OR_B32_e64 [[V_OR_B32_e64_63]], [[V_ADD_U32_e64_28]], implicit $exec
   ; CHECK-NEXT:   [[V_ADD_U32_e64_30:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 -593, [[BUFFER_LOAD_FORMAT_X_IDXEN]], 0, implicit $exec
   ; CHECK-NEXT:   [[V_OR_B32_e64_65:%[0-9]+]]:vgpr_32 = V_OR_B32_e64 [[V_OR_B32_e64_64]], [[V_ADD_U32_e64_29]], implicit $exec
-  ; CHECK-NEXT:   [[S_LOAD_DWORDX8_IMM:%[0-9]+]]:sgpr_256 = S_LOAD_DWORDX8_IMM undef %543:sreg_64, 0, 0 :: (invariant load (s256) from `ptr addrspace(4) poison`, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX8_IMM:%[0-9]+]]:sgpr_256 = S_LOAD_DWORDX8_IMM undef %516:sreg_64, 0, 0 :: (invariant load (s256) from `ptr addrspace(4) poison`, addrspace 4)
   ; CHECK-NEXT:   [[V_OR_B32_e64_66:%[0-9]+]]:vgpr_32 = V_OR_B32_e64 [[V_OR_B32_e64_65]], [[V_ADD_U32_e64_30]], implicit $exec
   ; CHECK-NEXT:   [[S_ADD_I32_24:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_BUFFER_LOAD_DWORD_IMM8]], -594, implicit-def dead $scc
   ; CHECK-NEXT:   [[V_OR_B32_e64_67:%[0-9]+]]:vgpr_32 = V_OR_B32_e64 [[S_ADD_I32_24]], [[V_OR_B32_e64_66]], implicit $exec
   ; CHECK-NEXT:   [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U32_e64 0, [[V_OR_B32_e64_67]], implicit $exec
   ; CHECK-NEXT:   undef [[V_CNDMASK_B32_e64_:%[0-9]+]].sub3:vreg_128 = V_CNDMASK_B32_e64 0, 0, 0, 1, [[V_CMP_EQ_U32_e64_]], implicit $exec
-  ; CHECK-NEXT:   IMAGE_STORE_V4_V2_nsa_gfx10 [[V_CNDMASK_B32_e64_]], undef %557:vgpr_32, undef %559:vgpr_32, [[S_LOAD_DWORDX8_IMM]], 15, 1, -1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store (s128), addrspace 8)
+  ; CHECK-NEXT:   IMAGE_STORE_V4_V2_nsa_gfx10 [[V_CNDMASK_B32_e64_]], undef %530:vgpr_32, undef %532:vgpr_32, [[S_LOAD_DWORDX8_IMM]], 15, 1, -1, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store (s128), addrspace 8)
   ; CHECK-NEXT:   S_ENDPGM 0
 .expVert:
   %0 = extractelement <31 x i32> %userData, i64 2
diff --git a/llvm/test/CodeGen/AMDGPU/wqm.ll b/llvm/test/CodeGen/AMDGPU/wqm.ll
index ad8dcd3888e9f..21f0c008366a9 100644
--- a/llvm/test/CodeGen/AMDGPU/wqm.ll
+++ b/llvm/test/CodeGen/AMDGPU/wqm.ll
@@ -3477,13 +3477,10 @@ define amdgpu_gs void @wqm_init_exec_wwm() {
 ; GFX9-W64-NEXT:    s_mov_b64 exec, 0
 ; GFX9-W64-NEXT:    s_mov_b32 s1, 0
 ; GFX9-W64-NEXT:    s_mov_b32 s0, s1
-; GFX9-W64-NEXT:    s_cmp_lg_u64 exec, 0
-; GFX9-W64-NEXT:    s_cselect_b64 s[2:3], -1, 0
-; GFX9-W64-NEXT:    s_cmp_lg_u64 s[0:1], 0
+; GFX9-W64-NEXT:    s_cmp_eq_u64 s[0:1], 0
 ; GFX9-W64-NEXT:    s_cselect_b64 s[0:1], -1, 0
-; GFX9-W64-NEXT:    s_xor_b64 s[0:1], s[2:3], s[0:1]
-; GFX9-W64-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s[0:1]
-; GFX9-W64-NEXT:    v_mov_b32_e32 v1, 0
+; GFX9-W64-NEXT:    v_mov_b32_e32 v0, 0
+; GFX9-W64-NEXT:    v_cndmask_b32_e64 v1, 0, 1.0, s[0:1]
 ; GFX9-W64-NEXT:    exp mrt0 off, off, off, off
 ; GFX9-W64-NEXT:    s_endpgm
 ;
@@ -3491,14 +3488,11 @@ define amdgpu_gs void @wqm_init_exec_wwm() {
 ; GFX10-W32:       ; %bb.0:
 ; GFX10-W32-NEXT:    s_mov_b32 exec_lo, 0
 ; GFX10-W32-NEXT:    s_mov_b32 s1, 0
-; GFX10-W32-NEXT:    s_cmp_lg_u64 exec, 0
+; GFX10-W32-NEXT:    v_mov_b32_e32 v0, 0
 ; GFX10-W32-NEXT:    s_mov_b32 s0, s1
-; GFX10-W32-NEXT:    s_cselect_b32 s2, -1, 0
-; GFX10-W32-NEXT:    s_cmp_lg_u64 s[0:1], 0
-; GFX10-W32-NEXT:    v_mov_b32_e32 v1, 0
+; GFX10-W32-NEXT:    s_cmp_eq_u64 s[0:1], 0
 ; GFX10-W32-NEXT:    s_cselect_b32 s0, -1, 0
-; GFX10-W32-NEXT:    s_xor_b32 s0, s2, s0
-; GFX10-W32-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s0
+; GFX10-W32-NEXT:    v_cndmask_b32_e64 v1, 0, 1.0, s0
 ; GFX10-W32-NEXT:    exp mrt0 off, off, off, off
 ; GFX10-W32-NEXT:    s_endpgm
   call void @llvm.amdgcn.init.exec(i64 0)
diff --git a/llvm/test/Transforms/GVN/assume-equal.ll b/llvm/test/Transforms/GVN/assume-equal.ll
index bbbc5c58584a6..a38980169fc52 100644
--- a/llvm/test/Transforms/GVN/assume-equal.ll
+++ b/llvm/test/Transforms/GVN/assume-equal.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
 ; RUN: opt < %s -passes=gvn -S | FileCheck %s
 
+target datalayout = "p1:64:64:64:32"
+
 %struct.A = type { ptr }
 @_ZTV1A = available_externally unnamed_addr constant [4 x ptr] [ptr null, ptr @_ZTI1A, ptr @_ZN1A3fooEv, ptr @_ZN1A3barEv], align 8
 @_ZTI1A = external constant ptr
@@ -372,6 +374,20 @@ define i1 @assume_ptr_eq_different_prov_does_not_matter_icmp(ptr %p, ptr %p2) {
   ret i1 %c
 }
 
+define i1 @assume_ptr_eq_different_prov_does_not_matter_icmp_addrsize(ptr addrspace(1) %p, ptr addrspace(1) %p2) {
+; CHECK-LABEL: define i1 @assume_ptr_eq_different_prov_does_not_matter_icmp_addrsize(
+; CHECK-SAME: ptr addrspace(1) [[P:%.*]], ptr addrspace(1) [[P2:%.*]]) {
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr addrspace(1) [[P]], [[P2]]
+; CHECK-NEXT:    call void @llvm.assume(i1 [[CMP]])
+; CHECK-NEXT:    [[C:%.*]] = icmp eq ptr addrspace(1) [[P]], null
+; CHECK-NEXT:    ret i1 [[C]]
+;
+  %cmp = icmp eq ptr addrspace(1) %p, %p2
+  call void @llvm.assume(i1 %cmp)
+  %c = icmp eq ptr addrspace(1) %p2, null
+  ret i1 %c
+}
+
 ; This is not correct, as it may change the provenance exposed by ptrtoint.
 ; We still allow it for now.
 define i64 @assume_ptr_eq_different_prov_does_not_matter_ptrtoint(ptr %p, ptr %p2) {
@@ -388,6 +404,20 @@ define i64 @assume_ptr_eq_different_prov_does_not_matter_ptrtoint(ptr %p, ptr %p
   ret i64 %int
 }
 
+define i64 @assume_ptr_eq_different_prov_does_not_matter_ptrtoint_addrsize(ptr addrspace(1) %p, ptr addrspace(1) %p2) {
+; CHECK-LABEL: define i64 @assume_ptr_eq_different_prov_does_not_matter_ptrtoint_addrsize(
+; CHECK-SAME: ptr addrspace(1) [[P:%.*]], ptr addrspace(1) [[P2:%.*]]) {
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr addrspace(1) [[P]], [[P2]]
+; CHECK-NEXT:    call void @llvm.assume(i1 [[CMP]])
+; CHECK-NEXT:    [[INT:%.*]] = ptrtoint ptr addrspace(1) [[P]] to i64
+; CHECK-NEXT:    ret i64 [[INT]]
+;
+  %cmp = icmp eq ptr addrspace(1) %p, %p2
+  call void @llvm.assume(i1 %cmp)
+  %int = ptrtoint ptr addrspace(1) %p2 to i64
+  ret i64 %int
+}
+
 define i64 @assume_ptr_eq_different_prov_does_not_matter_ptrtoaddr(ptr %p, ptr %p2) {
 ; CHECK-LABEL: define i64 @assume_ptr_eq_different_prov_does_not_matter_ptrtoaddr(
 ; CHECK-SAME: ptr [[P:%.*]], ptr [[P2:%.*]]) {
@@ -402,6 +432,20 @@ define i64 @assume_ptr_eq_different_prov_does_not_matter_ptrtoaddr(ptr %p, ptr %
   ret i64 %int
 }
 
+define i32 @assume_ptr_eq_different_prov_does_not_matter_ptrtoaddr_addrsize(ptr addrspace(1) %p, ptr addrspace(1) %p2) {
+; CHECK-LABEL: define i32 @assume_ptr_eq_different_prov_does_not_matter_ptrtoaddr_addrsize(
+; CHECK-SAME: ptr addrspace(1) [[P:%.*]], ptr addrspace(1) [[P2:%.*]]) {
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr addrspace(1) [[P]], [[P2]]
+; CHECK-NEXT:    call void @llvm.assume(i1 [[CMP]])
+; CHECK-NEXT:    [[INT:%.*]] = ptrtoaddr ptr addrspace(1) [[P]] to i32
+; CHECK-NEXT:    ret i32 [[INT]]
+;
+  %cmp = icmp eq ptr addrspace(1) %p, %p2
+  call void @llvm.assume(i1 %cmp)
+  %int = ptrtoaddr ptr addrspace(1) %p2 to i32
+  ret i32 %int
+}
+
 define i8 @assume_ptr_eq_same_prov(ptr %p, i64 %x) {
 ; CHECK-LABEL: define i8 @assume_ptr_eq_same_prov(
 ; CHECK-SAME: ptr [[P:%.*]], i64 [[X:%.*]]) {
diff --git a/llvm/test/Transforms/MemCpyOpt/stack-move.ll b/llvm/test/Transforms/MemCpyOpt/stack-move.ll
index 940e30ec46881..0c2e05fa8fed6 100644
--- a/llvm/test/Transforms/MemCpyOpt/stack-move.ll
+++ b/llvm/test/Transforms/MemCpyOpt/stack-move.ll
@@ -1729,3 +1729,61 @@ define i32 @test_ret_only_capture() {
   %v = load i32, ptr %a
   ret i32 %v
 }
+
+declare ptr @captures_address_only(ptr captures(address))
+
+; Can transform: Only one address captured.
+define void @test_captures_address_captures_none() {
+; CHECK-LABEL: define void @test_captures_address_captures_none() {
+; CHECK-NEXT:    [[SRC:%.*]] = alloca [[STRUCT_FOO:%.*]], align 4
+; CHECK-NEXT:    store [[STRUCT_FOO]] { i32 10, i32 20, i32 30 }, ptr [[SRC]], align 4
+; CHECK-NEXT:    call void @captures_address_only(ptr [[SRC]])
+; CHECK-NEXT:    call void @use_nocapture(ptr [[SRC]])
+; CHECK-NEXT:    ret void
+;
+  %src = alloca %struct.Foo, align 4
+  %dst = alloca %struct.Foo, align 4
+  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
+  call void @captures_address_only(ptr %src)
+  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dst, ptr align 4 %src, i64 12, i1 false)
+  call void @use_nocapture(ptr %dst)
+  ret void
+}
+
+; Can transform: Only one address captured.
+define void @test_captures_none_and_captures_address() {
+; CHECK-LABEL: define void @test_captures_none_and_captures_address() {
+; CHECK-NEXT:    [[SRC:%.*]] = alloca [[STRUCT_FOO:%.*]], align 4
+; CHECK-NEXT:    store [[STRUCT_FOO]] { i32 10, i32 20, i32 30 }, ptr [[SRC]], align 4
+; CHECK-NEXT:    call void @use_nocapture(ptr [[SRC]])
+; CHECK-NEXT:    call void @captures_address_only(ptr [[SRC]])
+; CHECK-NEXT:    ret void
+;
+  %src = alloca %struct.Foo, align 4
+  %dst = alloca %struct.Foo, align 4
+  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
+  call void @use_nocapture(ptr %src)
+  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dst, ptr align 4 %src, i64 12, i1 false)
+  call void @captures_address_only(ptr %dst)
+  ret void
+}
+
+; Cannot transform: Both addresses captured.
+define void @test_captures_address_and_captures_address() {
+; CHECK-LABEL: define void @test_captures_address_and_captures_address() {
+; CHECK-NEXT:    [[SRC:%.*]] = alloca [[STRUCT_FOO:%.*]], align 4
+; CHECK-NEXT:    [[DST:%.*]] = alloca [[STRUCT_FOO]], align 4
+; CHECK-NEXT:    store [[STRUCT_FOO]] { i32 10, i32 20, i32 30 }, ptr [[SRC]], align 4
+; CHECK-NEXT:    call void @captures_address_only(ptr [[SRC]])
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[DST]], ptr align 4 [[SRC]], i64 12, i1 false)
+; CHECK-NEXT:    call void @captures_address_only(ptr [[DST]])
+; CHECK-NEXT:    ret void
+;
+  %src = alloca %struct.Foo, align 4
+  %dst = alloca %struct.Foo, align 4
+  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
+  call void @captures_address_only(ptr %src)
+  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dst, ptr align 4 %src, i64 12, i1 false)
+  call void @captures_address_only(ptr %dst)
+  ret void
+}
diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index 8cd200c93a482..b6b80ea117672 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -2396,244 +2396,6 @@ def add_analyze_checks(
     )
 
 
-IR_FUNC_NAME_RE = re.compile(
-    r"^\s*define\s+(?:internal\s+)?[^@]*@(?P<func>[A-Za-z0-9_.]+)\s*\("
-)
-IR_PREFIX_DATA_RE = re.compile(r"^ *(;|$)")
-MIR_FUNC_NAME_RE = re.compile(r" *name: *(?P<func>[A-Za-z0-9_.-]+)")
-MIR_BODY_BEGIN_RE = re.compile(r" *body: *\|")
-MIR_BASIC_BLOCK_RE = re.compile(r" *bb\.[0-9]+.*:$")
-MIR_PREFIX_DATA_RE = re.compile(r"^ *(;|bb.[0-9].*: *$|[a-z]+:( |$)|$)")
-
-
-def find_mir_functions_with_one_bb(lines, verbose=False):
-    result = []
-    cur_func = None
-    bbs = 0
-    for line in lines:
-        m = MIR_FUNC_NAME_RE.match(line)
-        if m:
-            if bbs == 1:
-                result.append(cur_func)
-            cur_func = m.group("func")
-            bbs = 0
-        m = MIR_BASIC_BLOCK_RE.match(line)
-        if m:
-            bbs += 1
-    if bbs == 1:
-        result.append(cur_func)
-    return result
-
-
-def add_mir_checks_for_function(
-    test,
-    output_lines,
-    run_list,
-    func_dict,
-    func_name,
-    single_bb,
-    print_fixed_stack,
-    first_check_is_next,
-    at_the_function_name,
-):
-    printed_prefixes = set()
-    for run in run_list:
-        for prefix in run[0]:
-            if prefix in printed_prefixes:
-                break
-            if not func_dict[prefix][func_name]:
-                continue
-            if printed_prefixes:
-                # Add some space between different check prefixes.
-                indent = len(output_lines[-1]) - len(output_lines[-1].lstrip(" "))
-                output_lines.append(" " * indent + ";")
-            printed_prefixes.add(prefix)
-            add_mir_check_lines(
-                test,
-                output_lines,
-                prefix,
-                ("@" if at_the_function_name else "") + func_name,
-                single_bb,
-                func_dict[prefix][func_name],
-                print_fixed_stack,
-                first_check_is_next,
-            )
-            break
-        else:
-            warn(
-                "Found conflicting asm for function: {}".format(func_name),
-                test_file=test,
-            )
-    return output_lines
-
-
-def add_mir_check_lines(
-    test,
-    output_lines,
-    prefix,
-    func_name,
-    single_bb,
-    func_info,
-    print_fixed_stack,
-    first_check_is_next,
-):
-    func_body = str(func_info).splitlines()
-    if single_bb:
-        # Don't bother checking the basic block label for a single BB
-        func_body.pop(0)
-
-    if not func_body:
-        warn(
-            "Function has no instructions to check: {}".format(func_name),
-            test_file=test,
-        )
-        return
-
-    first_line = func_body[0]
-    indent = len(first_line) - len(first_line.lstrip(" "))
-    # A check comment, indented the appropriate amount
-    check = "{:>{}}; {}".format("", indent, prefix)
-
-    output_lines.append("{}-LABEL: name: {}".format(check, func_name))
-
-    if print_fixed_stack:
-        output_lines.append("{}: fixedStack:".format(check))
-        for stack_line in func_info.extrascrub.splitlines():
-            filecheck_directive = check + "-NEXT"
-            output_lines.append("{}: {}".format(filecheck_directive, stack_line))
-
-    first_check = not first_check_is_next
-    for func_line in func_body:
-        if not func_line.strip():
-            # The mir printer prints leading whitespace so we can't use CHECK-EMPTY:
-            output_lines.append(check + "-NEXT: {{" + func_line + "$}}")
-            continue
-        filecheck_directive = check if first_check else check + "-NEXT"
-        first_check = False
-        check_line = "{}: {}".format(filecheck_directive, func_line[indent:]).rstrip()
-        output_lines.append(check_line)
-
-
-def should_add_mir_line_to_output(input_line, prefix_set):
-    # Skip any check lines that we're handling as well as comments
-    m = CHECK_RE.match(input_line)
-    if (m and m.group(1) in prefix_set) or input_line.strip() == ";":
-        return False
-    return True
-
-
-def add_mir_checks(
-    input_lines,
-    prefix_set,
-    autogenerated_note,
-    test,
-    run_list,
-    func_dict,
-    print_fixed_stack,
-    first_check_is_next,
-    at_the_function_name,
-):
-    simple_functions = find_mir_functions_with_one_bb(input_lines)
-
-    output_lines = []
-    output_lines.append(autogenerated_note)
-
-    func_name = None
-    state = "toplevel"
-    for input_line in input_lines:
-        if input_line == autogenerated_note:
-            continue
-
-        if state == "toplevel":
-            m = IR_FUNC_NAME_RE.match(input_line)
-            if m:
-                state = "ir function prefix"
-                func_name = m.group("func")
-            if input_line.rstrip("| \r\n") == "---":
-                state = "document"
-            output_lines.append(input_line)
-        elif state == "document":
-            m = MIR_FUNC_NAME_RE.match(input_line)
-            if m:
-                state = "mir function metadata"
-                func_name = m.group("func")
-            if input_line.strip() == "...":
-                state = "toplevel"
-                func_name = None
-            if should_add_mir_line_to_output(input_line, prefix_set):
-                output_lines.append(input_line)
-        elif state == "mir function metadata":
-            if should_add_mir_line_to_output(input_line, prefix_set):
-                output_lines.append(input_line)
-            m = MIR_BODY_BEGIN_RE.match(input_line)
-            if m:
-                if func_name in simple_functions:
-                    # If there's only one block, put the checks inside it
-                    state = "mir function prefix"
-                    continue
-                state = "mir function body"
-                add_mir_checks_for_function(
-                    test,
-                    output_lines,
-                    run_list,
-                    func_dict,
-                    func_name,
-                    single_bb=False,
-                    print_fixed_stack=print_fixed_stack,
-                    first_check_is_next=first_check_is_next,
-                    at_the_function_name=at_the_function_name,
-                )
-        elif state == "mir function prefix":
-            m = MIR_PREFIX_DATA_RE.match(input_line)
-            if not m:
-                state = "mir function body"
-                add_mir_checks_for_function(
-                    test,
-                    output_lines,
-                    run_list,
-                    func_dict,
-                    func_name,
-                    single_bb=True,
-                    print_fixed_stack=print_fixed_stack,
-                    first_check_is_next=first_check_is_next,
-                    at_the_function_name=at_the_function_name,
-                )
-
-            if should_add_mir_line_to_output(input_line, prefix_set):
-                output_lines.append(input_line)
-        elif state == "mir function body":
-            if input_line.strip() == "...":
-                state = "toplevel"
-                func_name = None
-            if should_add_mir_line_to_output(input_line, prefix_set):
-                output_lines.append(input_line)
-        elif state == "ir function prefix":
-            m = IR_PREFIX_DATA_RE.match(input_line)
-            if not m:
-                state = "ir function body"
-                add_mir_checks_for_function(
-                    test,
-                    output_lines,
-                    run_list,
-                    func_dict,
-                    func_name,
-                    single_bb=False,
-                    print_fixed_stack=print_fixed_stack,
-                    first_check_is_next=first_check_is_next,
-                    at_the_function_name=at_the_function_name,
-                )
-
-            if should_add_mir_line_to_output(input_line, prefix_set):
-                output_lines.append(input_line)
-        elif state == "ir function body":
-            if input_line.strip() == "}":
-                state = "toplevel"
-                func_name = None
-            if should_add_mir_line_to_output(input_line, prefix_set):
-                output_lines.append(input_line)
-    return output_lines
-
-
 def build_global_values_dictionary(glob_val_dict, raw_tool_output, prefixes, ginfo):
     for nameless_value in ginfo.get_nameless_values():
         if nameless_value.global_ir_rhs_regexp is None:
diff --git a/llvm/utils/UpdateTestChecks/mir.py b/llvm/utils/UpdateTestChecks/mir.py
new file mode 100644
index 0000000000000..24bb8b341d335
--- /dev/null
+++ b/llvm/utils/UpdateTestChecks/mir.py
@@ -0,0 +1,362 @@
+"""MIR test utility functions for UpdateTestChecks scripts."""
+
+import re
+import sys
+from UpdateTestChecks import common
+from UpdateTestChecks.common import (
+    CHECK_RE,
+    warn,
+)
+
+IR_FUNC_NAME_RE = re.compile(
+    r"^\s*define\s+(?:internal\s+)?[^@]*@(?P<func>[A-Za-z0-9_.]+)\s*\("
+)
+IR_PREFIX_DATA_RE = re.compile(r"^ *(;|$)")
+MIR_FUNC_NAME_RE = re.compile(r" *name: *(?P<func>[A-Za-z0-9_.-]+)")
+MIR_BODY_BEGIN_RE = re.compile(r" *body: *\|")
+MIR_BASIC_BLOCK_RE = re.compile(r" *bb\.[0-9]+.*:$")
+MIR_PREFIX_DATA_RE = re.compile(r"^ *(;|bb.[0-9].*: *$|[a-z]+:( |$)|$)")
+
+VREG_RE = re.compile(r"(%[0-9]+)(?:\.[a-z0-9_]+)?(?::[a-z0-9_]+)?(?:\([<>a-z0-9 ]+\))?")
+MI_FLAGS_STR = (
+    r"(frame-setup |frame-destroy |nnan |ninf |nsz |arcp |contract |afn "
+    r"|reassoc |nuw |nsw |exact |nofpexcept |nomerge |unpredictable "
+    r"|noconvergent |nneg |disjoint |nusw |samesign |inbounds )*"
+)
+VREG_DEF_FLAGS_STR = r"(?:dead |undef )*"
+
+# Pattern to match the defined vregs and the opcode of an instruction that
+# defines vregs. Opcodes starting with a lower-case 't' are allowed to match
+# ARM's thumb instructions, like tADDi8 and t2ADDri.
+VREG_DEF_RE = re.compile(
+    r"^ *(?P<vregs>{2}{0}(?:, {2}{0})*) = "
+    r"{1}(?P<opcode>[A-Zt][A-Za-z0-9_]+)".format(
+        VREG_RE.pattern, MI_FLAGS_STR, VREG_DEF_FLAGS_STR
+    )
+)
+
+MIR_FUNC_RE = re.compile(
+    r"^---$"
+    r"\n"
+    r"^ *name: *(?P<func>[A-Za-z0-9_.-]+)$"
+    r".*?"
+    r"(?:^ *fixedStack: *(\[\])? *\n"
+    r"(?P<fixedStack>.*?)\n?"
+    r"^ *stack:"
+    r".*?)?"
+    r"^ *body: *\|\n"
+    r"(?P<body>.*?)\n"
+    r"^\.\.\.$",
+    flags=(re.M | re.S),
+)
+
+
+def build_function_info_dictionary(
+    test, raw_tool_output, triple, prefixes, func_dict, verbose
+):
+    for m in MIR_FUNC_RE.finditer(raw_tool_output):
+        func = m.group("func")
+        fixedStack = m.group("fixedStack")
+        body = m.group("body")
+        if verbose:
+            print("Processing function: {}".format(func), file=sys.stderr)
+            for l in body.splitlines():
+                print("  {}".format(l), file=sys.stderr)
+
+        # Vreg mangling
+        mangled = []
+        vreg_map = {}
+        for func_line in body.splitlines(keepends=True):
+            m = VREG_DEF_RE.match(func_line)
+            if m:
+                for vreg in VREG_RE.finditer(m.group("vregs")):
+                    if vreg.group(1) in vreg_map:
+                        name = vreg_map[vreg.group(1)]
+                    else:
+                        name = mangle_vreg(m.group("opcode"), vreg_map.values())
+                        vreg_map[vreg.group(1)] = name
+                    func_line = func_line.replace(
+                        vreg.group(1), "[[{}:%[0-9]+]]".format(name), 1
+                    )
+            for number, name in vreg_map.items():
+                func_line = re.sub(
+                    r"{}\b".format(number), "[[{}]]".format(name), func_line
+                )
+            mangled.append(func_line)
+        body = "".join(mangled)
+
+        for prefix in prefixes:
+            info = common.function_body(
+                body, fixedStack, None, None, None, None, ginfo=None
+            )
+            if func in func_dict[prefix]:
+                if (
+                    not func_dict[prefix][func]
+                    or func_dict[prefix][func].scrub != info.scrub
+                    or func_dict[prefix][func].extrascrub != info.extrascrub
+                ):
+                    func_dict[prefix][func] = None
+            else:
+                func_dict[prefix][func] = info
+
+
+def mangle_vreg(opcode, current_names):
+    base = opcode
+    # Simplify some common prefixes and suffixes
+    if opcode.startswith("G_"):
+        base = base[len("G_") :]
+    if opcode.endswith("_PSEUDO"):
+        base = base[: len("_PSEUDO")]
+    # Shorten some common opcodes with long-ish names
+    base = dict(
+        IMPLICIT_DEF="DEF",
+        GLOBAL_VALUE="GV",
+        CONSTANT="C",
+        FCONSTANT="C",
+        MERGE_VALUES="MV",
+        UNMERGE_VALUES="UV",
+        INTRINSIC="INT",
+        INTRINSIC_W_SIDE_EFFECTS="INT",
+        INSERT_VECTOR_ELT="IVEC",
+        EXTRACT_VECTOR_ELT="EVEC",
+        SHUFFLE_VECTOR="SHUF",
+    ).get(base, base)
+    # Avoid ambiguity when opcodes end in numbers
+    if len(base.rstrip("0123456789")) < len(base):
+        base += "_"
+
+    i = 0
+    for name in current_names:
+        if name.rstrip("0123456789") == base:
+            i += 1
+    if i:
+        return "{}{}".format(base, i)
+    return base
+
+
+def find_mir_functions_with_one_bb(lines, verbose=False):
+    result = []
+    cur_func = None
+    bbs = 0
+    for line in lines:
+        m = MIR_FUNC_NAME_RE.match(line)
+        if m:
+            if bbs == 1:
+                result.append(cur_func)
+            cur_func = m.group("func")
+            bbs = 0
+        m = MIR_BASIC_BLOCK_RE.match(line)
+        if m:
+            bbs += 1
+    if bbs == 1:
+        result.append(cur_func)
+    return result
+
+
+def add_mir_checks_for_function(
+    test,
+    output_lines,
+    run_list,
+    func_dict,
+    func_name,
+    single_bb,
+    print_fixed_stack,
+    first_check_is_next,
+    at_the_function_name,
+):
+    printed_prefixes = set()
+    for run in run_list:
+        for prefix in run[0]:
+            if prefix in printed_prefixes:
+                break
+            if not func_dict[prefix][func_name]:
+                continue
+            if printed_prefixes:
+                # Add some space between different check prefixes.
+                indent = len(output_lines[-1]) - len(output_lines[-1].lstrip(" "))
+                output_lines.append(" " * indent + ";")
+            printed_prefixes.add(prefix)
+            add_mir_check_lines(
+                test,
+                output_lines,
+                prefix,
+                ("@" if at_the_function_name else "") + func_name,
+                single_bb,
+                func_dict[prefix][func_name],
+                print_fixed_stack,
+                first_check_is_next,
+            )
+            break
+        else:
+            warn(
+                "Found conflicting asm for function: {}".format(func_name),
+                test_file=test,
+            )
+    return output_lines
+
+
+def add_mir_check_lines(
+    test,
+    output_lines,
+    prefix,
+    func_name,
+    single_bb,
+    func_info,
+    print_fixed_stack,
+    first_check_is_next,
+):
+    func_body = str(func_info).splitlines()
+    if single_bb:
+        # Don't bother checking the basic block label for a single BB
+        func_body.pop(0)
+
+    if not func_body:
+        warn(
+            "Function has no instructions to check: {}".format(func_name),
+            test_file=test,
+        )
+        return
+
+    first_line = func_body[0]
+    indent = len(first_line) - len(first_line.lstrip(" "))
+    # A check comment, indented the appropriate amount
+    check = "{:>{}}; {}".format("", indent, prefix)
+
+    output_lines.append("{}-LABEL: name: {}".format(check, func_name))
+
+    if print_fixed_stack:
+        output_lines.append("{}: fixedStack:".format(check))
+        for stack_line in func_info.extrascrub.splitlines():
+            filecheck_directive = check + "-NEXT"
+            output_lines.append("{}: {}".format(filecheck_directive, stack_line))
+
+    first_check = not first_check_is_next
+    for func_line in func_body:
+        if not func_line.strip():
+            # The mir printer prints leading whitespace so we can't use CHECK-EMPTY:
+            output_lines.append(check + "-NEXT: {{" + func_line + "$}}")
+            continue
+        filecheck_directive = check if first_check else check + "-NEXT"
+        first_check = False
+        check_line = "{}: {}".format(filecheck_directive, func_line[indent:]).rstrip()
+        output_lines.append(check_line)
+
+
+def should_add_mir_line_to_output(input_line, prefix_set):
+    # Skip any check lines that we're handling as well as comments
+    m = CHECK_RE.match(input_line)
+    if (m and m.group(1) in prefix_set) or input_line.strip() == ";":
+        return False
+    return True
+
+
+def add_mir_checks(
+    input_lines,
+    prefix_set,
+    autogenerated_note,
+    test,
+    run_list,
+    func_dict,
+    print_fixed_stack,
+    first_check_is_next,
+    at_the_function_name,
+):
+    simple_functions = find_mir_functions_with_one_bb(input_lines)
+
+    output_lines = []
+    output_lines.append(autogenerated_note)
+
+    func_name = None
+    state = "toplevel"
+    for input_line in input_lines:
+        if input_line == autogenerated_note:
+            continue
+
+        if state == "toplevel":
+            m = IR_FUNC_NAME_RE.match(input_line)
+            if m:
+                state = "ir function prefix"
+                func_name = m.group("func")
+            if input_line.rstrip("| \r\n") == "---":
+                state = "document"
+            output_lines.append(input_line)
+        elif state == "document":
+            m = MIR_FUNC_NAME_RE.match(input_line)
+            if m:
+                state = "mir function metadata"
+                func_name = m.group("func")
+            if input_line.strip() == "...":
+                state = "toplevel"
+                func_name = None
+            if should_add_mir_line_to_output(input_line, prefix_set):
+                output_lines.append(input_line)
+        elif state == "mir function metadata":
+            if should_add_mir_line_to_output(input_line, prefix_set):
+                output_lines.append(input_line)
+            m = MIR_BODY_BEGIN_RE.match(input_line)
+            if m:
+                if func_name in simple_functions:
+                    # If there's only one block, put the checks inside it
+                    state = "mir function prefix"
+                    continue
+                state = "mir function body"
+                add_mir_checks_for_function(
+                    test,
+                    output_lines,
+                    run_list,
+                    func_dict,
+                    func_name,
+                    single_bb=False,
+                    print_fixed_stack=print_fixed_stack,
+                    first_check_is_next=first_check_is_next,
+                    at_the_function_name=at_the_function_name,
+                )
+        elif state == "mir function prefix":
+            m = MIR_PREFIX_DATA_RE.match(input_line)
+            if not m:
+                state = "mir function body"
+                add_mir_checks_for_function(
+                    test,
+                    output_lines,
+                    run_list,
+                    func_dict,
+                    func_name,
+                    single_bb=True,
+                    print_fixed_stack=print_fixed_stack,
+                    first_check_is_next=first_check_is_next,
+                    at_the_function_name=at_the_function_name,
+                )
+
+            if should_add_mir_line_to_output(input_line, prefix_set):
+                output_lines.append(input_line)
+        elif state == "mir function body":
+            if input_line.strip() == "...":
+                state = "toplevel"
+                func_name = None
+            if should_add_mir_line_to_output(input_line, prefix_set):
+                output_lines.append(input_line)
+        elif state == "ir function prefix":
+            m = IR_PREFIX_DATA_RE.match(input_line)
+            if not m:
+                state = "ir function body"
+                add_mir_checks_for_function(
+                    test,
+                    output_lines,
+                    run_list,
+                    func_dict,
+                    func_name,
+                    single_bb=False,
+                    print_fixed_stack=print_fixed_stack,
+                    first_check_is_next=first_check_is_next,
+                    at_the_function_name=at_the_function_name,
+                )
+
+            if should_add_mir_line_to_output(input_line, prefix_set):
+                output_lines.append(input_line)
+        elif state == "ir function body":
+            if input_line.strip() == "}":
+                state = "toplevel"
+                func_name = None
+            if should_add_mir_line_to_output(input_line, prefix_set):
+                output_lines.append(input_line)
+    return output_lines
diff --git a/llvm/utils/update_givaluetracking_test_checks.py b/llvm/utils/update_givaluetracking_test_checks.py
index 49b068ac7bef0..9ad0f3ec9ad1c 100755
--- a/llvm/utils/update_givaluetracking_test_checks.py
+++ b/llvm/utils/update_givaluetracking_test_checks.py
@@ -19,6 +19,7 @@
 import sys
 
 from UpdateTestChecks import common
+from UpdateTestChecks import mir
 
 VT_FUNCTION_RE = re.compile(
     r"\s*name:\s*@(?P<func>[A-Za-z0-9_-]+)"
@@ -92,7 +93,7 @@ def update_test(ti: common.TestInfo):
     func_dict = builder.finish_and_get_func_dict()
     prefix_set = set([prefix for p in run_list for prefix in p[0]])
     common.debug("Rewriting FileCheck prefixes:", str(prefix_set))
-    output_lines = common.add_mir_checks(
+    output_lines = mir.add_mir_checks(
         ti.input_lines,
         prefix_set,
         ti.test_autogenerated_note,
diff --git a/llvm/utils/update_mir_test_checks.py b/llvm/utils/update_mir_test_checks.py
index c4ee0523a6469..ba70249db28e6 100755
--- a/llvm/utils/update_mir_test_checks.py
+++ b/llvm/utils/update_mir_test_checks.py
@@ -31,39 +31,7 @@
 import sys
 
 from UpdateTestChecks import common
-
-VREG_RE = re.compile(r"(%[0-9]+)(?:\.[a-z0-9_]+)?(?::[a-z0-9_]+)?(?:\([<>a-z0-9 ]+\))?")
-MI_FLAGS_STR = (
-    r"(frame-setup |frame-destroy |nnan |ninf |nsz |arcp |contract |afn "
-    r"|reassoc |nuw |nsw |exact |nofpexcept |nomerge |unpredictable "
-    r"|noconvergent |nneg |disjoint |nusw |samesign |inbounds )*"
-)
-VREG_DEF_FLAGS_STR = r"(?:dead |undef )*"
-
-# Pattern to match the defined vregs and the opcode of an instruction that
-# defines vregs. Opcodes starting with a lower-case 't' are allowed to match
-# ARM's thumb instructions, like tADDi8 and t2ADDri.
-VREG_DEF_RE = re.compile(
-    r"^ *(?P<vregs>{2}{0}(?:, {2}{0})*) = "
-    r"{1}(?P<opcode>[A-Zt][A-Za-z0-9_]+)".format(
-        VREG_RE.pattern, MI_FLAGS_STR, VREG_DEF_FLAGS_STR
-    )
-)
-
-MIR_FUNC_RE = re.compile(
-    r"^---$"
-    r"\n"
-    r"^ *name: *(?P<func>[A-Za-z0-9_.-]+)$"
-    r".*?"
-    r"(?:^ *fixedStack: *(\[\])? *\n"
-    r"(?P<fixedStack>.*?)\n?"
-    r"^ *stack:"
-    r".*?)?"
-    r"^ *body: *\|\n"
-    r"(?P<body>.*?)\n"
-    r"^\.\.\.$",
-    flags=(re.M | re.S),
-)
+from UpdateTestChecks import mir
 
 
 class LLC:
@@ -143,89 +111,6 @@ def build_run_list(test, run_lines, verbose=False):
     return run_list
 
 
-def build_function_info_dictionary(
-    test, raw_tool_output, triple, prefixes, func_dict, verbose
-):
-    for m in MIR_FUNC_RE.finditer(raw_tool_output):
-        func = m.group("func")
-        fixedStack = m.group("fixedStack")
-        body = m.group("body")
-        if verbose:
-            log("Processing function: {}".format(func))
-            for l in body.splitlines():
-                log("  {}".format(l))
-
-        # Vreg mangling
-        mangled = []
-        vreg_map = {}
-        for func_line in body.splitlines(keepends=True):
-            m = VREG_DEF_RE.match(func_line)
-            if m:
-                for vreg in VREG_RE.finditer(m.group("vregs")):
-                    if vreg.group(1) in vreg_map:
-                        name = vreg_map[vreg.group(1)]
-                    else:
-                        name = mangle_vreg(m.group("opcode"), vreg_map.values())
-                        vreg_map[vreg.group(1)] = name
-                    func_line = func_line.replace(
-                        vreg.group(1), "[[{}:%[0-9]+]]".format(name), 1
-                    )
-            for number, name in vreg_map.items():
-                func_line = re.sub(
-                    r"{}\b".format(number), "[[{}]]".format(name), func_line
-                )
-            mangled.append(func_line)
-        body = "".join(mangled)
-
-        for prefix in prefixes:
-            info = common.function_body(
-                body, fixedStack, None, None, None, None, ginfo=None
-            )
-            if func in func_dict[prefix]:
-                if (
-                    not func_dict[prefix][func]
-                    or func_dict[prefix][func].scrub != info.scrub
-                    or func_dict[prefix][func].extrascrub != info.extrascrub
-                ):
-                    func_dict[prefix][func] = None
-            else:
-                func_dict[prefix][func] = info
-
-
-def mangle_vreg(opcode, current_names):
-    base = opcode
-    # Simplify some common prefixes and suffixes
-    if opcode.startswith("G_"):
-        base = base[len("G_") :]
-    if opcode.endswith("_PSEUDO"):
-        base = base[: len("_PSEUDO")]
-    # Shorten some common opcodes with long-ish names
-    base = dict(
-        IMPLICIT_DEF="DEF",
-        GLOBAL_VALUE="GV",
-        CONSTANT="C",
-        FCONSTANT="C",
-        MERGE_VALUES="MV",
-        UNMERGE_VALUES="UV",
-        INTRINSIC="INT",
-        INTRINSIC_W_SIDE_EFFECTS="INT",
-        INSERT_VECTOR_ELT="IVEC",
-        EXTRACT_VECTOR_ELT="EVEC",
-        SHUFFLE_VECTOR="SHUF",
-    ).get(base, base)
-    # Avoid ambiguity when opcodes end in numbers
-    if len(base.rstrip("0123456789")) < len(base):
-        base += "_"
-
-    i = 0
-    for name in current_names:
-        if name.rstrip("0123456789") == base:
-            i += 1
-    if i:
-        return "{}{}".format(base, i)
-    return base
-
-
 def update_test_file(args, test, autogenerated_note):
     with open(test) as fd:
         input_lines = [l.rstrip() for l in fd]
@@ -247,7 +132,7 @@ def update_test_file(args, test, autogenerated_note):
             common.warn("No triple found: skipping file", test_file=test)
             return
 
-        build_function_info_dictionary(
+        mir.build_function_info_dictionary(
             test,
             raw_tool_output,
             triple_in_cmd or triple_in_ir,
@@ -259,7 +144,7 @@ def update_test_file(args, test, autogenerated_note):
     prefix_set = set([prefix for run in run_list for prefix in run[0]])
     log("Rewriting FileCheck prefixes: {}".format(prefix_set), args.verbose)
 
-    output_lines = common.add_mir_checks(
+    output_lines = mir.add_mir_checks(
         input_lines,
         prefix_set,
         autogenerated_note,
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 9d62491214018..0f317eac8fa41 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -3911,21 +3911,21 @@ struct Conv1DGenerator
     Value lhs = vector::TransferReadOp::create(
         rewriter, loc, lhsType, lhsShaped, ValueRange{zero, zero, zero},
         /*padding=*/arith::getZeroConstant(rewriter, loc, lhsEltType));
-    auto maybeMaskedLhs = maybeMaskXferOp(
+    auto *maybeMaskedLhs = maybeMaskXferOp(
         lhsType.getShape(), lhsType.getScalableDims(), lhs.getDefiningOp());
 
     // Read rhs slice of size {kw, c} @ [0, 0].
     Value rhs = vector::TransferReadOp::create(
         rewriter, loc, rhsType, rhsShaped, ValueRange{zero, zero},
         /*padding=*/arith::getZeroConstant(rewriter, loc, rhsEltType));
-    auto maybeMaskedRhs = maybeMaskXferOp(
+    auto *maybeMaskedRhs = maybeMaskXferOp(
         rhsType.getShape(), rhsType.getScalableDims(), rhs.getDefiningOp());
 
     // Read res slice of size {n, w, c} @ [0, 0, 0].
     Value res = vector::TransferReadOp::create(
         rewriter, loc, resType, resShaped, ValueRange{zero, zero, zero},
         /*padding=*/arith::getZeroConstant(rewriter, loc, resEltType));
-    auto maybeMaskedRes = maybeMaskXferOp(
+    auto *maybeMaskedRes = maybeMaskXferOp(
         resType.getShape(), resType.getScalableDims(), res.getDefiningOp());
 
     //===------------------------------------------------------------------===//
diff --git a/mlir/lib/Query/Query.cpp b/mlir/lib/Query/Query.cpp
index 375e82050a481..cf8a4d293299c 100644
--- a/mlir/lib/Query/Query.cpp
+++ b/mlir/lib/Query/Query.cpp
@@ -121,12 +121,13 @@ LogicalResult MatchQuery::run(llvm::raw_ostream &os, QuerySession &qs) const {
   Operation *rootOp = qs.getRootOp();
   int matchCount = 0;
   matcher::MatchFinder finder;
+
+  StringRef functionName = matcher.getFunctionName();
   auto matches = finder.collectMatches(rootOp, std::move(matcher));
 
   // An extract call is recognized by considering if the matcher has a name.
   // TODO: Consider making the extract more explicit.
-  if (matcher.hasFunctionName()) {
-    auto functionName = matcher.getFunctionName();
+  if (!functionName.empty()) {
     std::vector<Operation *> flattenedMatches =
         finder.flattenMatchedOps(matches);
     Operation *function =
diff --git a/mlir/lib/Support/Timing.cpp b/mlir/lib/Support/Timing.cpp
index fb6f82c283df5..16306d72815f7 100644
--- a/mlir/lib/Support/Timing.cpp
+++ b/mlir/lib/Support/Timing.cpp
@@ -319,7 +319,6 @@ class TimerImpl {
   void mergeChildren(AsyncChildrenMap &&other) {
     for (auto &thread : other) {
       mergeChildren(std::move(thread.second));
-      assert(thread.second.empty());
     }
     other.clear();
   }