`manual_div_ceil` suggestion can result in worse generated code

### Description

Particularly in situations where:
- We know that the addition cannot overflow (e.g. when we are using intermediate types that are larger than the inputs)
- We are processing in a loop that can be optimized with AVX2

The manual implementation of `div_ceil`, i.e. `(a + b + 1) / 2`, allows the compiler to use the `vpavgb` instruction. Meanwhile, `div_ceil`'s implementation prevents the compiler from generating such optimized assembly and results in multiple instructions--in the case I'm using as an example, a sequence of `vmovdqu, vpxor, vpand, vpaddb, vpsrlw, vpand, vpaddb, vpsubb`. Not surprisingly, the sequence of 8 instructions is slower than the one single instruction. In the following real-world example function, replacing the usages of `(a + b + 1) / 2` with `(a + b).div_ceil(2)` results in about a 5% function-wide performance degradation in benchmarks.

```rust
#[target_feature(enable = "avx2")]
unsafe fn refine_horizontal_bicubic_u8(
    src: *const u8,
    dest: *mut u8,
    pitch: NonZeroUsize,
    width: NonZeroUsize,
    height: NonZeroUsize,
    bits_per_sample: NonZeroU8,
) {
    let pixel_max = (1u16 << bits_per_sample.get()) - 1;
    let width_val = width.get();
    let pitch_val = pitch.get();

    for j in 0..height.get() {
        let row_offset = j * pitch_val;
        let src_row = unsafe { src.add(row_offset) };
        let dest_row = unsafe { dest.add(row_offset) };

        // First pixel: linear interpolation
        let a = unsafe { *src_row.add(0) } as u16;
        let b = unsafe { *src_row.add(1) } as u16;
        unsafe { *dest_row.add(0) = ((a + b + 1) / 2) as u8 };

        // Handle middle pixels individually - SIMD implementation would be more complex
        // and needs careful boundary checking
        for i in 1..(width_val - 3) {
            let a = unsafe { *src_row.add(i - 1) } as i16;
            let b = unsafe { *src_row.add(i) } as i16;
            let c = unsafe { *src_row.add(i + 1) } as i16;
            let d = unsafe { *src_row.add(i + 2) } as i16;
            let result = (-(a + d) + (b + c) * 9 + 8) >> 4;
            unsafe {
                *dest_row.add(i) = std::cmp::min(pixel_max, std::cmp::max(0, result) as u16) as u8;
            }
        }

        // Second-to-last pixels: linear interpolation
        for i in (width_val - 3)..(width_val - 1) {
            let a = unsafe { *src_row.add(i) } as u16;
            let b = unsafe { *src_row.add(i + 1) } as u16;
            unsafe { *dest_row.add(i) = ((a + b + 1) / 2) as u8 };
        }

        // Last pixel: copy
        unsafe { *dest_row.add(width_val - 1) = *src_row.add(width_val - 1) };
    }
}
```

The justification for `manual_div_ceil` being "warn" by default as part of the `complexity` group is that "It’s simpler, clearer and more readable." This, also, is not something I 100% agree with. You could argue it is preferable because it avoids overflows, but there are also cases where we know an overflow cannot occur. As such, given the above performance issues, I'd favor moving this to another group such as `pedantic` and making it "allow" by default.

### Version

```text
rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-unknown-linux-gnu
release: 1.87.0
LLVM version: 20.1.1
```

### Additional Labels

@rustbot label +A-category +S-needs-discussion

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`manual_div_ceil` suggestion can result in worse generated code #14944

Description

Version

Additional Labels

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

manual_div_ceil suggestion can result in worse generated code #14944

Description

Description

Version

Additional Labels

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`manual_div_ceil` suggestion can result in worse generated code #14944