nv2a: Perspective-correct interpolation for w-buffering #1937

coldhex · 2025-02-28T22:41:45Z

There are/were existing PRs on this same subject, but I wish to present my solution here as well.

z_perspective is true implies w-buffering and then the w-coordinate stored in the depth buffer should also be interpolated in a perspective-correct way. We do this by calculating w and setting gl_FragDepth in the fragment shader.

Since enabling polygon offset and setting values using glPolygonOffset won't have any effect when manually setting gl_FragDepth for w-buffering, we introduce the depthOffset variable to obtain similar behaviour (but the glPolygonOffset factor-argument is currently not emulated.) (Note that glPolygonOffset is OpenGL implementation-dependent and it might be good to use depthOffset for z-buffering as well, but this is not done here and we still use OpenGL/Vulkan zbias functionality.)

This also implements depth clipping and clamping in the fragment shader. If triangles are clipped, the shadows of the small rocks in e.g. Halo 2 Beaver Creek map can have flickering horizontal lines. The shadows are drawn on the ground in another pass with the same models as for the ground, but for some reason with depth clamping enabled. The flickering happens if Xemu clips the ground triangles, but the exact same shadow triangles are depth clamped, so there are small differences in the coordinates. The shadows are drawn with depth function GL_EQUAL so there is no tolerance for any differences. Clipping in the fragment shader solves the problem because the ground and shadow triangles remain exactly the same regardless of depth clipping/clamping. For some performance gain, it might be a good idea to cull triangles by depth in the geometry shader, but this is not implemented here.

In the programmable vertex shader we always multiply position output by w because this improves numerical stability in subsequent floating point computations by modern GPUs. This usually means that the perspective divide done by the vertex program gets undone.

The magic bounding constants 5.42101e-020 and 1.884467e+019 are replaced by 5.421011e-20 and 1.8446744e19, i.e. more decimals added. This makes the 32-bit floating point numbers represent exactly 2^(-64) and 2^64 (raw bits 0x1f800000 and 0x5f800000) which seem more likely the correct values although testing with hardware was not done to this precision.

Testing indicates that the same RCC instruction magic constants are also applied to both fixed function and programmable vertex shader w-coordinate output. This bounding replaces the special test for w==0.0 and abs(w)==inf which used to set vtx_inv_w=1.0 (which did not match Xbox hardware behaviour.)

Fixes #55, fixes #1261

The main thing about this PR is perspective-correct interpolation for w-coordinates when w-buffering. Currently Xemu (0.8.20) interpolates w-coordinates linearly in screen space resulting in rendering artifacts. However, fixing w-values also requires fixing other things such as depth clipping (since it is done based on w-values) and I don't like to leave things too broken even if one big thing gets fixed, so I'm including related fixes here as well.

In summary, the changes are:

perspective corrected w-buffering
depth clipping and clamping
vertex shader w output coordinate bounding away from zero and infinity
programmable vertex shader output multiplication by w (i.e. undoing the perspective divide.)

I want to justify each of these changes so I'll next post test results from my nxdk_pgraph_tests dev branch (https://github.com/coldhex/nxdk_pgraph_tests/tree/dev) comparing Xbox hardware, Xemu 0.8.20 and this PR. Obviously, most of the test cases started off by copying one of @abaire's existing tests (Thanks!). To keep this description from getting too big, I'll include only results for point 1 here and then post rest in subsequent comments (and continue tomorrow if I don't have time them to post them all today.)

In tests my PC hardware is AMD Radeon RX 6600, Linux, Mesa and Vulkan unless otherwise stated. Xbox HW is v1.4.

Point 1:
W-buffering tests (coldhex/nxdk_pgraph_tests@8551623). These draw a checkerboard floor and print w-buffer values at the indicated pixels. Both fixed function and programmable vertex shader, and depth offset or not (which corresponds to glPolygonOffset() second parameter, i.e. z-bias value).

Fixed function, no depth offset:

Fixed function, with depth offset:

Programmable, no depth offset:

Programmable, with depth offset:

Note the big disagreement between HW and 0.8.20, especially with fixed function where the Xemu depth buffer values are typical for z-buffering (because 0.8.20 doesn't implement w-buffering at all in fixed function vertex shader.) The slight disagreement between HW and PR might be worrying, but this is actually due to Xbox screen coordinate rounding to 4-bit fixed point precision (for rasterization?), which is currently unimplemented in Xemu. There are existing PRs about it and I have my take on it in my dev branch (for 1x scaling.) Anyway, with the vertex rounding commit, depth buffer agreement is quite good: only four horizontal lines in the whole frame have a 1-off error. Here are the depth buffers:

Xbox HW (programmable, no depth offset):

PR (programmable, no depth offset):

Xbox vertex rounding is a subject for some other PR. The above test result from my dev branch is here just to point out that the slight error in depth buffer values in this PR is not an inherent flaw.

z_perspective is true implies w-buffering and then the w-coordinate stored in the depth buffer should also be interpolated in a perspective-correct way. We do this by calculating w and setting gl_FragDepth in the fragment shader. Since enabling polygon offset and setting values using glPolygonOffset won't have any effect when manually setting gl_FragDepth for w-buffering, we introduce the depthOffset variable to obtain similar behaviour (but the glPolygonOffset factor-argument is currently not emulated.) (Note that glPolygonOffset is OpenGL implementation-dependent and it might be good to use depthOffset for z-buffering as well, but this is not done here and we still use OpenGL/Vulkan zbias functionality.) This also implements depth clipping and clamping in the fragment shader. If triangles are clipped, the shadows of the small rocks in Halo 2 Beaver Creek map can have flickering horizontal lines. The shadows are drawn on the ground in another pass with the same models as for the ground, but for some reason with depth clamping enabled. The flickering happens if Xemu clips the ground triangles, but the exact same shadow triangles are depth clamped, so there are small differences in the coordinates. The shadows are drawn with depth function GL_EQUAL so there is no tolerance for any differences. Clipping in the fragment shader solves the problem because the ground and shadow triangles remain exactly the same regardless of depth clipping/clamping. For some performance gain, it might be a good idea to cull triangles by depth in the geometry shader, but this is not implemented here. In the programmable vertex shader we always multiply position output by w because this improves numerical stability in subsequent floating point computations by modern GPUs. This usually means that the perspective divide done by the vertex program gets undone. The magic bounding constants 5.42101e-020 and 1.884467e+019 are replaced by 5.421011e-20 and 1.8446744e19, i.e. more decimals added. This makes the 32-bit floating point numbers represent exactly 2^(-64) and 2^64 (raw bits 0x1f800000 and 0x5f800000) which seem more likely the correct values although testing with hardware was not done to this precision. Testing indicates that the same RCC instruction magic constants are also applied to both fixed function and programmable vertex shader w-coordinate output. This bounding replaces the special test for w==0.0 and abs(w)==inf which used to set vtx_inv_w=1.0 (which did not match Xbox hardware behaviour.)

Fabxx · 2025-02-28T22:52:06Z

Does this PR superseeds #1895 ?

coldhex · 2025-02-28T23:21:43Z

Point 2: depth clipping and clamping

These tests draw to two poles at fixed locations and a checkboard floor. Depth clipping and clamping is applied to the floor.

Legend:

WBuf: 1=w-buffering enabled, 0=z-buffering
Clamp: 1=Depth clamping enabled, 0=not clalmped
ZBias: depth offset value
FullR: 1=clipping at minimum and maximum depth values, 0=clipped at depth of poles.
VSH: 1=Programmable vertex shader used, 0=fixed function

As mentioned in the description, Xemu tests were done with Linux, Mesa and Vulkan. Mesa with OpenGL has the strange feature that glPolygonOffset applies twice the expected offset, but that is allowed since glPolygonOffset is implementation dependent (as mentioned also in Xemu source code.) Vulkan Mesa behaves well, so the poles in the following tests are in the correct place even with z-buffering.

coldhex · 2025-02-28T23:32:07Z

Point 2: depth clipping and clamping and why geometric depth clipping doesn't work well

The following tests first draw a checkboard floor and then draw another red floor with depth buffer equals test (i.e. NV097_SET_DEPTH_FUNC_V_EQUAL). It applies slight offset to z-coordinates, and also has tests for both z-buffering and w-buffering. This simulates what happens in Halo 2 and some shadows.

I'll also post one case with OpenGL since the checkerboard peeking from under the red looks more interesting.

coldhex · 2025-03-01T00:00:06Z

Point 3: vertex shader w output coordinate bounding away from zero and infinity

These tests are from: coldhex/nxdk_pgraph_tests@237fa30

Again, tests are for both fixed function and programmable vertex shaders and draws either quad or a square using two triangles. The top left vertex w-coordinate value and bottom right vertex coordinate value are printed. This demonstrates that Xbox clamps the output w-coordinates from both fixed function and programmable vertex shaders to the same range as is used by the RCC instruction, i.e. away from zero and away from infinity. The programmable vertex shader in these tests do not use that instruction (and just use the passthrough vertex shader from nxdk_pgraph_tests.) Note how Xemu 0.8.20 continues where Xbox and PR stop due to w-clamping.

bi-tri:

quad:
This first quad case is very interesting. Both 0.8.20 and PR output nothing. This is because the w-coordinate is infinite and fixed function vertex shader multiplies the position vector with the composite matrix containing zeros, which results in NaN values like a proper floating point implementation should. However, I've tested this more with my Xbox, and Xbox behaves as if zero times infinity is zero. I'm leaving this one unsolved in this PR and subject for future work.

next quad case:

The following quad case shows how this check in Xemu programmable vertex shader does not match hardware:

xemu/hw/xbox/nv2a/pgraph/glsl/vsh-prog.c

Lines 832 to 838 in 3bdb9e7

    
           mstring_append(body, 
        
               "  if (oPos.w == 0.0 || isinf(oPos.w)) {\n" 
        
               "    vtx_inv_w = 1.0;\n" 
        
               "  } else {\n" 
        
               "    vtx_inv_w = 1.0 / oPos.w;\n" 
        
               "  }\n" 
        
               "  vtx_inv_w_flat = vtx_inv_w;\n"

The top-right and bottom-left vertices in the quad have w=1.0 and then 0.8.20 also effectively sets w=1.0 to top-left and bottom-right vertices. This doesn't have anything do with quads vs. tris, it's just that I put w=1.0 to top-right and bottom-left for quads in my test case. Similar problem could be observed with tris as well.

The rest of the programmable quad cases look the same, so I'll just post one:

MasonT8198 · 2025-03-01T00:23:10Z

Tested: this also seems to fix some flickering skybox issues in other games, good work!

coldhex · 2025-03-01T00:52:38Z

Point 4: programmable vertex shader output multiplication by w (i.e. undoing the perspective divide.)

First I'll show the problem in Xemu 0.8.20 and Halo 2, with both Vulkan and OpenGL. This is not due to w-buffering and w-coordinate interpolation problems.

halo2_xemu0820vulkan.mp4

The same with 0.8.20 and OpenGL (loaded from snapshot.)

halo2_xemu0820opengl.mp4

I've reproduced this in these test cases: https://github.com/coldhex/nxdk_pgraph_tests/blob/237fa303092fc210b45def8529805ac5355849e3/src/tests/clipping_precision_tests.cpp

There are mainly two types of test cases, the first is relevant to games, and the second is somewhat contrived. The problem is due to large vertex coordinates and large texture coordinates and how modern GPUs apparently don't apply all the numerical stability tricks that they could to avoid it (to the extent that they would perform as well as Xbox hardware.)

The first type was produced based on coordinates similar to observed Halo 2 problems and therefore is relevant. Vertex coordinates for one vertex are large and its w-coordinate is small and varied slightly. The second case has almost the same w-coordinates for all vertices, but texture coordinates are large. Here's Xemu 0.8.20 with Vulkan:

clipping_precision_xemu0820vulkan.mp4

Here's Xbox hardware showing no trouble:

clipping_precision_xboxhw.mp4

This PR only aims to solve the first type by multiplying by w coordinate and letting PC GPU hardware do the perspective-correct interpolation of vertex attributes. Since I don't know what algorithms modern GPUs actually use it is hard to say why this works and how exactly it avoids numerical problems. The second type is not solved, because the w-coordinates in that one are about 1.0 anyway so multiplying by w doesn't really change anything.

PR:

clipping_precision_prvulkan.mp4

PR (Vulkan) and Halo 2 (loaded from snapshot):

halo2_prvulkan.mp4

The clipping precision tests in my dev nxdk_pgraph_tests branch also contain test cases where the two triangles are rotated to see if any of this depends on orientation (it doesn't). It also contains test cases that cycle the vertex coordinates (i.e. if originally vertex submission order is 0->1->2, then the first cycled test case is 1->2->0.) It is interesting that the first type of twitching is solved by cycling the coordinates appropriately. However, it is difficult to say how to do this consistently in a way that would solve it for all GPUs (or even just mine) since the algorithms and sequence of calculations GPUs do for clipping and attribute interpolation are not publicly disclosed.

One obvious fix would be to just use double-precision floating point numbers in interpolation, but that's not supported by GLSL (or GPUs I suppose.) It is possible to use double-precision floating point by interpolating manually. I have a branch experimenting with vertex attribute interpolation with e.g. barycentric coordinates without w-multiplication, but it doesn't beat the simplicity (or performance) of just simply multiplying by w: https://github.com/coldhex/xemu/tree/experimental/interpolation
(That branch doesn't use double-precision, but I have also tried that and it solves even the second (contrived) type of twitching. Also, the branch really is experimental and many things are probably broken.)

abaire · 2025-03-01T06:35:52Z

I haven't looked at the actual commit details, but this is super impressive work.

It'd be great to upstream your test branch - I've recently been throwing together some automation so we can better track xemu regressions and you've added significant coverage. I'd be happy to help update your tests to accommodate some of the infrastructure changes I've made since you forked (hopefully not much work, it might be just the registration in main.cpp that's relevant).

…1937

coldhex · 2025-03-02T09:50:51Z

Does this PR superseeds #1895 ?

That's for maintainers to decide, but I prefer my code. This is perhaps more like a code review and I have not tested any of that code, but about #1895:

With OpenGL, it compresses and then expands z-buffer depth values (when the usual z-buffering is used) resulting in loss of 1 bit of depth buffer precision contrary to the comment in the code. This is because the OpenGL NDC range is [-1,1] which OpenGL maps onto the depth range [0,1]. The PR only utilizes the subset [0,1] (appropriate for Vulkan) of the full NDC range and that subset is mapped onto [0.5, 1.0] depth range by OpenGL and so gl_FragCoord.z has lost 1 bit of precision. The PR then expands [0.5, 1.0] to [0.0, 1.0] but that one lost bit cannot be recovered.
It seems to contain an implementation of the Xbox hardware near/far plane culling feature, but does it by clipping with some unmotivated magic constant factors (only to make the current ZMinMaxControl tests come out looking right?) If you look at the hardware ZMinMaxControl test results carefully, it seems that hardware culls whole quads if (all) the z-coordinates are outside the configured depth range (if near far plane culling is enabled). Culling whole primitives cannot be done at the fragment level so a proper implementation has to be completely different. Geometry shader might be the appropriate place to handle culling.

In addition to the near/far plane culling feature, that PR also contains an implementation of the perspective-uncorrected vertex attribute interpolation feature of Xbox hardware. Both are independent hardware features and I think each should be in separate PRs. (My PR also contains more things than I would like to, but they are interconnected. For example, keeping z-buffering clipping like it is in 0.8.20 would require adding branching in gl/draw.c and vk/draw.c to enable depth clamping only when w-buffering, but then later on that branching would have to be removed anyway if the clipping precision issues demonstrated in comment #1937 (comment) were to be fixed.)

In particular, the perspective-corrected interpolation flag (bit 20 of NV097_SET_CONTROL0) is something I discovered when experimenting with hardware using nxdk_pgraph_tests and my textures were getting stretched by Xbox HW as if no perspective correction was applied. I eventually found the curious bit 20 from Xemu trace events. I don't know if it was known about before and it doesn't really matter. (The separate bundle bit location I placed at the same location I found online in the awesome "nVidia hardware documentation" by Marcelina Kościelnicka, although technically Xemu could decide to store it anywhere.) Anyway, the perspective correction bit 20 is always enabled by Direct3D and implementing the unset bit is irrelevant to game compatibility. I did my implementation originally in the halo2fixes branch coldhex@c513d9b, but I consider the feature esoteric.

The perspective correction bit 20 is relevant only in nxdk_pgraph_tests. The ff_w_zero_* tests in Golden test results: https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-W_param. The test doesn't set the bit. Xemu master/0.8.20 does two wrong things to make that test come out right. The stretching you see there is actually due to the usual texture stretching that happens when perspective correction is not applied. It's not because the coordinate w is zero. The two wrong things in 0.8.20 are: 1. the perspective-uncorrected feature is not implemented, 2. vertex shader has a special case check for w==0.0, which then sets vtx_inv_w=1.0 which effectively disables perspective correction. If the test was modified such that the perspective correction bit 20 was set, then hardware result is quite different but 0.8.20 result remains the same. Also, if, instead, the test was modified such that w was somewhat greater than zero, then hardware result would look pretty much the same as before, but 0.8.20 result would look like what hardware does when bit 20 is set.

coldhex · 2025-03-02T09:52:37Z

I haven't looked at the actual commit details, but this is super impressive work.

It'd be great to upstream your test branch - I've recently been throwing together some automation so we can better track xemu regressions and you've added significant coverage. I'd be happy to help update your tests to accommodate some of the infrastructure changes I've made since you forked (hopefully not much work, it might be just the registration in main.cpp that's relevant).

Thanks! Sure, I'll try and see if I can bring my fork up-to-date and then submit some PRs.

polymetal0 · 2025-03-03T15:12:49Z

Does this PR superseeds #1895 ?

Does this PR superseeds #1895 ?

That's for maintainers to decide, but I prefer my code. This is perhaps more like a code review and I have not tested any of that code, but about #1895:

With OpenGL, it compresses and then expands z-buffer depth values (when the usual z-buffering is used) resulting in loss of 1 bit of depth buffer precision contrary to the comment in the code. This is because the OpenGL NDC range is [-1,1] which OpenGL maps onto the depth range [0,1]. The PR only utilizes the subset [0,1] (appropriate for Vulkan) of the full NDC range and that subset is mapped onto [0.5, 1.0] depth range by OpenGL and so gl_FragCoord.z has lost 1 bit of precision. The PR then expands [0.5, 1.0] to [0.0, 1.0] but that one lost bit cannot be recovered.

It seems to contain an implementation of the Xbox hardware near/far plane culling feature, but does it by clipping with some unmotivated magic constant factors (only to make the current ZMinMaxControl tests come out looking right?) If you look at the hardware ZMinMaxControl test results carefully, it seems that hardware culls whole quads if (all) the z-coordinates are outside the configured depth range (if near far plane culling is enabled). Culling whole primitives cannot be done at the fragment level so a proper implementation has to be completely different. Geometry shader might be the appropriate place to handle culling.

In addition to the near/far plane culling feature, that PR also contains an implementation of the perspective-uncorrected vertex attribute interpolation feature of Xbox hardware. Both are independent hardware features and I think each should be in separate PRs. (My PR also contains more things than I would like to, but they are interconnected. For example, keeping z-buffering clipping like it is in 0.8.20 would require adding branching in gl/draw.c and vk/draw.c to enable depth clamping only when w-buffering, but then later on that branching would have to be removed anyway if the clipping precision issues demonstrated in comment #1937 (comment) were to be fixed.)

In particular, the perspective-corrected interpolation flag (bit 20 of NV097_SET_CONTROL0) is something I discovered when experimenting with hardware using nxdk_pgraph_tests and my textures were getting stretched by Xbox HW as if no perspective correction was applied. I eventually found the curious bit 20 from Xemu trace events. I don't know if it was known about before and it doesn't really matter. (The separate bundle bit location I placed at the same location I found online in the awesome "nVidia hardware documentation" by Marcelina Kościelnicka, although technically Xemu could decide to store it anywhere.) Anyway, the perspective correction bit 20 is always enabled by Direct3D and implementing the unset bit is irrelevant to game compatibility. I did my implementation originally in the halo2fixes branch coldhex@c513d9b, but I consider the feature esoteric.

The perspective correction bit 20 is relevant only in nxdk_pgraph_tests. The ff_w_zero_* tests in Golden test results: https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-W_param. The test doesn't set the bit. Xemu master/0.8.20 does two wrong things to make that test come out right. The stretching you see there is actually due to the usual texture stretching that happens when perspective correction is not applied. It's not because the coordinate w is zero. The two wrong things in 0.8.20 are: 1. the perspective-uncorrected feature is not implemented, 2. vertex shader has a special case check for w==0.0, which then sets vtx_inv_w=1.0 which effectively disables perspective correction. If the test was modified such that the perspective correction bit 20 was set, then hardware result is quite different but 0.8.20 result remains the same. Also, if, instead, the test was modified such that w was somewhat greater than zero, then hardware result would look pretty much the same as before, but 0.8.20 result would look like what hardware does when bit 20 is set.

Hi, so nice to see you uploaded this PR! Just to clarify, #1895 focuses on handling the ZMinMaxControl tests, which this one does not fully, so I would say they are different although they show some overlaps. I'll cut my PR down and adapt it to this one once it's merged.
Btw, I came upon the same issue you did with MS dashboard, OGL and floating point precission. In case you'd prefer to handle it that way, I solved this by doing the [0, 1] to [-1, 1] conversion in fragment shader.

Also one request, I used a texture perspective variable you implemented in your dev branch to solve some texture warping in mine. Are you planning on uploading it in a PR at some point? Thank you

abaire · 2025-03-03T15:51:24Z

Anyway, the perspective correction bit 20 is always enabled by Direct3D and implementing the unset bit is irrelevant to game compatibility

Just to clarify, #1895 focuses on handling the ZMinMaxControl tests

Arguably we should just drop/disable the tests if bit 20 is always set by real titles. Scanning through my many pgraph dumps I've only seen it unset in nxdk-based programs. I don't think it's worth carrying extra maintenance complexity just to emulate behavior that was not used in practice.

@coldhex it would be good to upstream the bit 20 setting to the nxdk nv_regs, perhaps with a comment indicating that it's always set by DirectX?

mborgerson

Wonderful! Thanks for the tests and detailed description. Will you please also confirm that other tests in the pgraph suite are not negatively impacted by the changes? Left a couple of initial review comments.

hw/xbox/nv2a/pgraph/vk/draw.c

hw/xbox/nv2a/pgraph/glsl/vsh-prog.c

GXTX · 2025-03-05T14:46:59Z

Arguably we should just drop/disable the tests if bit 20 is always set by real titles. [...] I don't think it's worth carrying extra maintenance complexity just to emulate behavior that was not used in practice.

2c: Very little skin in the game but xemu aims to be a hardware emulator, thus it should support this.

If tPosition is a zero-vector, then invViewport matrix had no effect. Bounding w-coordinate away from zero and infinity must be done before applying invViewport (which is needed for OpenGL/Vulkan) to emulate Xbox hardware behaviour properly.

coldhex · 2025-03-05T18:01:15Z

@coldhex it would be good to upstream the bit 20 setting to the nxdk nv_regs, perhaps with a comment indicating that it's always set by DirectX?

Yes, I'll do that.

coldhex · 2025-03-05T18:05:42Z

Also one request, I used a texture perspective variable you implemented in your dev branch to solve some texture warping in mine. Are you planning on uploading it in a PR at some point? Thank you

No current plans, so go ahead!

coldhex · 2025-03-05T18:26:40Z

Wonderful! Thanks for the tests and detailed description. Will you please also confirm that other tests in the pgraph suite are not negatively impacted by the changes? Left a couple of initial review comments.

I have run all the nxdk_pgraph_tests and there weren't any negative changes apart from the unset perspective correction bit 20 discussed earlier. In addition to W param tests, it also affects some of the Front face tests (https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-Front_face) which set one vertex w=Infinity. These are fixed by bit 20 implementation. (Btw, some of the tests give assert failures (also in Xemu 0.8.20), and some are unstable in Xemu in the sense rerunning them gives different output. Especially Surface_pitch/Swizzle give a different output every time I press A-key.)

I just pushed a new commit to fix a case in fixed function vertex shader. If the input tPosition is an all-zero vector, then oPos = invViewport * (tPosition * compositeMat); gives just a zero-vector also, i.e. invViewport doesn't do anything. The clamping is supposed to be done before undoing viewport changes (to get back to OpenGL/Vulkan clip-space.) This gives hardware-matching results for the quirky all-zero vertices in ZMinMaxControl tests.

CtrlFixed_ZCLAMP test (top-left is HW, top-right is new commit, bottom-left is Xemu 0.8.20, bottom-right is PR without new commit):

(Bottom-right also has the other quad reaching the center of the screen, but it is backface-culled because center position also reverses winding.)

abaire · 2025-03-06T01:19:14Z

... (Btw, some of the tests give assert failures (also in Xemu 0.8.20), and some are unstable in Xemu in the sense rerunning them gives different output. Especially Surface_pitch/Swizzle give a different output every time I press A-key.)

Those failures are expected. A few tests are non-deterministic on HW as well, though not Surface_pitch/Swizzle. I think that's just one of a number of tests that are still bugged on xemu.

Triticum0 · 2025-03-06T01:23:59Z

Fixed #1108 Lighting look like that on hardware even though PS2 and gamecube look more correct don't know what the dev did?
Source: https://youtu.be/4mO1i5nYyug?t=25

Triticum0 · 2025-03-06T03:02:52Z

Fixes #1090

Triticum0 · 2025-03-06T03:14:58Z

Fixes Most of #1016 and #1273
007: Nightfire
Battlefield 2: Modern Combat
batman begins(Shadows)
Masters:

PR:

Breakdown
Catwoman
Far Cry Instincts
Ice Age 2
Masters:

PR:

I-Ninja
Masters:

PR:

Juiced
Marvel Nemesis: Rise of the Imperfects
Project Gotham Racing
Quantum Redshift
Robots
Masters:

PR:

V-Rally

Here screenshot I took before look the same so did bother taking more screenshots but retest to make sure the issues are fixed #1826 (comment)

Untested games Which should be fixed.
Knight's Apprentice: Memorick's Adventures AMD Only
Splinter Cell Pandora Tomorrow
Tom Clancy's Rainbow Six 3 Black Arrow
Tom Clancy's Rainbow Six 3
Tom Clancy's Ghost Recon: Advanced Warfighter

Triticum0 · 2025-03-06T04:11:54Z

Improves #1143 still doesn't match hardware
Masters:

PR:

Console:
https://youtu.be/Xu5BH4j0K1A?t=120

…1937

abaire · 2025-03-07T01:45:28Z

FYI, I threw together a repo to generate diffs for PRs vs previous xemu versions. Results for this PR are here: https://github.com/abaire/xemu-dev_pgraph_test_results/wiki/pr_nv2a-wbuffering-fix (alpha func diffs should be ignored, the diff ran against 0.8.25 which has an additional fix)

Xbox rounds -0.0 to the negative range and 0.0 to the positive range. This commit also restores RCC instruction clamping to be done on the output of reciprocal calculation (which current Xemu release does) with fix for the input=Infinity case.

coldhex · 2025-03-07T23:47:07Z

Following @mborgerson initial review comments, I fixed the -0.0 case when clamping away from zero and infinity.

I restored RCC instruction clamping to how it is in Xemu release, except for the more precise bounding constants, and the new fix for the case when input is Infinity. I've added and updated W param tests here: coldhex/nxdk_pgraph_tests@89e2b67

In the following images, top-left is Xbox hardware, top-right is PR with new commit, bottom-left is Xemu 0.8.20 and bottom-right is PR without new commit.

New commit fixes RCC instruction when input=Infinity. This test uses a vertex program that divides x and y by the z shown in the image.

Note that with 0.8.20 the bottom-right vertex coordinates flip to negative, because 1.0/Infinity = 0.0 and this does not match the t > 0.0 condition in 0.8.20.

The same clamping behaviour regarding 0.0 and -0.0 must be used for vertex shader W-coordinate output:

(Because w is negative (i.e. behind camera), the screen coordinates for top-left and bottom-right must be swapped in the test.) Note how without new commit w=-0.0 is treated as if it is 0.0 (and therefore is clamping the positive range.)

I also found some more precision issues regarding large coordinates (in addition to the texture twitching in previous comments.) Here is a progression of top-left coordinate towards -0.0 using Radeon and Vulkan:

This sequence converges and eventually starts looking like this:

This is where my Radeon's (with Mesa and Vulkan) numerical precision starts affecting the result (before reaching the clamping range limits):

The point of breaking above depends on hardware. My Intel integrated (UHD 770) breaks sooner. I tried these tests with my experimental/interpolation branch (with the -0.0 fix) with custom clipping enabled and it seems to fix the problem. So the breaking here is due to limited accuracy of 32-bit floats (and perhaps less than optimal numerical stability of algorithms used by my GPUs). The test above is certainly contrived in the same sense that the second texture twitching case with large texture coordinates is. Problems resulting from huge coordinate differences is something that developers generally try tp avoid because of exactly these kinds of precision issues. However, I just mention this because Xbox hardware doesn't seem to have such precision issues here (and possibly the magic bounding constants 2^(-64) and 2^64 were chosen by Nvidia so that their algorithms work.)

mborgerson · 2025-03-08T00:58:52Z

hw/xbox/nv2a/pgraph/glsl/vsh.c

+        "\n"
+        "float clampAwayZeroInf(float t) {\n"
+        "  if (t > 0.0 || floatBitsToUint(t) == 0) {\n"
+        "    t = clamp(t, 5.421011e-20, 1.8446744e19);\n"


I wonder if we should use uintBitsToFloat to supply the specific encoding of these values (0x1F800000, 0x5F800000) to avoid a chance the glsl compiler picks a different representation. Since we are trying to encode these exact values anyway, it makes sense. What do you think?

Yes, makes sense. When using uintBitsToFloat it compiles to the exact same code on my system, but source code is cleaner that way.

mborgerson · 2025-03-08T01:29:47Z

FYI, I threw together a repo to generate diffs for PRs vs previous xemu versions. Results for this PR are here: https://github.com/abaire/xemu-dev_pgraph_test_results/wiki/pr_nv2a-wbuffering-fix (alpha func diffs should be ignored, the diff ran against 0.8.25 which has an additional fix)

Thanks @abaire! I really like having all of these tests results, but the volume of results is a bit difficult to navigate and review in this format. Is there a table of sorts where I can see which tests were impacted by the change and how it compares to xemu master and xbox hardware?

mborgerson · 2025-03-08T01:41:59Z

@coldhex

New commit fixes RCC instruction when input=Infinity. This test uses a vertex program that divides x and y by the z shown in the image.

Thanks for the additional tests and PR improvements. I'm having a hard time seeing a difference between the TR and BR quadrants here, so I'm not exactly sure what the result is trying to demonstrate in PR improvement in the first image.

mborgerson · 2025-03-08T01:46:23Z

hw/xbox/nv2a/pgraph/glsl/geom.c

-                       // "  gl_ClipDistance[0] = gl_in[index].gl_ClipDistance[0];\n"
-                       // "  gl_ClipDistance[1] = gl_in[index].gl_ClipDistance[1];\n"
-                       "  vtx_inv_w = v_vtx_inv_w[index];\n"
-                       "  vtx_inv_w_flat = v_vtx_inv_w[provoking_index];\n"


I'm interested to see how these changes impact flat shading. Looks like this was added in #1180 by @abaire. @abaire: Thoughts?

Looking at the results comparison, it seems it's no worse than it already was. There are still differences against HW, but it looks like they haven't changed since 0.7.98 which is the oldest version I happen to have generated goldens for. I double checked the raw results manually to rule out a bug in the regression comparison script.

Maybe @Triticum0 could quickly check Blinx shadows to double check that there's no regression?

I removed vtx_inv_w_flat because, in case of flat shading, it together with D0, D1, B0, B1 are all non-interpolated vertex attributes and multiplying in vertex shader and then dividing in fragment shader by the exact same value is superfluous. So, this is just simplification of #1180 and it makes sense to remove vtx_inv_w_flat together at the same time when vtx_inv_w is removed.

Test blinks and couldn't see any regression with textures being missing

abaire · 2025-03-08T02:20:04Z

FYI, I threw together a repo to generate diffs for PRs vs previous xemu versions. Results for this PR are here: https://github.com/abaire/xemu-dev_pgraph_test_results/wiki/pr_nv2a-wbuffering-fix (alpha func diffs should be ignored, the diff ran against 0.8.25 which has an additional fix)

Thanks @abaire! I really like having all of these tests results, but the volume of results is a bit difficult to navigate and review in this format. Is there a table of sorts where I can see which tests were impacted by the change and how it compares to xemu master and xbox hardware?

Unfortunately it's just the TOC in the right panel. I'm planning to generate Pages output like https://abaire.github.io/xemu-nxdk_pgraph_tests_results/compare/xemu-0.8.25-master-046d127f31bad089decc2e213dc8488c43aa6f04/Linux_x86_64/gl_NVIDIA_Corporation_NVIDIA_GeForce_GTX_1070-PCIe-SSE2%3Agslv_4.00_NVIDIA_via_Cg_compiler/index.html it was just faster to throw it in the wiki as a proof of concept.

MasonT8198 · 2025-03-08T02:32:19Z

This fixes the shadow flickering and skybox flickering in The Incredible Hulk: Ultimate Destruction, pushing the game to near perfect status, all that's left is to fix the 0.5 issue with the window scaling (ie the lines in the left and top sides)

coldhex · 2025-03-08T17:13:45Z

@coldhex

New commit fixes RCC instruction when input=Infinity. This test uses a vertex program that divides x and y by the z shown in the image.

Thanks for the additional tests and PR improvements. I'm having a hard time seeing a difference between the TR and BR quadrants here, so I'm not exactly sure what the result is trying to demonstrate in PR improvement in the first image.

TR and BR are the same since Infinity as input to RCC wasn't a problem in the PR. That image demonstrated that the new commit still fixes Xemu 0.8.20 RCC instruction. The case that is fixed by the new commit is -0.0 as input to RCC. I didn't include the test result in that comment, but here it is:

Bottom-right quadrant is the initial commit behaviour, which didn't take into account the minus in -0.0 so the top-left vertex flips to negative.

…1937

mborgerson · 2025-03-08T21:54:13Z

Great, thanks!

Triticum0 · 2025-03-21T00:31:13Z

I Think this pr break Shenmue 2 Sky box #2049 may want to look into when you have the time. @coldhex

Xcedf pushed a commit to Xcedf/xemu that referenced this pull request Mar 1, 2025

nv2a: Perspective-correct interpolation for w-buffering xemu-project#…

456b335

…1937

cazzhmir mentioned this pull request Mar 3, 2025

Splinter Cell: Dynamic lights and shadows display on everything but walls/floors/ceilings #941

Closed

mborgerson reviewed Mar 4, 2025

View reviewed changes

hw/xbox/nv2a/pgraph/vk/draw.c Show resolved Hide resolved

hw/xbox/nv2a/pgraph/glsl/vsh-prog.c Outdated Show resolved Hide resolved

Xcedf pushed a commit to Xcedf/xemu that referenced this pull request Mar 6, 2025

nv2a: Perspective-correct interpolation for w-buffering xemu-project#…

737ebe0

…1937

Triticum0 mentioned this pull request Mar 7, 2025

Far Cry series: Broken water rendering #1931

Open

mborgerson reviewed Mar 8, 2025

View reviewed changes

nv2a: Explicit float representation for RCC and vertex shader W range

dd308df

Xcedf pushed a commit to Xcedf/xemu that referenced this pull request Mar 8, 2025

nv2a: Perspective-correct interpolation for w-buffering xemu-project#…

06b8983

…1937

mborgerson merged commit 3eb22b6 into xemu-project:master Mar 8, 2025
21 checks passed

This was referenced Mar 8, 2025

Splinter cell: NIght vision goggles broken and blurry #803

Closed

Splinter cell: Character and environment not rendered properly #804

Closed

Xcedf mentioned this pull request Mar 10, 2025

Conker Live and Reloaded: Missing Character shadows on versions above 0.7.115 #1685

Closed

This was referenced Mar 12, 2025

Splinter Cell 1/Pandora Tomorrow: Flickering shadows #1971

Closed

nv2a: Handle SET_SPECULAR_ENABLE #2004

Merged

Triticum0 mentioned this pull request Apr 18, 2025

Counter Terrorist Special Forces: loading animation corrupts title screen #665

Open

Uh oh!

nv2a: Perspective-correct interpolation for w-buffering #1937

nv2a: Perspective-correct interpolation for w-buffering #1937

Uh oh!

Conversation

coldhex commented Feb 28, 2025

Uh oh!

Fabxx commented Feb 28, 2025

Uh oh!

coldhex commented Feb 28, 2025

Uh oh!

coldhex commented Feb 28, 2025

Uh oh!

coldhex commented Mar 1, 2025

Uh oh!

MasonT8198 commented Mar 1, 2025

Uh oh!

coldhex commented Mar 1, 2025

Uh oh!

abaire commented Mar 1, 2025

Uh oh!

coldhex commented Mar 2, 2025

Uh oh!

coldhex commented Mar 2, 2025

Uh oh!

polymetal0 commented Mar 3, 2025

Uh oh!

abaire commented Mar 3, 2025

Uh oh!

mborgerson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

GXTX commented Mar 5, 2025

Uh oh!

coldhex commented Mar 5, 2025

Uh oh!

coldhex commented Mar 5, 2025

Uh oh!

coldhex commented Mar 5, 2025

Uh oh!

abaire commented Mar 6, 2025

Uh oh!

Triticum0 commented Mar 6, 2025

Uh oh!

Triticum0 commented Mar 6, 2025

Uh oh!

Triticum0 commented Mar 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Triticum0 commented Mar 6, 2025

Uh oh!

abaire commented Mar 7, 2025

Uh oh!

coldhex commented Mar 7, 2025

Uh oh!

mborgerson Mar 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coldhex Mar 8, 2025

Choose a reason for hiding this comment

Uh oh!

mborgerson commented Mar 8, 2025

Uh oh!

mborgerson commented Mar 8, 2025

Uh oh!

mborgerson Mar 8, 2025

Choose a reason for hiding this comment

Uh oh!

abaire Mar 8, 2025

Choose a reason for hiding this comment

Uh oh!

coldhex Mar 8, 2025

Choose a reason for hiding this comment

Uh oh!

Triticum0 Mar 10, 2025

Choose a reason for hiding this comment

Uh oh!

Triticum0 commented Mar 6, 2025 •

edited

Loading

mborgerson Mar 8, 2025 •

edited

Loading