Skip to content

Conversation

coldhex
Copy link

@coldhex coldhex commented Feb 28, 2025

There are/were existing PRs on this same subject, but I wish to present my solution here as well.

z_perspective is true implies w-buffering and then the w-coordinate stored in the depth buffer should also be interpolated in a perspective-correct way. We do this by calculating w and setting gl_FragDepth in the fragment shader.

Since enabling polygon offset and setting values using glPolygonOffset won't have any effect when manually setting gl_FragDepth for w-buffering, we introduce the depthOffset variable to obtain similar behaviour (but the glPolygonOffset factor-argument is currently not emulated.) (Note that glPolygonOffset is OpenGL implementation-dependent and it might be good to use depthOffset for z-buffering as well, but this is not done here and we still use OpenGL/Vulkan zbias functionality.)

This also implements depth clipping and clamping in the fragment shader. If triangles are clipped, the shadows of the small rocks in e.g. Halo 2 Beaver Creek map can have flickering horizontal lines. The shadows are drawn on the ground in another pass with the same models as for the ground, but for some reason with depth clamping enabled. The flickering happens if Xemu clips the ground triangles, but the exact same shadow triangles are depth clamped, so there are small differences in the coordinates. The shadows are drawn with depth function GL_EQUAL so there is no tolerance for any differences. Clipping in the fragment shader solves the problem because the ground and shadow triangles remain exactly the same regardless of depth clipping/clamping. For some performance gain, it might be a good idea to cull triangles by depth in the geometry shader, but this is not implemented here.

In the programmable vertex shader we always multiply position output by w because this improves numerical stability in subsequent floating point computations by modern GPUs. This usually means that the perspective divide done by the vertex program gets undone.

The magic bounding constants 5.42101e-020 and 1.884467e+019 are replaced by 5.421011e-20 and 1.8446744e19, i.e. more decimals added. This makes the 32-bit floating point numbers represent exactly 2^(-64) and 2^64 (raw bits 0x1f800000 and 0x5f800000) which seem more likely the correct values although testing with hardware was not done to this precision.

Testing indicates that the same RCC instruction magic constants are also applied to both fixed function and programmable vertex shader w-coordinate output. This bounding replaces the special test for w==0.0 and abs(w)==inf which used to set vtx_inv_w=1.0 (which did not match Xbox hardware behaviour.)

Fixes #55, fixes #1261

The main thing about this PR is perspective-correct interpolation for w-coordinates when w-buffering. Currently Xemu (0.8.20) interpolates w-coordinates linearly in screen space resulting in rendering artifacts. However, fixing w-values also requires fixing other things such as depth clipping (since it is done based on w-values) and I don't like to leave things too broken even if one big thing gets fixed, so I'm including related fixes here as well.

In summary, the changes are:

  1. perspective corrected w-buffering
  2. depth clipping and clamping
  3. vertex shader w output coordinate bounding away from zero and infinity
  4. programmable vertex shader output multiplication by w (i.e. undoing the perspective divide.)

I want to justify each of these changes so I'll next post test results from my nxdk_pgraph_tests dev branch (https://github.com/coldhex/nxdk_pgraph_tests/tree/dev) comparing Xbox hardware, Xemu 0.8.20 and this PR. Obviously, most of the test cases started off by copying one of @abaire's existing tests (Thanks!). To keep this description from getting too big, I'll include only results for point 1 here and then post rest in subsequent comments (and continue tomorrow if I don't have time them to post them all today.)

In tests my PC hardware is AMD Radeon RX 6600, Linux, Mesa and Vulkan unless otherwise stated. Xbox HW is v1.4.

Point 1:
W-buffering tests (coldhex/nxdk_pgraph_tests@8551623). These draw a checkerboard floor and print w-buffer values at the indicated pixels. Both fixed function and programmable vertex shader, and depth offset or not (which corresponds to glPolygonOffset() second parameter, i.e. z-bias value).

Fixed function, no depth offset:
wbuf_VSH0_ZB0

Fixed function, with depth offset:
wbuf_VSH0_ZB1

Programmable, no depth offset:
wbuf_VSH1_ZB0

Programmable, with depth offset:
wbuf_VSH1_ZB1

Note the big disagreement between HW and 0.8.20, especially with fixed function where the Xemu depth buffer values are typical for z-buffering (because 0.8.20 doesn't implement w-buffering at all in fixed function vertex shader.) The slight disagreement between HW and PR might be worrying, but this is actually due to Xbox screen coordinate rounding to 4-bit fixed point precision (for rasterization?), which is currently unimplemented in Xemu. There are existing PRs about it and I have my take on it in my dev branch (for 1x scaling.) Anyway, with the vertex rounding commit, depth buffer agreement is quite good: only four horizontal lines in the whole frame have a 1-off error. Here are the depth buffers:

Xbox HW (programmable, no depth offset):
wbuf_VSH1_ZB0 zbuf xboxhw

PR (programmable, no depth offset):
wbuf_VSH1_ZB0 zbuf xemudevvulkan

Xbox vertex rounding is a subject for some other PR. The above test result from my dev branch is here just to point out that the slight error in depth buffer values in this PR is not an inherent flaw.

z_perspective is true implies w-buffering and then the w-coordinate stored
in the depth buffer should also be interpolated in a perspective-correct
way. We do this by calculating w and setting gl_FragDepth in the fragment
shader.

Since enabling polygon offset and setting values using glPolygonOffset
won't have any effect when manually setting gl_FragDepth for w-buffering,
we introduce the depthOffset variable to obtain similar behaviour (but the
glPolygonOffset factor-argument is currently not emulated.) (Note that
glPolygonOffset is OpenGL implementation-dependent and it might be good to
use depthOffset for z-buffering as well, but this is not done here and we
still use OpenGL/Vulkan zbias functionality.)

This also implements depth clipping and clamping in the fragment shader.
If triangles are clipped, the shadows of the small rocks in Halo 2 Beaver
Creek map can have flickering horizontal lines. The shadows are drawn on
the ground in another pass with the same models as for the ground, but for
some reason with depth clamping enabled. The flickering happens if Xemu
clips the ground triangles, but the exact same shadow triangles are depth
clamped, so there are small differences in the coordinates. The shadows
are drawn with depth function GL_EQUAL so there is no tolerance for any
differences. Clipping in the fragment shader solves the problem because
the ground and shadow triangles remain exactly the same regardless of
depth clipping/clamping. For some performance gain, it might be a good
idea to cull triangles by depth in the geometry shader, but this is not
implemented here.

In the programmable vertex shader we always multiply position output by w
because this improves numerical stability in subsequent floating point
computations by modern GPUs. This usually means that the perspective
divide done by the vertex program gets undone.

The magic bounding constants 5.42101e-020 and 1.884467e+019 are replaced
by 5.421011e-20 and 1.8446744e19, i.e. more decimals added. This makes the
32-bit floating point numbers represent exactly 2^(-64) and 2^64 (raw bits
0x1f800000 and 0x5f800000) which seem more likely the correct values
although testing with hardware was not done to this precision.

Testing indicates that the same RCC instruction magic constants are also
applied to both fixed function and programmable vertex shader w-coordinate
output. This bounding replaces the special test for w==0.0 and abs(w)==inf
which used to set vtx_inv_w=1.0 (which did not match Xbox hardware
behaviour.)
@Fabxx
Copy link
Contributor

Fabxx commented Feb 28, 2025

Does this PR superseeds #1895 ?

@coldhex
Copy link
Author

coldhex commented Feb 28, 2025

Point 2: depth clipping and clamping

These tests draw to two poles at fixed locations and a checkboard floor. Depth clipping and clamping is applied to the floor.

Legend:

  • WBuf: 1=w-buffering enabled, 0=z-buffering
  • Clamp: 1=Depth clamping enabled, 0=not clalmped
  • ZBias: depth offset value
  • FullR: 1=clipping at minimum and maximum depth values, 0=clipped at depth of poles.
  • VSH: 1=Programmable vertex shader used, 0=fixed function

As mentioned in the description, Xemu tests were done with Linux, Mesa and Vulkan. Mesa with OpenGL has the strange feature that glPolygonOffset applies twice the expected offset, but that is allowed since glPolygonOffset is implementation dependent (as mentioned also in Xemu source code.) Vulkan Mesa behaves well, so the poles in the following tests are in the correct place even with z-buffering.

depth_clamp_W0_C0_ZB0_FR0_VSH0

depth_clamp_W0_C0_ZB0_FR0_VSH1

depth_clamp_W0_C0_ZB0_FR1_VSH0

depth_clamp_W0_C0_ZB0_FR1_VSH1

depth_clamp_W0_C0_ZB1_FR0_VSH0

depth_clamp_W0_C0_ZB1_FR0_VSH1

depth_clamp_W0_C0_ZB1_FR1_VSH0

depth_clamp_W0_C0_ZB1_FR1_VSH1

depth_clamp_W0_C1_ZB0_FR0_VSH0

depth_clamp_W0_C1_ZB0_FR0_VSH1

depth_clamp_W0_C1_ZB0_FR1_VSH0

depth_clamp_W0_C1_ZB0_FR1_VSH1

depth_clamp_W0_C1_ZB1_FR0_VSH0

depth_clamp_W0_C1_ZB1_FR0_VSH1

depth_clamp_W0_C1_ZB1_FR1_VSH0

depth_clamp_W0_C1_ZB1_FR1_VSH1

depth_clamp_W1_C0_ZB0_FR0_VSH0

depth_clamp_W1_C0_ZB0_FR0_VSH1

depth_clamp_W1_C0_ZB0_FR1_VSH0

depth_clamp_W1_C0_ZB0_FR1_VSH1

depth_clamp_W1_C0_ZB1_FR0_VSH0
depth_clamp_W1_C0_ZB1_FR0_VSH1

depth_clamp_W1_C0_ZB1_FR1_VSH0

depth_clamp_W1_C0_ZB1_FR1_VSH1

depth_clamp_W1_C1_ZB0_FR0_VSH0

depth_clamp_W1_C1_ZB0_FR0_VSH1

depth_clamp_W1_C1_ZB0_FR1_VSH0

depth_clamp_W1_C1_ZB0_FR1_VSH1
depth_clamp_W1_C1_ZB1_FR0_VSH0

depth_clamp_W1_C1_ZB1_FR0_VSH1

depth_clamp_W1_C1_ZB1_FR1_VSH0

depth_clamp_W1_C1_ZB1_FR1_VSH1

@coldhex
Copy link
Author

coldhex commented Feb 28, 2025

Point 2: depth clipping and clamping and why geometric depth clipping doesn't work well

The following tests first draw a checkboard floor and then draw another red floor with depth buffer equals test (i.e. NV097_SET_DEPTH_FUNC_V_EQUAL). It applies slight offset to z-coordinates, and also has tests for both z-buffering and w-buffering. This simulates what happens in Halo 2 and some shadows.

depth_equal_Ofs0 685

depth_equal_Ofs0 822

depth_equal_Ofs3 015

depth_equal_Ofs3 216

depth_equal_WBuf_Ofs0 685

depth_equal_WBuf_Ofs0 822

depth_equal_WBuf_Ofs3 015

depth_equal_WBuf_Ofs3 216

I'll also post one case with OpenGL since the checkerboard peeking from under the red looks more interesting.

depth_equal_Ofs3 216

@coldhex
Copy link
Author

coldhex commented Mar 1, 2025

Point 3: vertex shader w output coordinate bounding away from zero and infinity

These tests are from: coldhex/nxdk_pgraph_tests@237fa30

Again, tests are for both fixed function and programmable vertex shaders and draws either quad or a square using two triangles. The top left vertex w-coordinate value and bottom right vertex coordinate value are printed. This demonstrates that Xbox clamps the output w-coordinates from both fixed function and programmable vertex shaders to the same range as is used by the RCC instruction, i.e. away from zero and away from infinity. The programmable vertex shader in these tests do not use that instruction (and just use the passthrough vertex shader from nxdk_pgraph_tests.) Note how Xemu 0.8.20 continues where Xbox and PR stop due to w-clamping.

bi-tri:
ff_w_zero_inf__bitri_w0 00

bi-tri:
ff_w_zero_inf__bitri_w0 25

bi-tri:
ff_w_zero_inf__bitri_w0 50

bi-tri:
ff_w_zero_inf__bitri_w1 00

bi-tri:
ff_w_zero_inf__bitri_w2 00

bi-tri:
ff_w_zero_inf__bitri_w4 00

bi-tri:
prog_w_zero_inf__bitri_w0 00

bi-tri:
prog_w_zero_inf__bitri_w0 25

bi-tri:
prog_w_zero_inf__bitri_w0 50

bi-tri:
prog_w_zero_inf__bitri_w1 00

bi-tri:
prog_w_zero_inf__bitri_w2 00

bi-tri:
prog_w_zero_inf__bitri_w4 00

quad:
This first quad case is very interesting. Both 0.8.20 and PR output nothing. This is because the w-coordinate is infinite and fixed function vertex shader multiplies the position vector with the composite matrix containing zeros, which results in NaN values like a proper floating point implementation should. However, I've tested this more with my Xbox, and Xbox behaves as if zero times infinity is zero. I'm leaving this one unsolved in this PR and subject for future work.

ff_w_zero_inf__quad_w0 00

next quad case:

ff_w_zero_inf__quad_w0 25

ff_w_zero_inf__quad_w0 50

ff_w_zero_inf__quad_w1 00

ff_w_zero_inf__quad_w2 00

ff_w_zero_inf__quad_w4 00

The following quad case shows how this check in Xemu programmable vertex shader does not match hardware:

mstring_append(body,
" if (oPos.w == 0.0 || isinf(oPos.w)) {\n"
" vtx_inv_w = 1.0;\n"
" } else {\n"
" vtx_inv_w = 1.0 / oPos.w;\n"
" }\n"
" vtx_inv_w_flat = vtx_inv_w;\n"

The top-right and bottom-left vertices in the quad have w=1.0 and then 0.8.20 also effectively sets w=1.0 to top-left and bottom-right vertices. This doesn't have anything do with quads vs. tris, it's just that I put w=1.0 to top-right and bottom-left for quads in my test case. Similar problem could be observed with tris as well.

prog_w_zero_inf__quad_w0 00

The rest of the programmable quad cases look the same, so I'll just post one:

prog_w_zero_inf__quad_w0 25

@MasonT8198
Copy link
Contributor

Tested: this also seems to fix some flickering skybox issues in other games, good work!

@coldhex
Copy link
Author

coldhex commented Mar 1, 2025

Point 4: programmable vertex shader output multiplication by w (i.e. undoing the perspective divide.)

First I'll show the problem in Xemu 0.8.20 and Halo 2, with both Vulkan and OpenGL. This is not due to w-buffering and w-coordinate interpolation problems.

halo2_xemu0820vulkan.mp4

The same with 0.8.20 and OpenGL (loaded from snapshot.)

halo2_xemu0820opengl.mp4

I've reproduced this in these test cases: https://github.com/coldhex/nxdk_pgraph_tests/blob/237fa303092fc210b45def8529805ac5355849e3/src/tests/clipping_precision_tests.cpp

There are mainly two types of test cases, the first is relevant to games, and the second is somewhat contrived. The problem is due to large vertex coordinates and large texture coordinates and how modern GPUs apparently don't apply all the numerical stability tricks that they could to avoid it (to the extent that they would perform as well as Xbox hardware.)

The first type was produced based on coordinates similar to observed Halo 2 problems and therefore is relevant. Vertex coordinates for one vertex are large and its w-coordinate is small and varied slightly. The second case has almost the same w-coordinates for all vertices, but texture coordinates are large. Here's Xemu 0.8.20 with Vulkan:

clipping_precision_xemu0820vulkan.mp4

Here's Xbox hardware showing no trouble:

clipping_precision_xboxhw.mp4

This PR only aims to solve the first type by multiplying by w coordinate and letting PC GPU hardware do the perspective-correct interpolation of vertex attributes. Since I don't know what algorithms modern GPUs actually use it is hard to say why this works and how exactly it avoids numerical problems. The second type is not solved, because the w-coordinates in that one are about 1.0 anyway so multiplying by w doesn't really change anything.

PR:

clipping_precision_prvulkan.mp4

PR (Vulkan) and Halo 2 (loaded from snapshot):

halo2_prvulkan.mp4

The clipping precision tests in my dev nxdk_pgraph_tests branch also contain test cases where the two triangles are rotated to see if any of this depends on orientation (it doesn't). It also contains test cases that cycle the vertex coordinates (i.e. if originally vertex submission order is 0->1->2, then the first cycled test case is 1->2->0.) It is interesting that the first type of twitching is solved by cycling the coordinates appropriately. However, it is difficult to say how to do this consistently in a way that would solve it for all GPUs (or even just mine) since the algorithms and sequence of calculations GPUs do for clipping and attribute interpolation are not publicly disclosed.

One obvious fix would be to just use double-precision floating point numbers in interpolation, but that's not supported by GLSL (or GPUs I suppose.) It is possible to use double-precision floating point by interpolating manually. I have a branch experimenting with vertex attribute interpolation with e.g. barycentric coordinates without w-multiplication, but it doesn't beat the simplicity (or performance) of just simply multiplying by w: https://github.com/coldhex/xemu/tree/experimental/interpolation
(That branch doesn't use double-precision, but I have also tried that and it solves even the second (contrived) type of twitching. Also, the branch really is experimental and many things are probably broken.)

@abaire
Copy link
Contributor

abaire commented Mar 1, 2025

I haven't looked at the actual commit details, but this is super impressive work.

It'd be great to upstream your test branch - I've recently been throwing together some automation so we can better track xemu regressions and you've added significant coverage. I'd be happy to help update your tests to accommodate some of the infrastructure changes I've made since you forked (hopefully not much work, it might be just the registration in main.cpp that's relevant).

Xcedf pushed a commit to Xcedf/xemu that referenced this pull request Mar 1, 2025
@coldhex
Copy link
Author

coldhex commented Mar 2, 2025

Does this PR superseeds #1895 ?

That's for maintainers to decide, but I prefer my code. This is perhaps more like a code review and I have not tested any of that code, but about #1895:

  1. With OpenGL, it compresses and then expands z-buffer depth values (when the usual z-buffering is used) resulting in loss of 1 bit of depth buffer precision contrary to the comment in the code. This is because the OpenGL NDC range is [-1,1] which OpenGL maps onto the depth range [0,1]. The PR only utilizes the subset [0,1] (appropriate for Vulkan) of the full NDC range and that subset is mapped onto [0.5, 1.0] depth range by OpenGL and so gl_FragCoord.z has lost 1 bit of precision. The PR then expands [0.5, 1.0] to [0.0, 1.0] but that one lost bit cannot be recovered.
  2. It seems to contain an implementation of the Xbox hardware near/far plane culling feature, but does it by clipping with some unmotivated magic constant factors (only to make the current ZMinMaxControl tests come out looking right?) If you look at the hardware ZMinMaxControl test results carefully, it seems that hardware culls whole quads if (all) the z-coordinates are outside the configured depth range (if near far plane culling is enabled). Culling whole primitives cannot be done at the fragment level so a proper implementation has to be completely different. Geometry shader might be the appropriate place to handle culling.

In addition to the near/far plane culling feature, that PR also contains an implementation of the perspective-uncorrected vertex attribute interpolation feature of Xbox hardware. Both are independent hardware features and I think each should be in separate PRs. (My PR also contains more things than I would like to, but they are interconnected. For example, keeping z-buffering clipping like it is in 0.8.20 would require adding branching in gl/draw.c and vk/draw.c to enable depth clamping only when w-buffering, but then later on that branching would have to be removed anyway if the clipping precision issues demonstrated in comment #1937 (comment) were to be fixed.)

In particular, the perspective-corrected interpolation flag (bit 20 of NV097_SET_CONTROL0) is something I discovered when experimenting with hardware using nxdk_pgraph_tests and my textures were getting stretched by Xbox HW as if no perspective correction was applied. I eventually found the curious bit 20 from Xemu trace events. I don't know if it was known about before and it doesn't really matter. (The separate bundle bit location I placed at the same location I found online in the awesome "nVidia hardware documentation" by Marcelina Kościelnicka, although technically Xemu could decide to store it anywhere.) Anyway, the perspective correction bit 20 is always enabled by Direct3D and implementing the unset bit is irrelevant to game compatibility. I did my implementation originally in the halo2fixes branch coldhex@c513d9b, but I consider the feature esoteric.

The perspective correction bit 20 is relevant only in nxdk_pgraph_tests. The ff_w_zero_* tests in Golden test results: https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-W_param. The test doesn't set the bit. Xemu master/0.8.20 does two wrong things to make that test come out right. The stretching you see there is actually due to the usual texture stretching that happens when perspective correction is not applied. It's not because the coordinate w is zero. The two wrong things in 0.8.20 are: 1. the perspective-uncorrected feature is not implemented, 2. vertex shader has a special case check for w==0.0, which then sets vtx_inv_w=1.0 which effectively disables perspective correction. If the test was modified such that the perspective correction bit 20 was set, then hardware result is quite different but 0.8.20 result remains the same. Also, if, instead, the test was modified such that w was somewhat greater than zero, then hardware result would look pretty much the same as before, but 0.8.20 result would look like what hardware does when bit 20 is set.

@coldhex
Copy link
Author

coldhex commented Mar 2, 2025

I haven't looked at the actual commit details, but this is super impressive work.

It'd be great to upstream your test branch - I've recently been throwing together some automation so we can better track xemu regressions and you've added significant coverage. I'd be happy to help update your tests to accommodate some of the infrastructure changes I've made since you forked (hopefully not much work, it might be just the registration in main.cpp that's relevant).

Thanks! Sure, I'll try and see if I can bring my fork up-to-date and then submit some PRs.

@polymetal0
Copy link

Does this PR superseeds #1895 ?

Does this PR superseeds #1895 ?

That's for maintainers to decide, but I prefer my code. This is perhaps more like a code review and I have not tested any of that code, but about #1895:

  1. With OpenGL, it compresses and then expands z-buffer depth values (when the usual z-buffering is used) resulting in loss of 1 bit of depth buffer precision contrary to the comment in the code. This is because the OpenGL NDC range is [-1,1] which OpenGL maps onto the depth range [0,1]. The PR only utilizes the subset [0,1] (appropriate for Vulkan) of the full NDC range and that subset is mapped onto [0.5, 1.0] depth range by OpenGL and so gl_FragCoord.z has lost 1 bit of precision. The PR then expands [0.5, 1.0] to [0.0, 1.0] but that one lost bit cannot be recovered.
  2. It seems to contain an implementation of the Xbox hardware near/far plane culling feature, but does it by clipping with some unmotivated magic constant factors (only to make the current ZMinMaxControl tests come out looking right?) If you look at the hardware ZMinMaxControl test results carefully, it seems that hardware culls whole quads if (all) the z-coordinates are outside the configured depth range (if near far plane culling is enabled). Culling whole primitives cannot be done at the fragment level so a proper implementation has to be completely different. Geometry shader might be the appropriate place to handle culling.

In addition to the near/far plane culling feature, that PR also contains an implementation of the perspective-uncorrected vertex attribute interpolation feature of Xbox hardware. Both are independent hardware features and I think each should be in separate PRs. (My PR also contains more things than I would like to, but they are interconnected. For example, keeping z-buffering clipping like it is in 0.8.20 would require adding branching in gl/draw.c and vk/draw.c to enable depth clamping only when w-buffering, but then later on that branching would have to be removed anyway if the clipping precision issues demonstrated in comment #1937 (comment) were to be fixed.)

In particular, the perspective-corrected interpolation flag (bit 20 of NV097_SET_CONTROL0) is something I discovered when experimenting with hardware using nxdk_pgraph_tests and my textures were getting stretched by Xbox HW as if no perspective correction was applied. I eventually found the curious bit 20 from Xemu trace events. I don't know if it was known about before and it doesn't really matter. (The separate bundle bit location I placed at the same location I found online in the awesome "nVidia hardware documentation" by Marcelina Kościelnicka, although technically Xemu could decide to store it anywhere.) Anyway, the perspective correction bit 20 is always enabled by Direct3D and implementing the unset bit is irrelevant to game compatibility. I did my implementation originally in the halo2fixes branch coldhex@c513d9b, but I consider the feature esoteric.

The perspective correction bit 20 is relevant only in nxdk_pgraph_tests. The ff_w_zero_* tests in Golden test results: https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-W_param. The test doesn't set the bit. Xemu master/0.8.20 does two wrong things to make that test come out right. The stretching you see there is actually due to the usual texture stretching that happens when perspective correction is not applied. It's not because the coordinate w is zero. The two wrong things in 0.8.20 are: 1. the perspective-uncorrected feature is not implemented, 2. vertex shader has a special case check for w==0.0, which then sets vtx_inv_w=1.0 which effectively disables perspective correction. If the test was modified such that the perspective correction bit 20 was set, then hardware result is quite different but 0.8.20 result remains the same. Also, if, instead, the test was modified such that w was somewhat greater than zero, then hardware result would look pretty much the same as before, but 0.8.20 result would look like what hardware does when bit 20 is set.

Hi, so nice to see you uploaded this PR! Just to clarify, #1895 focuses on handling the ZMinMaxControl tests, which this one does not fully, so I would say they are different although they show some overlaps. I'll cut my PR down and adapt it to this one once it's merged.
Btw, I came upon the same issue you did with MS dashboard, OGL and floating point precission. In case you'd prefer to handle it that way, I solved this by doing the [0, 1] to [-1, 1] conversion in fragment shader.

Also one request, I used a texture perspective variable you implemented in your dev branch to solve some texture warping in mine. Are you planning on uploading it in a PR at some point? Thank you

@abaire
Copy link
Contributor

abaire commented Mar 3, 2025

Anyway, the perspective correction bit 20 is always enabled by Direct3D and implementing the unset bit is irrelevant to game compatibility

Just to clarify, #1895 focuses on handling the ZMinMaxControl tests

Arguably we should just drop/disable the tests if bit 20 is always set by real titles. Scanning through my many pgraph dumps I've only seen it unset in nxdk-based programs. I don't think it's worth carrying extra maintenance complexity just to emulate behavior that was not used in practice.

@coldhex it would be good to upstream the bit 20 setting to the nxdk nv_regs, perhaps with a comment indicating that it's always set by DirectX?

Copy link
Member

@mborgerson mborgerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful! Thanks for the tests and detailed description. Will you please also confirm that other tests in the pgraph suite are not negatively impacted by the changes? Left a couple of initial review comments.

@GXTX
Copy link
Contributor

GXTX commented Mar 5, 2025

Arguably we should just drop/disable the tests if bit 20 is always set by real titles. [...] I don't think it's worth carrying extra maintenance complexity just to emulate behavior that was not used in practice.

2c: Very little skin in the game but xemu aims to be a hardware emulator, thus it should support this.

If tPosition is a zero-vector, then invViewport matrix had no effect.
Bounding w-coordinate away from zero and infinity must be done before
applying invViewport (which is needed for OpenGL/Vulkan) to emulate
Xbox hardware behaviour properly.
@coldhex
Copy link
Author

coldhex commented Mar 5, 2025

@coldhex it would be good to upstream the bit 20 setting to the nxdk nv_regs, perhaps with a comment indicating that it's always set by DirectX?

Yes, I'll do that.

@coldhex
Copy link
Author

coldhex commented Mar 5, 2025

Also one request, I used a texture perspective variable you implemented in your dev branch to solve some texture warping in mine. Are you planning on uploading it in a PR at some point? Thank you

No current plans, so go ahead!

@coldhex
Copy link
Author

coldhex commented Mar 5, 2025

Wonderful! Thanks for the tests and detailed description. Will you please also confirm that other tests in the pgraph suite are not negatively impacted by the changes? Left a couple of initial review comments.

I have run all the nxdk_pgraph_tests and there weren't any negative changes apart from the unset perspective correction bit 20 discussed earlier. In addition to W param tests, it also affects some of the Front face tests (https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-Front_face) which set one vertex w=Infinity. These are fixed by bit 20 implementation. (Btw, some of the tests give assert failures (also in Xemu 0.8.20), and some are unstable in Xemu in the sense rerunning them gives different output. Especially Surface_pitch/Swizzle give a different output every time I press A-key.)

I just pushed a new commit to fix a case in fixed function vertex shader. If the input tPosition is an all-zero vector, then oPos = invViewport * (tPosition * compositeMat); gives just a zero-vector also, i.e. invViewport doesn't do anything. The clamping is supposed to be done before undoing viewport changes (to get back to OpenGL/Vulkan clip-space.) This gives hardware-matching results for the quirky all-zero vertices in ZMinMaxControl tests.

CtrlFixed_ZCLAMP test (top-left is HW, top-right is new commit, bottom-left is Xemu 0.8.20, bottom-right is PR without new commit):

CtrlFixed_ZCLAMP

(Bottom-right also has the other quad reaching the center of the screen, but it is backface-culled because center position also reverses winding.)

@abaire
Copy link
Contributor

abaire commented Mar 6, 2025

... (Btw, some of the tests give assert failures (also in Xemu 0.8.20), and some are unstable in Xemu in the sense rerunning them gives different output. Especially Surface_pitch/Swizzle give a different output every time I press A-key.)

Those failures are expected. A few tests are non-deterministic on HW as well, though not Surface_pitch/Swizzle. I think that's just one of a number of tests that are still bugged on xemu.

@Triticum0
Copy link
Collaborator

Fixed #1108 Lighting look like that on hardware even though PS2 and gamecube look more correct don't know what the dev did?
Source: https://youtu.be/4mO1i5nYyug?t=25

@Triticum0
Copy link
Collaborator

Fixes #1090
xemu-2025-03-06-02-59-53

@Triticum0
Copy link
Collaborator

Triticum0 commented Mar 6, 2025

Fixes Most of #1016 and #1273
007: Nightfire
Battlefield 2: Modern Combat
batman begins(Shadows)
Masters:
xemu-2025-03-06-03-11-08

PR:
xemu-2025-03-06-03-10-44

Breakdown
Catwoman
Far Cry Instincts
Ice Age 2
Masters:
xemu-2025-03-06-03-14-39

PR:
xemu-2025-03-06-02-57-33

I-Ninja
Masters:
xemu-2025-03-06-02-38-25

PR:
xemu-2025-03-06-02-38-43

Juiced
Marvel Nemesis: Rise of the Imperfects
Project Gotham Racing
Quantum Redshift
Robots
Masters:
xemu-2025-03-06-02-46-37

PR:
xemu-2025-03-06-02-46-48

V-Rally

Here screenshot I took before look the same so did bother taking more screenshots but retest to make sure the issues are fixed #1826 (comment)

Untested games Which should be fixed.
Knight's Apprentice: Memorick's Adventures AMD Only
Splinter Cell Pandora Tomorrow
Tom Clancy's Rainbow Six 3 Black Arrow
Tom Clancy's Rainbow Six 3
Tom Clancy's Ghost Recon: Advanced Warfighter

@Triticum0
Copy link
Collaborator

Improves #1143 still doesn't match hardware
Masters:
xemu-2025-03-06-04-09-28

PR:
xemu-2025-03-06-04-09-07

Console:
https://youtu.be/Xu5BH4j0K1A?t=120

@abaire
Copy link
Contributor

abaire commented Mar 7, 2025

FYI, I threw together a repo to generate diffs for PRs vs previous xemu versions. Results for this PR are here: https://github.com/abaire/xemu-dev_pgraph_test_results/wiki/pr_nv2a-wbuffering-fix (alpha func diffs should be ignored, the diff ran against 0.8.25 which has an additional fix)

Xbox rounds -0.0 to the negative range and 0.0 to the positive range. This
commit also restores RCC instruction clamping to be done on the output of
reciprocal calculation (which current Xemu release does) with fix for the
input=Infinity case.
@coldhex
Copy link
Author

coldhex commented Mar 7, 2025

Following @mborgerson initial review comments, I fixed the -0.0 case when clamping away from zero and infinity.

I restored RCC instruction clamping to how it is in Xemu release, except for the more precise bounding constants, and the new fix for the case when input is Infinity. I've added and updated W param tests here: coldhex/nxdk_pgraph_tests@89e2b67

In the following images, top-left is Xbox hardware, top-right is PR with new commit, bottom-left is Xemu 0.8.20 and bottom-right is PR without new commit.

New commit fixes RCC instruction when input=Infinity. This test uses a vertex program that divides x and y by the z shown in the image.
rcc_w_zero_inf__zinf
Note that with 0.8.20 the bottom-right vertex coordinates flip to negative, because 1.0/Infinity = 0.0 and this does not match the t > 0.0 condition in 0.8.20.

The same clamping behaviour regarding 0.0 and -0.0 must be used for vertex shader W-coordinate output:
prog_w_zero_inf__bitri_w-inf
(Because w is negative (i.e. behind camera), the screen coordinates for top-left and bottom-right must be swapped in the test.) Note how without new commit w=-0.0 is treated as if it is 0.0 (and therefore is clamping the positive range.)

I also found some more precision issues regarding large coordinates (in addition to the texture twitching in previous comments.) Here is a progression of top-left coordinate towards -0.0 using Radeon and Vulkan:
prog_w_zero_inf__bitri_w-4 00
prog_w_zero_inf__bitri_w-2 00
prog_w_zero_inf__bitri_w-1 00
prog_w_zero_inf__bitri_w-0 50
prog_w_zero_inf__bitri_w-0 25

This sequence converges and eventually starts looking like this:

prog_w_zero_inf__bitri_w-3 08e-33
prog_w_zero_inf__bitri_w-0 96e-34
prog_w_zero_inf__bitri_w-1 50e-36

This is where my Radeon's (with Mesa and Vulkan) numerical precision starts affecting the result (before reaching the clamping range limits):

prog_w_zero_inf__bitri_w-7 52e-37
prog_w_zero_inf__bitri_w-3 76e-37
prog_w_zero_inf__bitri_w-1 88e-37
prog_w_zero_inf__bitri_w-0 00

The point of breaking above depends on hardware. My Intel integrated (UHD 770) breaks sooner. I tried these tests with my experimental/interpolation branch (with the -0.0 fix) with custom clipping enabled and it seems to fix the problem. So the breaking here is due to limited accuracy of 32-bit floats (and perhaps less than optimal numerical stability of algorithms used by my GPUs). The test above is certainly contrived in the same sense that the second texture twitching case with large texture coordinates is. Problems resulting from huge coordinate differences is something that developers generally try tp avoid because of exactly these kinds of precision issues. However, I just mention this because Xbox hardware doesn't seem to have such precision issues here (and possibly the magic bounding constants 2^(-64) and 2^64 were chosen by Nvidia so that their algorithms work.)

"\n"
"float clampAwayZeroInf(float t) {\n"
" if (t > 0.0 || floatBitsToUint(t) == 0) {\n"
" t = clamp(t, 5.421011e-20, 1.8446744e19);\n"
Copy link
Member

@mborgerson mborgerson Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use uintBitsToFloat to supply the specific encoding of these values (0x1F800000, 0x5F800000) to avoid a chance the glsl compiler picks a different representation. Since we are trying to encode these exact values anyway, it makes sense. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense. When using uintBitsToFloat it compiles to the exact same code on my system, but source code is cleaner that way.

@mborgerson
Copy link
Member

FYI, I threw together a repo to generate diffs for PRs vs previous xemu versions. Results for this PR are here: https://github.com/abaire/xemu-dev_pgraph_test_results/wiki/pr_nv2a-wbuffering-fix (alpha func diffs should be ignored, the diff ran against 0.8.25 which has an additional fix)

Thanks @abaire! I really like having all of these tests results, but the volume of results is a bit difficult to navigate and review in this format. Is there a table of sorts where I can see which tests were impacted by the change and how it compares to xemu master and xbox hardware?

@mborgerson
Copy link
Member

@coldhex

New commit fixes RCC instruction when input=Infinity. This test uses a vertex program that divides x and y by the z shown in the image.

Thanks for the additional tests and PR improvements. I'm having a hard time seeing a difference between the TR and BR quadrants here, so I'm not exactly sure what the result is trying to demonstrate in PR improvement in the first image.

// " gl_ClipDistance[0] = gl_in[index].gl_ClipDistance[0];\n"
// " gl_ClipDistance[1] = gl_in[index].gl_ClipDistance[1];\n"
" vtx_inv_w = v_vtx_inv_w[index];\n"
" vtx_inv_w_flat = v_vtx_inv_w[provoking_index];\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm interested to see how these changes impact flat shading. Looks like this was added in #1180 by @abaire. @abaire: Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the results comparison, it seems it's no worse than it already was. There are still differences against HW, but it looks like they haven't changed since 0.7.98 which is the oldest version I happen to have generated goldens for. I double checked the raw results manually to rule out a bug in the regression comparison script.

Maybe @Triticum0 could quickly check Blinx shadows to double check that there's no regression?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed vtx_inv_w_flat because, in case of flat shading, it together with D0, D1, B0, B1 are all non-interpolated vertex attributes and multiplying in vertex shader and then dividing in fragment shader by the exact same value is superfluous. So, this is just simplification of #1180 and it makes sense to remove vtx_inv_w_flat together at the same time when vtx_inv_w is removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test blinks and couldn't see any regression with textures being missing

@abaire
Copy link
Contributor

abaire commented Mar 8, 2025

FYI, I threw together a repo to generate diffs for PRs vs previous xemu versions. Results for this PR are here: https://github.com/abaire/xemu-dev_pgraph_test_results/wiki/pr_nv2a-wbuffering-fix (alpha func diffs should be ignored, the diff ran against 0.8.25 which has an additional fix)

Thanks @abaire! I really like having all of these tests results, but the volume of results is a bit difficult to navigate and review in this format. Is there a table of sorts where I can see which tests were impacted by the change and how it compares to xemu master and xbox hardware?

Unfortunately it's just the TOC in the right panel. I'm planning to generate Pages output like https://abaire.github.io/xemu-nxdk_pgraph_tests_results/compare/xemu-0.8.25-master-046d127f31bad089decc2e213dc8488c43aa6f04/Linux_x86_64/gl_NVIDIA_Corporation_NVIDIA_GeForce_GTX_1070-PCIe-SSE2%3Agslv_4.00_NVIDIA_via_Cg_compiler/index.html it was just faster to throw it in the wiki as a proof of concept.

@MasonT8198
Copy link
Contributor

This fixes the shadow flickering and skybox flickering in The Incredible Hulk: Ultimate Destruction, pushing the game to near perfect status, all that's left is to fix the 0.5 issue with the window scaling (ie the lines in the left and top sides)

@coldhex
Copy link
Author

coldhex commented Mar 8, 2025

@coldhex

New commit fixes RCC instruction when input=Infinity. This test uses a vertex program that divides x and y by the z shown in the image.

Thanks for the additional tests and PR improvements. I'm having a hard time seeing a difference between the TR and BR quadrants here, so I'm not exactly sure what the result is trying to demonstrate in PR improvement in the first image.

TR and BR are the same since Infinity as input to RCC wasn't a problem in the PR. That image demonstrated that the new commit still fixes Xemu 0.8.20 RCC instruction. The case that is fixed by the new commit is -0.0 as input to RCC. I didn't include the test result in that comment, but here it is:

rcc_w_zero_inf__z-inf

Bottom-right quadrant is the initial commit behaviour, which didn't take into account the minus in -0.0 so the top-left vertex flips to negative.

Xcedf pushed a commit to Xcedf/xemu that referenced this pull request Mar 8, 2025
@mborgerson
Copy link
Member

Great, thanks!

@Triticum0
Copy link
Collaborator

I Think this pr break Shenmue 2 Sky box #2049 may want to look into when you have the time. @coldhex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Midtown Madness 3: Flickering shadows Halo 2: Able to see other objects through objects close to you

8 participants