-
-
Notifications
You must be signed in to change notification settings - Fork 347
nv2a: Perspective-correct interpolation for w-buffering #1937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nv2a: Perspective-correct interpolation for w-buffering #1937
Conversation
z_perspective is true implies w-buffering and then the w-coordinate stored in the depth buffer should also be interpolated in a perspective-correct way. We do this by calculating w and setting gl_FragDepth in the fragment shader. Since enabling polygon offset and setting values using glPolygonOffset won't have any effect when manually setting gl_FragDepth for w-buffering, we introduce the depthOffset variable to obtain similar behaviour (but the glPolygonOffset factor-argument is currently not emulated.) (Note that glPolygonOffset is OpenGL implementation-dependent and it might be good to use depthOffset for z-buffering as well, but this is not done here and we still use OpenGL/Vulkan zbias functionality.) This also implements depth clipping and clamping in the fragment shader. If triangles are clipped, the shadows of the small rocks in Halo 2 Beaver Creek map can have flickering horizontal lines. The shadows are drawn on the ground in another pass with the same models as for the ground, but for some reason with depth clamping enabled. The flickering happens if Xemu clips the ground triangles, but the exact same shadow triangles are depth clamped, so there are small differences in the coordinates. The shadows are drawn with depth function GL_EQUAL so there is no tolerance for any differences. Clipping in the fragment shader solves the problem because the ground and shadow triangles remain exactly the same regardless of depth clipping/clamping. For some performance gain, it might be a good idea to cull triangles by depth in the geometry shader, but this is not implemented here. In the programmable vertex shader we always multiply position output by w because this improves numerical stability in subsequent floating point computations by modern GPUs. This usually means that the perspective divide done by the vertex program gets undone. The magic bounding constants 5.42101e-020 and 1.884467e+019 are replaced by 5.421011e-20 and 1.8446744e19, i.e. more decimals added. This makes the 32-bit floating point numbers represent exactly 2^(-64) and 2^64 (raw bits 0x1f800000 and 0x5f800000) which seem more likely the correct values although testing with hardware was not done to this precision. Testing indicates that the same RCC instruction magic constants are also applied to both fixed function and programmable vertex shader w-coordinate output. This bounding replaces the special test for w==0.0 and abs(w)==inf which used to set vtx_inv_w=1.0 (which did not match Xbox hardware behaviour.)
Does this PR superseeds #1895 ? |
Point 2: depth clipping and clamping These tests draw to two poles at fixed locations and a checkboard floor. Depth clipping and clamping is applied to the floor. Legend:
As mentioned in the description, Xemu tests were done with Linux, Mesa and Vulkan. Mesa with OpenGL has the strange feature that glPolygonOffset applies twice the expected offset, but that is allowed since glPolygonOffset is implementation dependent (as mentioned also in Xemu source code.) Vulkan Mesa behaves well, so the poles in the following tests are in the correct place even with z-buffering. |
Point 2: depth clipping and clamping and why geometric depth clipping doesn't work well The following tests first draw a checkboard floor and then draw another red floor with depth buffer equals test (i.e. NV097_SET_DEPTH_FUNC_V_EQUAL). It applies slight offset to z-coordinates, and also has tests for both z-buffering and w-buffering. This simulates what happens in Halo 2 and some shadows. I'll also post one case with OpenGL since the checkerboard peeking from under the red looks more interesting. |
Point 3: vertex shader w output coordinate bounding away from zero and infinity These tests are from: coldhex/nxdk_pgraph_tests@237fa30 Again, tests are for both fixed function and programmable vertex shaders and draws either quad or a square using two triangles. The top left vertex w-coordinate value and bottom right vertex coordinate value are printed. This demonstrates that Xbox clamps the output w-coordinates from both fixed function and programmable vertex shaders to the same range as is used by the RCC instruction, i.e. away from zero and away from infinity. The programmable vertex shader in these tests do not use that instruction (and just use the passthrough vertex shader from nxdk_pgraph_tests.) Note how Xemu 0.8.20 continues where Xbox and PR stop due to w-clamping. quad: next quad case: The following quad case shows how this check in Xemu programmable vertex shader does not match hardware: xemu/hw/xbox/nv2a/pgraph/glsl/vsh-prog.c Lines 832 to 838 in 3bdb9e7
The top-right and bottom-left vertices in the quad have w=1.0 and then 0.8.20 also effectively sets w=1.0 to top-left and bottom-right vertices. This doesn't have anything do with quads vs. tris, it's just that I put w=1.0 to top-right and bottom-left for quads in my test case. Similar problem could be observed with tris as well. The rest of the programmable quad cases look the same, so I'll just post one: |
Tested: this also seems to fix some flickering skybox issues in other games, good work! |
Point 4: programmable vertex shader output multiplication by w (i.e. undoing the perspective divide.) First I'll show the problem in Xemu 0.8.20 and Halo 2, with both Vulkan and OpenGL. This is not due to w-buffering and w-coordinate interpolation problems. halo2_xemu0820vulkan.mp4The same with 0.8.20 and OpenGL (loaded from snapshot.) halo2_xemu0820opengl.mp4I've reproduced this in these test cases: https://github.com/coldhex/nxdk_pgraph_tests/blob/237fa303092fc210b45def8529805ac5355849e3/src/tests/clipping_precision_tests.cpp There are mainly two types of test cases, the first is relevant to games, and the second is somewhat contrived. The problem is due to large vertex coordinates and large texture coordinates and how modern GPUs apparently don't apply all the numerical stability tricks that they could to avoid it (to the extent that they would perform as well as Xbox hardware.) The first type was produced based on coordinates similar to observed Halo 2 problems and therefore is relevant. Vertex coordinates for one vertex are large and its w-coordinate is small and varied slightly. The second case has almost the same w-coordinates for all vertices, but texture coordinates are large. Here's Xemu 0.8.20 with Vulkan: clipping_precision_xemu0820vulkan.mp4Here's Xbox hardware showing no trouble: clipping_precision_xboxhw.mp4This PR only aims to solve the first type by multiplying by w coordinate and letting PC GPU hardware do the perspective-correct interpolation of vertex attributes. Since I don't know what algorithms modern GPUs actually use it is hard to say why this works and how exactly it avoids numerical problems. The second type is not solved, because the w-coordinates in that one are about 1.0 anyway so multiplying by w doesn't really change anything. PR: clipping_precision_prvulkan.mp4PR (Vulkan) and Halo 2 (loaded from snapshot): halo2_prvulkan.mp4The clipping precision tests in my dev nxdk_pgraph_tests branch also contain test cases where the two triangles are rotated to see if any of this depends on orientation (it doesn't). It also contains test cases that cycle the vertex coordinates (i.e. if originally vertex submission order is 0->1->2, then the first cycled test case is 1->2->0.) It is interesting that the first type of twitching is solved by cycling the coordinates appropriately. However, it is difficult to say how to do this consistently in a way that would solve it for all GPUs (or even just mine) since the algorithms and sequence of calculations GPUs do for clipping and attribute interpolation are not publicly disclosed. One obvious fix would be to just use double-precision floating point numbers in interpolation, but that's not supported by GLSL (or GPUs I suppose.) It is possible to use double-precision floating point by interpolating manually. I have a branch experimenting with vertex attribute interpolation with e.g. barycentric coordinates without w-multiplication, but it doesn't beat the simplicity (or performance) of just simply multiplying by w: https://github.com/coldhex/xemu/tree/experimental/interpolation |
I haven't looked at the actual commit details, but this is super impressive work. It'd be great to upstream your test branch - I've recently been throwing together some automation so we can better track xemu regressions and you've added significant coverage. I'd be happy to help update your tests to accommodate some of the infrastructure changes I've made since you forked (hopefully not much work, it might be just the registration in main.cpp that's relevant). |
That's for maintainers to decide, but I prefer my code. This is perhaps more like a code review and I have not tested any of that code, but about #1895:
In addition to the near/far plane culling feature, that PR also contains an implementation of the perspective-uncorrected vertex attribute interpolation feature of Xbox hardware. Both are independent hardware features and I think each should be in separate PRs. (My PR also contains more things than I would like to, but they are interconnected. For example, keeping z-buffering clipping like it is in 0.8.20 would require adding branching in gl/draw.c and vk/draw.c to enable depth clamping only when w-buffering, but then later on that branching would have to be removed anyway if the clipping precision issues demonstrated in comment #1937 (comment) were to be fixed.) In particular, the perspective-corrected interpolation flag (bit 20 of NV097_SET_CONTROL0) is something I discovered when experimenting with hardware using nxdk_pgraph_tests and my textures were getting stretched by Xbox HW as if no perspective correction was applied. I eventually found the curious bit 20 from Xemu trace events. I don't know if it was known about before and it doesn't really matter. (The separate bundle bit location I placed at the same location I found online in the awesome "nVidia hardware documentation" by Marcelina Kościelnicka, although technically Xemu could decide to store it anywhere.) Anyway, the perspective correction bit 20 is always enabled by Direct3D and implementing the unset bit is irrelevant to game compatibility. I did my implementation originally in the halo2fixes branch coldhex@c513d9b, but I consider the feature esoteric. The perspective correction bit 20 is relevant only in nxdk_pgraph_tests. The ff_w_zero_* tests in Golden test results: https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-W_param. The test doesn't set the bit. Xemu master/0.8.20 does two wrong things to make that test come out right. The stretching you see there is actually due to the usual texture stretching that happens when perspective correction is not applied. It's not because the coordinate w is zero. The two wrong things in 0.8.20 are: 1. the perspective-uncorrected feature is not implemented, 2. vertex shader has a special case check for w==0.0, which then sets vtx_inv_w=1.0 which effectively disables perspective correction. If the test was modified such that the perspective correction bit 20 was set, then hardware result is quite different but 0.8.20 result remains the same. Also, if, instead, the test was modified such that w was somewhat greater than zero, then hardware result would look pretty much the same as before, but 0.8.20 result would look like what hardware does when bit 20 is set. |
Thanks! Sure, I'll try and see if I can bring my fork up-to-date and then submit some PRs. |
Hi, so nice to see you uploaded this PR! Just to clarify, #1895 focuses on handling the ZMinMaxControl tests, which this one does not fully, so I would say they are different although they show some overlaps. I'll cut my PR down and adapt it to this one once it's merged. Also one request, I used a texture perspective variable you implemented in your dev branch to solve some texture warping in mine. Are you planning on uploading it in a PR at some point? Thank you |
Arguably we should just drop/disable the tests if bit 20 is always set by real titles. Scanning through my many pgraph dumps I've only seen it unset in nxdk-based programs. I don't think it's worth carrying extra maintenance complexity just to emulate behavior that was not used in practice. @coldhex it would be good to upstream the bit 20 setting to the nxdk nv_regs, perhaps with a comment indicating that it's always set by DirectX? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful! Thanks for the tests and detailed description. Will you please also confirm that other tests in the pgraph suite are not negatively impacted by the changes? Left a couple of initial review comments.
2c: Very little skin in the game but xemu aims to be a hardware emulator, thus it should support this. |
If tPosition is a zero-vector, then invViewport matrix had no effect. Bounding w-coordinate away from zero and infinity must be done before applying invViewport (which is needed for OpenGL/Vulkan) to emulate Xbox hardware behaviour properly.
Yes, I'll do that. |
No current plans, so go ahead! |
I have run all the nxdk_pgraph_tests and there weren't any negative changes apart from the unset perspective correction bit 20 discussed earlier. In addition to W param tests, it also affects some of the Front face tests (https://github.com/abaire/nxdk_pgraph_tests_golden_results/wiki/Results-Front_face) which set one vertex w=Infinity. These are fixed by bit 20 implementation. (Btw, some of the tests give assert failures (also in Xemu 0.8.20), and some are unstable in Xemu in the sense rerunning them gives different output. Especially Surface_pitch/Swizzle give a different output every time I press A-key.) I just pushed a new commit to fix a case in fixed function vertex shader. If the input tPosition is an all-zero vector, then CtrlFixed_ZCLAMP test (top-left is HW, top-right is new commit, bottom-left is Xemu 0.8.20, bottom-right is PR without new commit): (Bottom-right also has the other quad reaching the center of the screen, but it is backface-culled because center position also reverses winding.) |
Those failures are expected. A few tests are non-deterministic on HW as well, though not |
Fixed #1108 Lighting look like that on hardware even though PS2 and gamecube look more correct don't know what the dev did? |
Fixes #1090 |
Fixes Most of #1016 and #1273 Breakdown Juiced V-Rally Here screenshot I took before look the same so did bother taking more screenshots but retest to make sure the issues are fixed #1826 (comment) Untested games Which should be fixed. |
Improves #1143 still doesn't match hardware Console: |
FYI, I threw together a repo to generate diffs for PRs vs previous xemu versions. Results for this PR are here: https://github.com/abaire/xemu-dev_pgraph_test_results/wiki/pr_nv2a-wbuffering-fix (alpha func diffs should be ignored, the diff ran against 0.8.25 which has an additional fix) |
Xbox rounds -0.0 to the negative range and 0.0 to the positive range. This commit also restores RCC instruction clamping to be done on the output of reciprocal calculation (which current Xemu release does) with fix for the input=Infinity case.
Following @mborgerson initial review comments, I fixed the -0.0 case when clamping away from zero and infinity. I restored RCC instruction clamping to how it is in Xemu release, except for the more precise bounding constants, and the new fix for the case when input is Infinity. I've added and updated W param tests here: coldhex/nxdk_pgraph_tests@89e2b67 In the following images, top-left is Xbox hardware, top-right is PR with new commit, bottom-left is Xemu 0.8.20 and bottom-right is PR without new commit. New commit fixes RCC instruction when input=Infinity. This test uses a vertex program that divides x and y by the z shown in the image. The same clamping behaviour regarding 0.0 and -0.0 must be used for vertex shader W-coordinate output: I also found some more precision issues regarding large coordinates (in addition to the texture twitching in previous comments.) Here is a progression of top-left coordinate towards -0.0 using Radeon and Vulkan: This sequence converges and eventually starts looking like this: This is where my Radeon's (with Mesa and Vulkan) numerical precision starts affecting the result (before reaching the clamping range limits): The point of breaking above depends on hardware. My Intel integrated (UHD 770) breaks sooner. I tried these tests with my experimental/interpolation branch (with the -0.0 fix) with custom clipping enabled and it seems to fix the problem. So the breaking here is due to limited accuracy of 32-bit floats (and perhaps less than optimal numerical stability of algorithms used by my GPUs). The test above is certainly contrived in the same sense that the second texture twitching case with large texture coordinates is. Problems resulting from huge coordinate differences is something that developers generally try tp avoid because of exactly these kinds of precision issues. However, I just mention this because Xbox hardware doesn't seem to have such precision issues here (and possibly the magic bounding constants 2^(-64) and 2^64 were chosen by Nvidia so that their algorithms work.) |
hw/xbox/nv2a/pgraph/glsl/vsh.c
Outdated
"\n" | ||
"float clampAwayZeroInf(float t) {\n" | ||
" if (t > 0.0 || floatBitsToUint(t) == 0) {\n" | ||
" t = clamp(t, 5.421011e-20, 1.8446744e19);\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use uintBitsToFloat
to supply the specific encoding of these values (0x1F800000, 0x5F800000) to avoid a chance the glsl compiler picks a different representation. Since we are trying to encode these exact values anyway, it makes sense. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense. When using uintBitsToFloat it compiles to the exact same code on my system, but source code is cleaner that way.
Thanks @abaire! I really like having all of these tests results, but the volume of results is a bit difficult to navigate and review in this format. Is there a table of sorts where I can see which tests were impacted by the change and how it compares to xemu master and xbox hardware? |
Thanks for the additional tests and PR improvements. I'm having a hard time seeing a difference between the TR and BR quadrants here, so I'm not exactly sure what the result is trying to demonstrate in PR improvement in the first image. |
// " gl_ClipDistance[0] = gl_in[index].gl_ClipDistance[0];\n" | ||
// " gl_ClipDistance[1] = gl_in[index].gl_ClipDistance[1];\n" | ||
" vtx_inv_w = v_vtx_inv_w[index];\n" | ||
" vtx_inv_w_flat = v_vtx_inv_w[provoking_index];\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the results comparison, it seems it's no worse than it already was. There are still differences against HW, but it looks like they haven't changed since 0.7.98 which is the oldest version I happen to have generated goldens for. I double checked the raw results manually to rule out a bug in the regression comparison script.
Maybe @Triticum0 could quickly check Blinx shadows to double check that there's no regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed vtx_inv_w_flat because, in case of flat shading, it together with D0, D1, B0, B1 are all non-interpolated vertex attributes and multiplying in vertex shader and then dividing in fragment shader by the exact same value is superfluous. So, this is just simplification of #1180 and it makes sense to remove vtx_inv_w_flat together at the same time when vtx_inv_w is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test blinks and couldn't see any regression with textures being missing
Unfortunately it's just the TOC in the right panel. I'm planning to generate Pages output like https://abaire.github.io/xemu-nxdk_pgraph_tests_results/compare/xemu-0.8.25-master-046d127f31bad089decc2e213dc8488c43aa6f04/Linux_x86_64/gl_NVIDIA_Corporation_NVIDIA_GeForce_GTX_1070-PCIe-SSE2%3Agslv_4.00_NVIDIA_via_Cg_compiler/index.html it was just faster to throw it in the wiki as a proof of concept. |
This fixes the shadow flickering and skybox flickering in The Incredible Hulk: Ultimate Destruction, pushing the game to near perfect status, all that's left is to fix the 0.5 issue with the window scaling (ie the lines in the left and top sides) |
TR and BR are the same since Infinity as input to RCC wasn't a problem in the PR. That image demonstrated that the new commit still fixes Xemu 0.8.20 RCC instruction. The case that is fixed by the new commit is -0.0 as input to RCC. I didn't include the test result in that comment, but here it is: Bottom-right quadrant is the initial commit behaviour, which didn't take into account the minus in -0.0 so the top-left vertex flips to negative. |
Great, thanks! |
There are/were existing PRs on this same subject, but I wish to present my solution here as well.
z_perspective is true implies w-buffering and then the w-coordinate stored in the depth buffer should also be interpolated in a perspective-correct way. We do this by calculating w and setting gl_FragDepth in the fragment shader.
Since enabling polygon offset and setting values using glPolygonOffset won't have any effect when manually setting gl_FragDepth for w-buffering, we introduce the depthOffset variable to obtain similar behaviour (but the glPolygonOffset factor-argument is currently not emulated.) (Note that glPolygonOffset is OpenGL implementation-dependent and it might be good to use depthOffset for z-buffering as well, but this is not done here and we still use OpenGL/Vulkan zbias functionality.)
This also implements depth clipping and clamping in the fragment shader. If triangles are clipped, the shadows of the small rocks in e.g. Halo 2 Beaver Creek map can have flickering horizontal lines. The shadows are drawn on the ground in another pass with the same models as for the ground, but for some reason with depth clamping enabled. The flickering happens if Xemu clips the ground triangles, but the exact same shadow triangles are depth clamped, so there are small differences in the coordinates. The shadows are drawn with depth function GL_EQUAL so there is no tolerance for any differences. Clipping in the fragment shader solves the problem because the ground and shadow triangles remain exactly the same regardless of depth clipping/clamping. For some performance gain, it might be a good idea to cull triangles by depth in the geometry shader, but this is not implemented here.
In the programmable vertex shader we always multiply position output by w because this improves numerical stability in subsequent floating point computations by modern GPUs. This usually means that the perspective divide done by the vertex program gets undone.
The magic bounding constants 5.42101e-020 and 1.884467e+019 are replaced by 5.421011e-20 and 1.8446744e19, i.e. more decimals added. This makes the 32-bit floating point numbers represent exactly 2^(-64) and 2^64 (raw bits 0x1f800000 and 0x5f800000) which seem more likely the correct values although testing with hardware was not done to this precision.
Testing indicates that the same RCC instruction magic constants are also applied to both fixed function and programmable vertex shader w-coordinate output. This bounding replaces the special test for w==0.0 and abs(w)==inf which used to set vtx_inv_w=1.0 (which did not match Xbox hardware behaviour.)
Fixes #55, fixes #1261
The main thing about this PR is perspective-correct interpolation for w-coordinates when w-buffering. Currently Xemu (0.8.20) interpolates w-coordinates linearly in screen space resulting in rendering artifacts. However, fixing w-values also requires fixing other things such as depth clipping (since it is done based on w-values) and I don't like to leave things too broken even if one big thing gets fixed, so I'm including related fixes here as well.
In summary, the changes are:
I want to justify each of these changes so I'll next post test results from my nxdk_pgraph_tests dev branch (https://github.com/coldhex/nxdk_pgraph_tests/tree/dev) comparing Xbox hardware, Xemu 0.8.20 and this PR. Obviously, most of the test cases started off by copying one of @abaire's existing tests (Thanks!). To keep this description from getting too big, I'll include only results for point 1 here and then post rest in subsequent comments (and continue tomorrow if I don't have time them to post them all today.)
In tests my PC hardware is AMD Radeon RX 6600, Linux, Mesa and Vulkan unless otherwise stated. Xbox HW is v1.4.
Point 1:
W-buffering tests (coldhex/nxdk_pgraph_tests@8551623). These draw a checkerboard floor and print w-buffer values at the indicated pixels. Both fixed function and programmable vertex shader, and depth offset or not (which corresponds to glPolygonOffset() second parameter, i.e. z-bias value).
Fixed function, no depth offset:

Fixed function, with depth offset:

Programmable, no depth offset:

Programmable, with depth offset:

Note the big disagreement between HW and 0.8.20, especially with fixed function where the Xemu depth buffer values are typical for z-buffering (because 0.8.20 doesn't implement w-buffering at all in fixed function vertex shader.) The slight disagreement between HW and PR might be worrying, but this is actually due to Xbox screen coordinate rounding to 4-bit fixed point precision (for rasterization?), which is currently unimplemented in Xemu. There are existing PRs about it and I have my take on it in my dev branch (for 1x scaling.) Anyway, with the vertex rounding commit, depth buffer agreement is quite good: only four horizontal lines in the whole frame have a 1-off error. Here are the depth buffers:
Xbox HW (programmable, no depth offset):

PR (programmable, no depth offset):

Xbox vertex rounding is a subject for some other PR. The above test result from my dev branch is here just to point out that the slight error in depth buffer values in this PR is not an inherent flaw.