Skip to content

Conversation

@MattPonton
Copy link

@MattPonton MattPonton commented Jul 19, 2025

This handles bitmaps better when not drawing. This will improve performance in some known areas such as Dead or Alive 2 Ultimate's Burai Zenin stage and other areas with rain effects such as Miyama stage. In my personal testing it's increased performance in the affected areas by 50%, making each of the affected stages able to run at 60 FPS even at multiple internal render options. In the Video Debug this is most noticeable in the FINISH_VERTEX_BUFFER_DIRTY value.

Before:
image

After:
image

@MattPonton MattPonton changed the title Update vertex ram buffer to handle bitmap calls when not drawing. nv2a/vk: Update vertex ram buffer to handle bitmap calls when not drawing. Jul 19, 2025
// Vertex data changed while building the draw list. Finish drawing
// before updating RAM buffer.
pgraph_vk_finish(pg, VK_FINISH_REASON_VERTEX_BUFFER_DIRTY);
} else if (overlaps_uploaded && !r->in_draw) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what the call stack looks like for cases where the buffer is being flushed outside of a draw?

I don't know the VK code particularly well, but I worry that this is being done to allow GPU memory to be synced back to guest-CPU accessible memory. If so, it could mean that skipping the sync works fine in many cases but fails in cases where a title manipulates the framebuffer via CPU during drawing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know in my testing it seemed that there was little to no improvement in the fix should the else branch not do anything. The clearing of the bitmap though was more a safety net for me just in case. However, you bring up a good point and I can try checking if memory is any better/worse with the else branch. Having just the if branch with the in_draw qualifier could work on its own otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is not necessarily the bitmap clear, it's whether it's safe to skip the finish at all.

If the finish is being done outside of the context of a draw in order to get CPU memory to be consistent with the GPU state it wouldn't be safe to skip the finish. You'd need to look at the call stack when in_draw is false to know why it's being called outside of a draw and then determine whether it's safe to skip or not. The original comment implies that it may be flushing the VK draw in order to read back into guest-CPU-accessible memory, which could make this a "do the wrong thing faster" regression.

For posterity/anybody following along: the Xbox CPU and GPU have unified memory access so it's able to interleave GPU draws with CPU manipulation/readback of the framebuffer. xemu needs to do extra work to sync CPU-modified surfaces to the host GPU and vice-versa in order to emulate this, so it sometimes does flushing that would not be typical for a modern app.

Copy link
Author

@MattPonton MattPonton Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing, which is sadly limited as I can't test every XBOX game instance, I coded in an assert in the else to trigger and have yet to ever get it to throw. Not saying it means it will never happen, but just has never happened for me in the games I've tested. I could potentially just be very lucky in the race condition as well, if said race is actually happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm confused. If the branch that you added is never hit, what does this PR do? The original change would have either performed the original flush or gone down your new else path, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it so it only flushes the vertex if the renderer is in_draw. The original if else of my PR just cleared the bitmap as a safety check for when we weren't in_draw, but as I said I never could find a time where this path was ever triggered. So, per your comments I removed the if else and the only change is to only process the buffer when in_draw. The current master version of XEMU without this check causes performance drops due to calling the flush whether we are in_draw or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but if the code never hits the non in_draw path, then this change can't have any effect, right?

I think maybe when you added your assert you had some additional guarding it that never got hit. As I mentioned, my first concern is whether it's safe to skip the flush in all conditions, the bitmap clear is a secondary concern (though that also seems potentially unsafe if partial overlaps are possible).

The path I'd like you to check is the case where pgraph_vk_update_vertex_ram_buffer is called with a bit set but without in_draw. So instead of clearing the bitmap in your original "else" case, set a breakpoint or otherwise print the call stack there so you can try to verify that the flush really is safe to ignore.

Since this PR has an obvious effect on a game for you, either the else branch will be taken in that game or you can probably conclude that the change itself isn't doing anything and there's something different about how you build it versus the automated builds that produces the performance improvement. My guess is that the else branch will be taken and the assert you added previously had a logic bug that prevented it from triggering.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a few moments after making the comment I realized what it was you were suggesting/asking. I retested, correcting the logic of my assert, and was immediately met with instances where find_next_bit() returned true while not in a draw state (r->in_draw).

pgraph_vk_finish() as I understand it is called to insure that any in-flight draw operations using the old vertex buffer data are completed before the buffer is updated. If r->in_draw is false, no draw is in progress so there are no in-flight operations that could be using staled data. If we call pgraph_vk_finish() when r->in_draw is false it means that no draw commands are currently referencing the buffer, and updating the buffer is safe as the GPU is not using the data at the moment. The find_next_bit() could potentially be set due to previous operations, but unless a draw is in progress I believe there's no risk of a race or data hazard.

@MattPonton MattPonton changed the title nv2a/vk: Update vertex ram buffer to handle bitmap calls when not drawing. nv2a/vk: Update vertex ram buffer to finish only when drawing. Jul 24, 2025

if (find_next_bit(r->uploaded_bitmap, start_bit + nbits, start_bit) <
end_bit) {
if (find_next_bit(r->uploaded_bitmap, start_bit + nbits, start_bit) < end_bit && r->in_draw) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this function is called, r->in_draw should be false, which means this path will never be triggered

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case then where we even would need this if check and its proceeding functionality? I ask since I haven't found a situation where this path never being triggered has caused greater issues than when it has been triggered. But I also haven't tested the entire xbox library with this change yet. I admit it's possible that my change here could just be exposing a symptom to a deeper problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where we even would need this if check and its proceeding functionality

Yes, as the comment below indicates, if vertex data are changed while the draw command buffer is being built (but before it is executed), then we need to finish any pending drawing. There are ways to make this faster, but this simple logic here prioritizes correctness. There's also a chance there's some pathological case, or we are being too conservative--since you've found an instance where this path is triggered often you can find what data are changing (via the bitmap).

@mborgerson
Copy link
Member

Thanks for the PR! Because the proposed change impacts correctness (per comments above), I'll close this for now. I'd be glad to see more work exploring performance improvements here.

@mborgerson mborgerson closed this Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants