Skip to content

Conversation

@Ansraer
Copy link
Contributor

@Ansraer Ansraer commented Mar 6, 2023

Similar to what is already in master this PR uses KHR_debug to add labels to the OpenGL instructions (useful in external profilers) and ARB_timer_query to add more information about rendering timings to Godot's built-in profiler.

In addition, EXT_disjoint_timer_query is used as a fallback for android phones that don't support ARB_timer_query. It should be noted that this fallback is not 100% perfect, from time to time we get disjoint timestamps, which result in nonsensical timings.
Right now I am removing everything that is more than 200ms long, which does a very good job at filtering out any wrong data.

Screenshots:

Here is a screenshot of what a renderdoc capture looks like with the labels added by this PR:

Renderdoc Screenshot

image

As you can see the Events in the Event Browser & Timeline are now sorted into nicely labeled groups which makes it significantly easier to figure out what is happening when and where.

The timestamps added by timer_query on the other hand are visible in Godot's built-in profiler:

Profiler Screenshot:

image

Performance:

Both these changes are only enabled when #ifdef DEBUG_ENABLED is true, so they should have no performance impact on optimized builds. In addition, the Timestamps are also only recorded when Godot's script debugger is currently profiling.

The debug labels should have almost 0 overhead. I tested multiple scenes and any potential performance changes were so small that they weren't really measurable on my radeon rx 6900xt.


This PR was sponsored by Ramatak with 💚

@Ansraer Ansraer requested review from a team as code owners March 6, 2023 18:52
@akien-mga akien-mga added this to the 3.6 milestone Mar 6, 2023
@Ansraer Ansraer force-pushed the 3.x-profiling branch 7 times, most recently from 623bdeb to 3353345 Compare March 9, 2023 11:57
@clayjohn
Copy link
Member

clayjohn commented Mar 9, 2023

Cc @lawnjelly

@lawnjelly
Copy link
Member

What's the PR / PRs in master that added these BTW? This should be linked in the description.

@clayjohn
Copy link
Member

clayjohn commented Mar 9, 2023

What's the PR / PRs in master that added these BTW? This should be linked in the description.

I think they were adding during reduz's initial implementation of the Vulkan renderer. I don't think the implementation needs to match 4.0. I think we can merge this as long as it works in 3.x and is useful for people who are developing on 3.x

@lawnjelly
Copy link
Member

lawnjelly commented Mar 9, 2023

How do we use it? Does it need to be compiled in the normal build? Are there any performance implications of running it all the time? Is it possible to put some of it into a nested class so it doesn't create such a mess?

Really there is very little information to go on in the description imo.

I have a rough idea of what is up to, and it could be useful, but really the PR should be more descriptive with screenshots etc and how to, to make it clear how it can be used, why it is useful etc. Also imo where we can, we should have a clear separation between production code and debug / profiling code, and have the minimum of them mixed up together, in order to make reading / maintaining the useful code easier instead of adding the cognitive burden of trying to work out what extra profiling stuff is doing.

@Ansraer
Copy link
Contributor Author

Ansraer commented Mar 10, 2023

@lawnjelly Added some screenshots to the description and a short paragraph about performance. In 4.x this was added by reduz with #44094.

@lawnjelly
Copy link
Member

Overall it does look useful, BGFX does a little GPU logging like this .. it's potentially mildly useful. Usually I've found it's pretty obvious what is what in RenderDoc etc, but if this is "free" why not. As long as we don't pay any cost (in terms of runtime performance / driver bugs), and it's not overly intrusive in the code imo it seems okay.

The profiling inside Godot may be more useful if it enables more users to pinpoint what is slowing things down graphics wise, few of them seem to use GPU debuggers. 👍

My comments above are mostly nitpicks, different programmer will make different suggestions etc, and myself depending what side of the bed I got up from etc lol. 😀 But some attention to naming and namespaces is always a good idea - the easier it is for a layman to read, the better imo.

enable_frame_timings();
}
#endif
#else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#else

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poke @Ansraer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On it. Will push an update in a day or two.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still pending :)

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation seems fine to me. The only thing that may be a bit odd is that it reports GPU timings in the debugger profiler which otherwise contains all CPU times. Since the CPU and GPU are running async, users may be surprised to find that their frame time is not the total of each of the sub-headings.

On another note, it would be amazing if you could forward port the Android code to 4.x as right now timestamps aren't supported when running on Android in 4.x

Overall, minus the one change I suggested. I am okay with merging this as long as @lawnjelly is also happy with it. I tested locally and it appears to work well.

static void (*del_queries_func)(int n, unsigned int *ids);
static void (*query_counter_func)(unsigned int id, unsigned int target);
static void (*get_query_func)(unsigned int id, unsigned int pname, uint64_t *params);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this (or some of it) could go in a nested class? Just to make it clear the separation of what is part of Rasterizer, and what is to do with the debug timings? (I can't say for definite it would work, sometimes there are restrictions on nested classes.) 🤔

@lawnjelly
Copy link
Member

Looks fine to me too, my only comment is about using a nested class if possible, but if not, no biggie. 👍

@akien-mga akien-mga marked this pull request as draft August 2, 2023 15:43
@lawnjelly lawnjelly modified the milestones: 3.6, 3.7 Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants