bug: GetRegularWorkspaceCreateMetrics creates too much DB load

As of November 6 this query is our 2nd most expensive query over the last 7d in terms of load over time.

It's not called too frequently (30k calls in a week) or returning a large data set (avg 92 rows), but it's avg. latency is > 160ms and the p99 is more like 740ms. The main source of the cost here is sequential scans of `provisioner_jobs` and `workspace_builds` in order to find the first successful build per workspace. 

<img width="439" height="572" alt="Image" src="https://github.com/user-attachments/assets/28c0adfe-62ea-4b77-82f2-822b91dcd817" />

Currently for `workspace builds` we have an explicit index on only the initiator ID and the actual build ID (PK), then implicit indices as a result of UNIQUE constraints for  `(workspace_id, name)`, `(workspace_id, build_number)`, and `(job_id)`. For `provisioner_jobs` we have explicit indices on `started_at`, `job_status`, and `worker_id, organization_id, completed_at)`.

We do not have indices on either `workspace_builds.transition` nor `provisioner_jobs.job_status` and thus have to do sequential scans of both tables in order to fetch the relevant data. For ourselves internally this is scans of ~80k rows (33k returned) and ~92k rows (70k returned) respectively. The join and sort after this are also _relatively_ expensive.

The join (from the plan) on `pj.id = wb.job_id` also results in a relatively high # of matches (~30k, so roughly 90% of workspace_build start builds succeeded).

I would propose adding indices similar to these:

1. btree index for `workspace_builds` on `(workspace_id, build_number, id)`. This index will already be ordered so should be able to eliminate the need for the sort after the join. AI is also telling me that if we include the other fields from the query in the index, those being `template_version_preset_id, initiator_id, and job_id`, we can speed the query up/reduce the cost further by eliminating the need for heap fetches from the `workspace_builds` table itself, but I'm not sure of this optimization nor whether we want the extra index size that would come from this.
2. an index for `provisioner_jobs` either on `(job_status, id)` OR on `(id) WHERE job_status = 'succeeded'`. The latter would help only this query, a more general index on `(job_status, id)` would help potentially multiple queries such as `GetWorkspacesEligibleForTransition`, `GetFailedWorkspaceBuildsByTemplateID`, `CountPendingNonActivePrebuilds`, and more.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bug: GetRegularWorkspaceCreateMetrics creates too much DB load #1115

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

bug: GetRegularWorkspaceCreateMetrics creates too much DB load #1115

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions