Skip to content

Conversation

@camd
Copy link
Collaborator

@camd camd commented Oct 13, 2025

This PR simplifies the query Push health uses to find failures.

@camd camd self-assigned this Oct 13, 2025
@camd camd force-pushed the camd/push-health-query-fix branch from 6e0bcca to 4ed8891 Compare October 13, 2025 21:05
@camd camd force-pushed the camd/push-health-query-fix branch from 4ed8891 to 0617215 Compare October 13, 2025 21:05
@camd camd force-pushed the camd/push-health-query-fix branch 3 times, most recently from c7d8089 to 822d896 Compare October 13, 2025 21:56
push=push,
job_type__name__in=failed_job_types,
result__in=["success", "unknown"],
).select_related("job_type", "machine_platform", "taskcluster_metadata")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this all_jobs- we could have failing jobs for a different reason (i.e. infra, or other test) and it would be good to know that this test failed 2/6 times, even if the job failed 5/6 times. log parser does this automatically by detecting intermittents (about 70% of them) and marks them as failure_classification_id=8.

I would recommend a single query to get all the jobs and related fields, this would allow you to query within the data for failure classification and sort by job_type.name. Most pushes (especially on try) will have <500 jobs, so this seems preferable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be resolved in the next patch to further implement changes.

# Get commit history for Hg repos
commit_history_details = None
if repository.dvcs_type == "hg":
commit_history_details = get_commit_history(repository, revision, push)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need commit history? that is the old method IIRC>

) and push_result != "fail":
push_result = metric_result
elif metric_result == "fail":
push_result = metric_result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the concept of indeterminate or unknown is resolved by failure_classification_id, so we can simplify this code, basically the map is:
failure_classification_id=1 (pass/fail/unknown)
failure_classification_id=4 (classified - currently only by humans)
failure_classification_id=6 (new failure- finds all regressions on autoland/central; uses 21 day cache)
failure_classification_id=8 (intermittent needs bugid- log parsing determined that future data existed such that this didn't reproduce and needs to be classified with a bug. if more future data comes in and it is almost perma fail, this gets reset to id=1||6.

knowing that, we have determinations made and don't need logic to summarize this. Basically we have:

  • new failure - highest priority to look at
  • intermittent needs bug - ignore for now, small edge cases could result in real issues
  • not classified (id=1) - everything else- ideally not much is left here that is a failure.

"repo": repository.name,
"needInvestigation": len(push_health_test_failures["needInvestigation"]),
"author": push.author,
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we are keeping newrelic around, I think sentry we are getting rid of, not sure about newrelic. @Archaeopteryx do you know about newrelic?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops have suggested a planned to migrate to OpenTelemetry. I have not heard of changes to sentry use.

devServer: {
host: '0.0.0.0',
port: 5000,
port: 5001,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this keeps sneaking in

@camd camd force-pushed the camd/push-health-query-fix branch from 822d896 to 50e5b7b Compare October 14, 2025 03:58
@camd camd force-pushed the camd/push-health-query-fix branch from 50e5b7b to 68f3e89 Compare October 18, 2025 19:29
@jmaher
Copy link
Collaborator

jmaher commented Dec 1, 2025

@camd should we condition to work on this, or assume what has landed is the path forward and assume any future changes will occur naturally unrelated to this specific PR?

@camd camd marked this pull request as draft December 6, 2025 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants