Skip to content

Adds handling for malformed metrics #337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

baweaver
Copy link

@baweaver baweaver commented Apr 8, 2025

In the Ruby GraphQL gem we have observed some interesting behavior where collectors appear not to be registered, which causes the code from that gem to error:

@trace.prometheus_client.send_json(
  # [!!] No name here
  type: @trace.prometheus_collector_type,
  duration: duration,
  platform_key: event_name,
  key: keyword
)

Source: https://github.com/rmosolgo/graphql-ruby/blob/ddf2550a204be69ba681739b529324a074d72c91/lib/graphql/tracing/prometheus_trace.rb#L68

On one hand this appears to be a malformed request body as-is, and the tests do not appear to test the integration in validating this behavior.

Given that, that brings up a potential edge case which may warrant some discussion: What should PrometheusExporter do in a case where it gets a malformed metric that does not have a name? Should it give an error? Right now what it returns is this:

#nomethoderror: undefined method `observe' for nil:NilClass - /opt/ruby3.0/lib/ruby/gems/3.0.0/gems/prometheus_exporter-0.8.1/lib/prometheus_exporter/server/collector.rb:55:in `block in process_hash'

...which does not give clear traceability into the cause and potential resolutions.

The opinion of this PR, at least, is to allow it to gracefully fail when a metric cannot be registered but I am not convinced that this is the correct behavior in this scenario and would defer to the folks working on the project as to what makes the most sense here.

My suspicion with the upstream issue is that it's some form of race condition in loading collectors.

Documented some things on the GraphQL gem:

rmosolgo/graphql-ruby#5323

Still suspecting some type of loading or race condition in here.

In the Ruby GraphQL gem we have observed some interesting behavior where collectors appear not to be registered, which causes the code from that gem to error:

```ruby

@trace.prometheus_client.send_json(
  # [!!] No name here
  type: @trace.prometheus_collector_type,
  duration: duration,
  platform_key: event_name,
  key: keyword
)
```

On one hand this appears to be a malformed request body as-is, and the tests do not appear to test the integration in validating this behavior.

Given that, that brings up a potential edge case which may warrant some discussion: What should PrometheusExporter do in a case where it gets a malformed metric that does not have a name? Should it give
an error? Right now what it returns is this:

```ruby
```

...which does not give clear tracability into the cause and potential resolutions.

The opinion of this PR, at least, is to allow it to gracefully fail when a metric cannot be registered but I am not convinced that this is the correct
behavior in this scenario and would defer to the folks working on the project as to what makes the most sense here.
@baweaver
Copy link
Author

baweaver commented Apr 9, 2025

One other potential strategy that would additionally remove the ObjectSpace invocations would be to use self.inherited for a registry instead, but that'd have some consequences:

  • server.rb would be more order-dependent
  • Given the usage of mutex it may not be great to CollectorChild.new and cache that in an effective singleton

...but it'd also make it easier for any downstream consumers making custom collectors.

If that is of interest I can get a PR up on that later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant