Adds handling for malformed metrics #337
Open
In the Ruby GraphQL gem we have observed some interesting behavior where collectors appear not to be registered, which causes the code from that gem to error:
Source: https://github.com/rmosolgo/graphql-ruby/blob/ddf2550a204be69ba681739b529324a074d72c91/lib/graphql/tracing/prometheus_trace.rb#L68
On one hand, the request body appears to be malformed as-is, and the tests do not appear to exercise the integration in a way that would validate this behavior.
That raises a potential edge case which may warrant some discussion: what should PrometheusExporter do when it receives a malformed metric that does not have a name? Should it raise an error? Right now it returns this:
#nomethoderror: undefined method `observe' for nil:NilClass - /opt/ruby3.0/lib/ruby/gems/3.0.0/gems/prometheus_exporter-0.8.1/lib/prometheus_exporter/server/collector.rb:55:in `block in process_hash'
...which does not give clear traceability into the cause and potential resolutions.
This PR takes the position that the exporter should fail gracefully when a metric cannot be registered. That said, I am not convinced this is the correct behavior in this scenario, and I would defer to the folks working on the project as to what makes the most sense here.
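To make the two options concrete, here is a rough, self-contained sketch of the trade-off. It is not the code in this PR or in the gem: `SketchCollector`, `UnregisteredMetricError`, the `strict:` flag, and the metric name `graphql_duration_seconds` are all hypothetical names chosen for illustration. It only assumes that a payload is a hash whose `"name"` key selects a previously registered metric.

```ruby
# Hypothetical, simplified stand-in for a collector; not the gem's actual code.
class UnregisteredMetricError < StandardError; end

class SketchCollector
  attr_reader :skipped

  def initialize(strict: false)
    @metrics = {}    # metric name => array of observed values
    @skipped = []    # payloads dropped in graceful mode
    @strict = strict
  end

  def register(name)
    @metrics[name] = []
  end

  def observations(name)
    @metrics.fetch(name)
  end

  # Returns true when the value was recorded, false when the payload was skipped.
  def process_hash(obj)
    name = obj["name"]
    metric = @metrics[name]

    if metric.nil?
      message = "no metric registered for name #{name.inspect} " \
                "(payload keys: #{obj.keys.inspect})"
      # Option 1: fail loudly, but with a message that points at the cause.
      raise UnregisteredMetricError, message if @strict

      # Option 2: fail gracefully, keeping a trace of what was dropped.
      @skipped << obj
      warn "SketchCollector: skipping payload, #{message}"
      return false
    end

    metric << obj["value"]
    true
  end
end

collector = SketchCollector.new
collector.register("graphql_duration_seconds")
collector.process_hash("name" => "graphql_duration_seconds", "value" => 0.25)
collector.process_hash("value" => 0.25) # no name: skipped with a warning
```

Either branch avoids the bare `undefined method` failure on `nil`. The strict variant surfaces the problem to the caller, while the graceful variant keeps the server processing other payloads at the cost of silently losing data unless someone reads the warnings.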
My suspicion with the upstream issue is that it's some form of race condition in loading collectors.
Documented some things on the GraphQL gem:
rmosolgo/graphql-ruby#5323
Still suspecting some type of loading or race condition in here.