Skip to content

Malformed PrometheusExporter Metrics #5323

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
baweaver opened this issue Apr 8, 2025 · 4 comments
Open

Malformed PrometheusExporter Metrics #5323

baweaver opened this issue Apr 8, 2025 · 4 comments

Comments

@baweaver
Copy link

baweaver commented Apr 8, 2025

Describe the bug

PrometheusExporter raises an exception on what appears to be a malformed request from GraphQL Ruby:

# GraphQL: lib/graphql/tracing/prometheus_trace.rb

@client.send_json(
  type: @collector_type,
  duration: duration,
  platform_key: platform_key,
  key: key
)

...wherein the handler parses it as such, comments prefixed with [!!] are mine:

# PrometheusExporter: lib/prometheus_exporter/server/collector.rb

def process_hash(obj)
  @mutex.synchronize do
    # [!!] This is probably the intended codepath, but it's not being hit?
    if collector = @collectors[obj["type"]]
      collector.collect(obj)
    # [!!] Well if that's intended and it fails how does it get handled?
    else
      # [!!] This object has no name here
      metric = @metrics[obj["name"]]
      # [!!] So it falls back here
      metric = register_metric_unsafe(obj) if !metric

      # [!!] This is singular, meaning the singular key may be ignored here
      keys = obj["keys"] || {}
      keys = obj["custom_labels"].merge(keys) if obj["custom_labels"]

      # [!!] This is also `nil`, but even if not it'd still try and call a method on
      # metric which will be `nil`
      case obj["prometheus_exporter_action"]
      # ...
      else
        metric.observe(obj["value"], keys)
      end
    end
  end
end

def register_metric_unsafe(obj)
  # [!!!] This is still `nil`
  name = obj["name"]
  help = obj["help"]
  opts = symbolize_keys(obj["opts"] || {})

  # [!!] The type is probably incompatible with metric
  # types which will cause this to also fail
  metric =
    case obj["type"]
    when "gauge"
      PrometheusExporter::Metric::Gauge.new(name, help)
    when "counter"
      PrometheusExporter::Metric::Counter.new(name, help)
    when "summary"
      PrometheusExporter::Metric::Summary.new(name, help, opts)
    when "histogram"
      PrometheusExporter::Metric::Histogram.new(name, help, opts)
    end

  # [!!] But then this also fails, and does so silently.
  if metric
    @metrics[name] = metric
  else
    STDERR.puts "failed to register metric #{obj}"
  end
end

So the error exists between the two gems somehow. I've filed a PR against PrometheusExporter to try and handle that edge case:

discourse/prometheus_exporter#337

...but wonder if there's something that can be done here as well to mitigate that error.

Versions

graphql version: 2.3.2
rails (or other framework): 7.1.x
other applicable versions (graphql-batch, etc)

Steps to reproduce

Unfortunately this looks to only occur in CodeSpaces, which I do not use, which makes me suspect that this is some form of race condition on loading collectors somehow, but I am not able to validate that at this time.

I'm working on finding another instance of this error that we can delve further into, but most folks now are nuking their environment and rebuilding it to get things going again.

Expected behavior

Sending of a collector metric does not cause an exception.

Actual behavior

Exception is raised on a non-existent metric

Place full backtrace here (if a Ruby exception is involved):

Click to view exception backtrace
# [2025-04-08T12:50:35.700-07:00] ERROR:   
# NoMethodError (undefined method `observe' for nil:NilClass):
# prometheus_exporter (2.2.0) lib/prometheus_exporter/server/collector.rb:52:in `block in process_hash'
# prometheus_exporter (2.2.0) lib/prometheus_exporter/server/collector.rb:36:in `synchronize'
# prometheus_exporter (2.2.0) lib/prometheus_exporter/server/collector.rb:36:in `process_hash'
# prometheus_exporter (2.2.0) lib/prometheus_exporter/server/collector.rb:32:in `process'
# prometheus_exporter (2.2.0) lib/prometheus_exporter/client.rb:267:in `send'
# prometheus_exporter (2.2.0) lib/prometheus_exporter/client.rb:124:in `send_json'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:76:in `instrument_execution'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:27:in `parse'
# graphql (2.3.2) lib/graphql/language/parser.rb:43:in `parse'
# graphql (2.3.2) lib/graphql/language/parser.rb:16:in `parse'
# graphql (2.3.2) lib/graphql.rb:46:in `parse'
# graphql (2.3.2) lib/graphql/query.rb:398:in `prepare_ast'
# graphql (2.3.2) lib/graphql/query.rb:458:in `with_prepared_ast'
# graphql (2.3.2) lib/graphql/query.rb:316:in `validation_pipeline'
# graphql (2.3.2) lib/graphql/query.rb:324:in `valid?'
# graphql (2.3.2) lib/graphql/analysis/ast.rb:27:in `block (2 levels) in analyze_multiplex'
# graphql (2.3.2) lib/graphql/analysis/ast.rb:26:in `map'
# graphql (2.3.2) lib/graphql/analysis/ast.rb:26:in `block in analyze_multiplex'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace/tracer.rb:37:in `block in in_span'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace.rb:70:in `block in with_span'
# opentelemetry-api (1.4.0) lib/opentelemetry/context.rb:87:in `with_value'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace.rb:70:in `with_span'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace/tracer.rb:37:in `in_span'
# opentelemetry-instrumentation-graphql (0.26.2) lib/opentelemetry/instrumentation/graphql/tracers/graphql_trace.rb:53:in `analyze_multiplex'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:27:in `block in analyze_multiplex'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:74:in `instrument_execution'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:27:in `analyze_multiplex'
# graphql (2.3.2) lib/graphql/analysis/ast.rb:25:in `analyze_multiplex'
# graphql (2.3.2) lib/graphql/execution/interpreter.rb:46:in `block in run_all'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace/tracer.rb:37:in `block in in_span'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace.rb:70:in `block in with_span'
# opentelemetry-api (1.4.0) lib/opentelemetry/context.rb:87:in `with_value'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace.rb:70:in `with_span'
# opentelemetry-api (1.4.0) lib/opentelemetry/trace/tracer.rb:37:in `in_span'
# opentelemetry-instrumentation-graphql (0.26.2) lib/opentelemetry/instrumentation/graphql/tracers/graphql_trace.rb:25:in `execute_multiplex'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:27:in `block in execute_multiplex'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:74:in `instrument_execution'
# graphql (2.3.2) lib/graphql/tracing/prometheus_trace.rb:27:in `execute_multiplex'
# graphql-batch (0.6.0) lib/graphql/batch/setup_multiplex.rb:24:in `execute_multiplex'
# graphql (2.3.2) lib/graphql/execution/interpreter.rb:37:in `run_all'
# graphql (2.3.2) lib/graphql/schema.rb:1323:in `multiplex'
# graphql (2.3.2) lib/graphql/schema.rb:1299:in `execute'
# packs/one_graph/app/controllers/api/graphql_controller.rb:114:in `execute_query'
# packs/one_graph/app/controllers/api/graphql_controller.rb:77:in `execute'
# actionpack (7.1.3.4) lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'
# actionpack (7.1.3.4) lib/abstract_controller/base.rb:224:in `process_action'
# actionpack (7.1.3.4) lib/action_controller/metal/rendering.rb:165:in `process_action'
# actionpack (7.1.3.4) lib/abstract_controller/callbacks.rb:259:in `block in process_action'
# activesupport (7.1.3.4) lib/active_support/callbacks.rb:121:in `block in run_callbacks'
# sec-events-ruby (c24ed3dabcb4) lib/sec_events_ruby/sec_events_rails.rb:26:in `security_events_logging'
# activesupport (7.1.3.4) lib/active_support/callbacks.rb:130:in `block in run_callbacks'
# sentry-rails (5.15.0) lib/sentry/rails/controller_transaction.rb:28:in `block in sentry_around_action'
# sentry-ruby (5.15.0) lib/sentry/hub.rb:102:in `with_child_span'
# sentry-ruby (5.15.0) lib/sentry-ruby.rb:476:in `with_child_span'
# sentry-rails (5.15.0) lib/sentry/rails/controller_transaction.rb:14:in `sentry_around_action'
# activesupport (7.1.3.4) lib/active_support/callbacks.rb:130:in `block in run_callbacks'
# activesupport (7.1.3.4) lib/active_support/callbacks.rb:141:in `run_callbacks'
# actionpack (7.1.3.4) lib/abstract_controller/callbacks.rb:258:in `process_action'
# actionpack (7.1.3.4) lib/action_controller/metal/rescue.rb:25:in `process_action'
# actionpack (7.1.3.4) lib/action_controller/metal/instrumentation.rb:74:in `block in process_action'
# activesupport (7.1.3.4) lib/active_support/notifications.rb:206:in `block in instrument'
# activesupport (7.1.3.4) lib/active_support/notifications/instrumenter.rb:58:in `instrument'
# activesupport (7.1.3.4) lib/active_support/notifications.rb:206:in `instrument'
# actionpack (7.1.3.4) lib/action_controller/metal/instrumentation.rb:73:in `process_action'
# actionpack (7.1.3.4) lib/action_controller/metal/params_wrapper.rb:261:in `process_action'
# activerecord (7.1.3.4) lib/active_record/railties/controller_runtime.rb:32:in `process_action'
# actionpack (7.1.3.4) lib/abstract_controller/base.rb:160:in `process'
# actionview (7.1.3.4) lib/action_view/rendering.rb:40:in `process'
# actionpack (7.1.3.4) lib/action_controller/metal.rb:227:in `dispatch'
# opentelemetry-instrumentation-action_pack (0.7.1) lib/opentelemetry/instrumentation/action_pack/patches/action_controller/metal.rb:27:in `dispatch'
# actionpack (7.1.3.4) lib/action_controller/metal.rb:309:in `dispatch'
# actionpack (7.1.3.4) lib/action_dispatch/routing/route_set.rb:49:in `dispatch'
# actionpack (7.1.3.4) lib/action_dispatch/routing/route_set.rb:32:in `serve'
# actionpack (7.1.3.4) lib/action_dispatch/journey/router.rb:51:in `block in serve'
# actionpack (7.1.3.4) lib/action_dispatch/journey/router.rb:131:in `block in find_routes'
# actionpack (7.1.3.4) lib/action_dispatch/journey/router.rb:124:in `each'
# actionpack (7.1.3.4) lib/action_dispatch/journey/router.rb:124:in `find_routes'
# actionpack (7.1.3.4) lib/action_dispatch/journey/router.rb:32:in `serve'
# actionpack (7.1.3.4) lib/action_dispatch/routing/route_set.rb:882:in `call'
# omniauth (2.1.0) lib/omniauth/strategy.rb:202:in `call!'
# omniauth (2.1.0) lib/omniauth/strategy.rb:169:in `call'
# omniauth (2.1.0) lib/omniauth/builder.rb:44:in `call'
# vendor/middleware/authenticity_token_check.rb:31:in `call'
# apollo_upload_server (2.1.0) lib/apollo_upload_server/middleware.rb:19:in `call'
# routes_lazy_routes (0.4.3) lib/routes_lazy_routes/lazy_routes_middleware.rb:16:in `call'
# vendor/middleware/reset_log_fields.rb:8:in `call'
# warden (1.2.9) lib/warden/manager.rb:36:in `block in call'
# warden (1.2.9) lib/warden/manager.rb:34:in `catch'
# warden (1.2.9) lib/warden/manager.rb:34:in `call'
# activerecord (7.1.3.4) lib/active_record/middleware/database_selector.rb:67:in `block in call'
# activesupport (7.1.3.4) lib/active_support/notifications/instrumenter.rb:58:in `instrument'
# activerecord (7.1.3.4) lib/active_record/middleware/database_selector/resolver.rb:64:in `block in read_from_replica'
# activerecord (7.1.3.4) lib/active_record/connection_handling.rb:361:in `with_role_and_shard'
# activerecord (7.1.3.4) lib/active_record/connection_handling.rb:147:in `connected_to'
# activerecord (7.1.3.4) lib/active_record/middleware/database_selector/resolver.rb:63:in `read_from_replica'
# vendor/middleware/database_selector/feature_flag_db_resolver.rb:43:in `read'
# activerecord (7.1.3.4) lib/active_record/middleware/database_selector.rb:77:in `select_database'
# activerecord (7.1.3.4) lib/active_record/middleware/database_selector.rb:66:in `call'
# rack (2.2.10) lib/rack/tempfile_reaper.rb:15:in `call'
# rack (2.2.10) lib/rack/etag.rb:27:in `call'
# rack (2.2.10) lib/rack/conditional_get.rb:40:in `call'
# rack (2.2.10) lib/rack/head.rb:12:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/http/permissions_policy.rb:36:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/http/content_security_policy.rb:33:in `call'
# rack (2.2.10) lib/rack/session/abstract/id.rb:266:in `context'
# rack (2.2.10) lib/rack/session/abstract/id.rb:260:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/cookies.rb:689:in `call'
# activerecord (7.1.3.4) lib/active_record/migration.rb:655:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/callbacks.rb:29:in `block in call'
# activesupport (7.1.3.4) lib/active_support/callbacks.rb:101:in `run_callbacks'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/callbacks.rb:28:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/executor.rb:14:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/actionable_exceptions.rb:16:in `call'
# structured_logger (e1f78b0b5c66) lib/structured_logger/capture_exceptions_middleware.rb:11:in `call'
# sentry-rails (5.15.0) lib/sentry/rails/rescued_exception_interceptor.rb:12:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/debug_exceptions.rb:29:in `call'
# structured_logger (e1f78b0b5c66) lib/structured_logger/capture_exceptions_middleware.rb:11:in `call'
# sentry-ruby (5.15.0) lib/sentry/rack/capture_exceptions.rb:28:in `block (2 levels) in call'
# sentry-ruby (5.15.0) lib/sentry/hub.rb:251:in `with_session_tracking'
# sentry-ruby (5.15.0) lib/sentry-ruby.rb:387:in `with_session_tracking'
# sentry-ruby (5.15.0) lib/sentry/rack/capture_exceptions.rb:19:in `block in call'
# sentry-ruby (5.15.0) lib/sentry/hub.rb:59:in `with_scope'
# sentry-ruby (5.15.0) lib/sentry-ruby.rb:367:in `with_scope'
# sentry-ruby (5.15.0) lib/sentry/rack/capture_exceptions.rb:18:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/show_exceptions.rb:31:in `call'
# railties (7.1.3.4) lib/rails/rack/logger.rb:37:in `call_app'
# railties (7.1.3.4) lib/rails/rack/logger.rb:24:in `block in call'
# activesupport (7.1.3.4) lib/active_support/tagged_logging.rb:135:in `block in tagged'
# activesupport (7.1.3.4) lib/active_support/tagged_logging.rb:39:in `tagged'
# activesupport (7.1.3.4) lib/active_support/tagged_logging.rb:135:in `tagged'
# activesupport (7.1.3.4) lib/active_support/broadcast_logger.rb:240:in `method_missing'
# railties (7.1.3.4) lib/rails/rack/logger.rb:24:in `call'
# sprockets-rails (3.5.1) lib/sprockets/rails/quiet_assets.rb:17:in `call'
# structured_logger (e1f78b0b5c66) lib/structured_logger/log_middleware.rb:28:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/remote_ip.rb:92:in `call'
# request_store (1.7.0) lib/request_store/middleware.rb:19:in `call'
# rack-timeout (0.6.3) lib/rack/timeout/core.rb:148:in `block in call'
# rack-timeout (0.6.3) lib/rack/timeout/support/timeout.rb:19:in `timeout'
# rack-timeout (0.6.3) lib/rack/timeout/core.rb:147:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/request_id.rb:28:in `call'
# rack (2.2.10) lib/rack/method_override.rb:24:in `call'
# rack (2.2.10) lib/rack/runtime.rb:22:in `call'
# activesupport (7.1.3.4) lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/server_timing.rb:59:in `block in call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/server_timing.rb:24:in `collect_events'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/server_timing.rb:58:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/executor.rb:14:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/static.rb:25:in `call'
# rack (2.2.10) lib/rack/sendfile.rb:110:in `call'
# actionpack (7.1.3.4) lib/action_dispatch/middleware/host_authorization.rb:141:in `call'
# rack (2.2.10) lib/rack/deflater.rb:44:in `call'
# vendor/middleware/asset_direct_access_blocker.rb:25:in `call'
# secure_headers (6.5.0) lib/secure_headers/middleware.rb:11:in `call'
# rack-mini-profiler (3.3.1) lib/mini_profiler.rb:334:in `call'
# rack-cors (1.1.1) lib/rack/cors.rb:100:in `call'
# rack (2.2.10) lib/rack/events.rb:112:in `call'
# prometheus_exporter (2.2.0) lib/prometheus_exporter/middleware.rb:57:in `call'
# railties (7.1.3.4) lib/rails/engine.rb:536:in `call'
# puma (6.4.2) lib/puma/configuration.rb:272:in `call'
# puma (6.4.2) lib/puma/request.rb:100:in `block in handle_request'
# puma (6.4.2) lib/puma/thread_pool.rb:378:in `with_force_shutdown'
# puma (6.4.2) lib/puma/request.rb:99:in `handle_request'
# puma (6.4.2) lib/puma/server.rb:464:in `process_client'
# puma (6.4.2) lib/puma/server.rb:2

Additional context

Add any other context about the problem here.

With these details, we can efficiently hunt down the bug!

@baweaver
Copy link
Author

baweaver commented Apr 8, 2025

Ok, for reproduction, if I can somehow trigger the behavior, I would bet that it's something around the GraphQL collector here:

# lib/graphql/tracing/prometheus_trace/graphql_collector.rb

# Backwards-compat: ([!!] I don't think this is related, old version?)
PrometheusTracing::GraphQLCollector = PrometheusTrace::GraphQLCollector

# lib/graphql/tracing/prometheus_trace.rb

module GraphQL
  module Tracing
    # ...
    PrometheusTrace = MonitorTrace.create_module("prometheus")
    module PrometheusTrace
      # [!!] Is it possible that this doesn't get hit?
      if defined?(PrometheusExporter::Server)
        autoload :GraphQLCollector, "graphql/tracing/prometheus_trace/graphql_collector"
      end

      # ...

...which I don't entirely get how it gets loaded in considering the collector in PrometheusExporter:

# lib/prometheus_exporter/server/collector.rb

module PrometheusExporter::Server
  class Collector < CollectorBase
    def initialize(json_serializer: nil)
      # ...
      @collectors = {}
      register_collector(WebCollector.new)
      register_collector(ProcessCollector.new)
      register_collector(SidekiqCollector.new)
      register_collector(SidekiqQueueCollector.new)
      register_collector(SidekiqProcessCollector.new)
      register_collector(SidekiqStatsCollector.new)
      register_collector(DelayedJobCollector.new)
      register_collector(PumaCollector.new)
      register_collector(HutchCollector.new)
      register_collector(UnicornCollector.new)
      register_collector(ActiveRecordCollector.new)
      register_collector(ShoryukenCollector.new)
      register_collector(ResqueCollector.new)
      register_collector(GoodJobCollector.new)
    end

    def register_collector(collector)
      @collectors[collector.type] = collector
    end

@rmosolgo
Copy link
Owner

rmosolgo commented Apr 9, 2025

Hey, sorry for the trouble! I have never actually used Prometheus so I don't know a lot about how its supposed to work, but I wonder if this isn't part of the puzzle:

# @example Running a custom type collector
# # lib/graphql_collector.rb
# if defined?(PrometheusExporter::Server)
# require 'graphql/tracing'
#
# class GraphQLCollector < GraphQL::Tracing::PrometheusTrace::GraphQLCollector
# end
# end
#
# # Then run:
# # bundle exec prometheus_exporter -a lib/graphql_collector.rb

That would define the necessary collector, but I'm not sure how its supposed to get passed to register_collector. Do you know if there's another part of the system setup where you can register it?

The docs for PrometheusExporter show use of register_metric... maybe register_collector could be used in a similar way: https://github.com/discourse/prometheus_exporter?tab=readme-ov-file#single-process-mode

If that helps, then it should definitely be added to the GraphQL-Ruby docs!

@baweaver
Copy link
Author

baweaver commented Apr 9, 2025

Yeah it's a fascinating little issue between the libraries. Once I narrow it down and find a solid way to reproduce it I'll try and get a docs patch in too.

They do have an auto-load system over on that side using ObjectSpace:

https://github.com/discourse/prometheus_exporter/blob/46e88afd2390df4bff2c4035028922525383efca/bin/prometheus_exporter#L114

if custom_type_collectors_filenames.length > 0
  custom_type_collectors_filenames.each do |t|
    require File.expand_path(t)
  end

  ObjectSpace.each_object(Class) do |klass|
    if klass < PrometheusExporter::Server::TypeCollector
      options[:type_collectors] ||= []
      options[:type_collectors] << klass
    end
  end
end

...and I believe I saw a similar invocation in our code for runtime. If that's the case it's 100% somewhere on our side as far as the load order issue, and likely something to handle more explicitly in the PrometheusExporter code. I can add a docs patch some time later for it as well, or see if there's a patch we want to put into this code to ensure it's registered at runtime.

@rmosolgo
Copy link
Owner

rmosolgo commented Apr 9, 2025

auto-load system

Ah, yes! That would do it. I checked TypeCollector for a def self.inherited hook that might do that. But I didn't imagine that somewhere, someone was iterating over each_object 😆

Are you able to puts (or whatever) from the server process to confirm that GraphQLCollector isn't getting loaded? Or maybe that the "type" => ... being sent along doesn't match any of the ones in the lookup hash?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants