Description
Some users of the stackdriver-adapter store metrics in Cloud Monitoring in a project separate from the one where the cluster runs. However, when the adapter builds requests, it obtains the GCP project from the "ambient" environment (i.e., the GCE metadata endpoint), which always reports the cluster's own project.
At least two deployment models are problematic today:
- Hub and spoke: In this model, multiple single-tenant clusters collect and ingest metrics into a single "Hub" project. This is usually done to provide a single pane of glass.
- Project per tenant: In this model, each tenant (i.e., namespace) collects and ingests metrics for its workloads into a dedicated per-tenant project.
Neither model is supported today, because in both of them the metric project differs from the cluster project.
Prior work
Note that #212 attempted to fix this, but it only partially addressed the issue. That fix makes the adapter give special semantics to the resource.labels.project_id label. If the label is provided, the adapter uses it both for filtering and as the metric host project.
There are two issues with this approach:
- It assumes the invariant that the project_id label of a given metric always matches the project where the metric is hosted. In a scenario where metrics are ingested into a different project (e.g. the Hub and Spoke setup), this invariant doesn’t hold.
- It only works for external metrics; support for custom metrics is not implemented.
Proposal
A possible fix is to add a flag to the adapter to override the ambient GCP project. Exposing this as a global configuration in the adapter makes sense in the Hub and Spoke setup as it removes the need to annotate every HPA resource with the project ID label matcher.
Additionally, we could introduce a well-known label key to indicate the consumer project where the metric is stored. This new dedicated key would work for all HPA metric types (Pods, Object, and External).
For example:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-workload
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: <metric>
        selector:
          matchLabels:
            resource.labels.metrics_host_project_id: <metrics-project-id>
      target:
        type: Value
        value: "90"
In terms of precedence, a label selector will always override the global flag, which will always override the ambient configuration.
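To make the precedence concrete, here is a minimal Go sketch of how the adapter might pick the project for a request. It is not the adapter's actual code: the function name `resolveMetricsProject` and its parameters are hypothetical, and the label key is the one proposed in the example above. `flagProject` stands for the value of the proposed global flag (empty when unset), and `ambientProject` for the project read from the GCE metadata endpoint.

```go
package main

import "fmt"

// metricsHostProjectLabel is the well-known label key proposed in this issue.
// It is not an existing adapter constant.
const metricsHostProjectLabel = "resource.labels.metrics_host_project_id"

// resolveMetricsProject (hypothetical helper) returns the project to query,
// applying the proposed precedence:
// label selector > global flag > ambient project.
func resolveMetricsProject(matchLabels map[string]string, flagProject, ambientProject string) string {
	// 1. A per-HPA label selector wins when present and non-empty.
	if p, ok := matchLabels[metricsHostProjectLabel]; ok && p != "" {
		return p
	}
	// 2. Otherwise fall back to the global flag, if it was set.
	if flagProject != "" {
		return flagProject
	}
	// 3. Finally use the ambient project from the metadata endpoint.
	return ambientProject
}

func main() {
	withLabel := map[string]string{metricsHostProjectLabel: "tenant-a"}
	fmt.Println(resolveMetricsProject(withLabel, "hub", "cluster")) // label selector wins
	fmt.Println(resolveMetricsProject(nil, "hub", "cluster"))       // flag wins
	fmt.Println(resolveMetricsProject(nil, "", "cluster"))          // ambient fallback
}
```

With this shape, the Hub and Spoke setup only needs the flag, while the project-per-tenant setup sets the label on each HPA. Because the proposed key is a routing hint rather than a real resource label, the adapter would presumably also need to strip it before building the Cloud Monitoring filter; that step is not shown here.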