Skip to content

Conversation

alessandro-avendano
Copy link
Contributor

@alessandro-avendano alessandro-avendano commented Jul 3, 2025

What does this PR do?

Adding AKS as a supported provider within the introspection feature, and adds DD_ADMISSION_CONTROLLER_ADD_AKS_SELECTORS env var to be true if the admission controller is enabled and introspection sees that a node is running on AKS.

Motivation

Customers do not have to manually add the correct configurations on the Datadog Agent to run on a AKS node, the operator will add the basic configurations needed automatically.

Additional Notes

https://docs.datadoghq.com/containers/kubernetes/distributions/?tab=datadogoperator#AKS
Based on previous documentation, we needed to add the admission controller env var AND make kubelet changes in order to run the agent on AKS. Because of some recent certificate changes that AKS made, the only thing the operator needs to adjust for AKS environments is the admission controller env var if the admission controller is enabled.

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

  • Enable introspection feature in the operator (manager.yaml)
  • Create AKS cluster with custom operator image
  • Once the operator is deployed:
    • Deploy the DDA with admission controller enabled (automatically enabled) and check to see in the cluster agent that the DD_ADMISSION_CONTROLLER_ADD_AKS_SELECTORS env var is set to true
    • Deploy the DDA with admission controller disabled and check to see in the cluster agent that DD_ADMISSION_CONTROLLER_ADD_AKS_SELECTORS is not added as an env var

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@alessandro-avendano alessandro-avendano added this to the v1.17.0 milestone Jul 3, 2025
@alessandro-avendano alessandro-avendano requested review from a team as code owners July 3, 2025 15:21
@alessandro-avendano alessandro-avendano added the enhancement New feature or request label Jul 3, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jul 3, 2025

Codecov Report

❌ Patch coverage is 50.83333% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.09%. Comparing base (5dff879) to head (7b89545).
⚠️ Report is 32 commits behind head on main.

Files with missing lines Patch % Lines
...ontroller/datadogagent/controller_reconcile_dca.go 47.29% 30 Missing and 9 partials ⚠️
...nal/controller/datadogagent/feature/cws/feature.go 0.00% 1 Missing ⚠️
...ntroller/datadogagent/feature/dogstatsd/feature.go 0.00% 1 Missing ⚠️
...l/controller/datadogagent/feature/dummy/feature.go 0.00% 1 Missing ⚠️
...ntroller/datadogagent/feature/ebpfcheck/feature.go 0.00% 1 Missing ⚠️
...ller/datadogagent/feature/enabledefault/feature.go 0.00% 1 Missing ⚠️
...nal/controller/datadogagent/feature/gpu/feature.go 0.00% 1 Missing ⚠️
...ller/datadogagent/feature/livecontainer/feature.go 0.00% 1 Missing ⚠️
...roller/datadogagent/feature/liveprocess/feature.go 0.00% 1 Missing ⚠️
...ller/datadogagent/feature/logcollection/feature.go 0.00% 1 Missing ⚠️
... and 11 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2030      +/-   ##
==========================================
+ Coverage   47.00%   47.09%   +0.09%     
==========================================
  Files         255      255              
  Lines       25302    25343      +41     
==========================================
+ Hits        11892    11936      +44     
+ Misses      12794    12785       -9     
- Partials      616      622       +6     
Flag Coverage Δ
unittests 47.09% <50.83%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...controller/datadogagent/controller_reconcile_v2.go 42.71% <100.00%> (ø)
...atadogagent/feature/admissioncontroller/feature.go 68.10% <100.00%> (+0.65%) ⬆️
...nal/controller/datadogagent/feature/apm/feature.go 67.07% <100.00%> (ø)
...nal/controller/datadogagent/feature/asm/feature.go 73.84% <100.00%> (ø)
...roller/datadogagent/feature/autoscaling/feature.go 86.36% <100.00%> (ø)
...ller/datadogagent/feature/clusterchecks/feature.go 65.14% <100.00%> (ø)
...al/controller/datadogagent/feature/cspm/feature.go 76.28% <100.00%> (ø)
...er/datadogagent/feature/eventcollection/feature.go 77.77% <100.00%> (ø)
...er/datadogagent/feature/externalmetrics/feature.go 55.73% <100.00%> (ø)
...ntroller/datadogagent/feature/helmcheck/feature.go 81.52% <100.00%> (ø)
... and 27 more

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5dff879...7b89545. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

// Get provider list for introspection
providerList := map[string]struct{}{kubernetes.LegacyProvider: {}}
if r.options.IntrospectionEnabled {
nodeList, err := r.getNodeList(ctx)
Copy link
Contributor

@celenechang celenechang Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a costly operation, I wonder if it makes sense to make this call up one level and pass it to both this function and reconcileAgentProfiles ?

)
// Reconcile cluster agent for each provider
var errs []error
for provider := range providerList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to loop over all the providers, because we still want one DCA per cluster. (Let me know if I'm misunderstanding something here.) Perhaps we can add some logic to choose the "best" provider - for instance, if the entire cluster is AKS then use AKS. If it's mixed then make a deterministic choice

Copy link
Contributor

@swang392 swang392 Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a fix for this in the EKS control plane monitoring PR #2031 (39d8c27)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be waiting for Sarah to merge her PR into main, and will then update my branch since the issue has been resolved

@levan-m levan-m modified the milestones: v1.17.0, v1.18.0 Jul 15, 2025
@levan-m levan-m modified the milestones: v1.18.0, v1.19.0 Aug 15, 2025
@levan-m levan-m modified the milestones: v1.19.0, v1.20.0 Sep 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants