Skip to content

chore(ci_visibility): support xdist via distributed event hub (POC) #13504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vitor-de-araujo
Copy link
Contributor

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Copy link
Contributor

CODEOWNERS have been resolved as:

ddtrace/_trace/span.py                                                  @DataDog/apm-sdk-api-python
ddtrace/contrib/internal/pytest/_plugin_v2.py                           @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_retry_utils.py                         @DataDog/ci-app-libraries
ddtrace/internal/ci_visibility/api/_base.py                             @DataDog/ci-app-libraries
ddtrace/internal/core/event_hub.py                                      @DataDog/apm-core-python

def remote_dispatch_with_results(*args):
print("ꙮ Remote dispatch_with_results: {args}", flush=True)
response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch_with_results", data=pickle.dumps(args))
result = EventResultDict(pickle.loads(response.read()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Vulnerability

pickle.loads is not safe for deserializing unstrusted data (...read more)

Do not deserialize untrusted data. Make sure you use alternatives to check that the data can be deserialized safely. There is no workaround around this: unless you really trust the data source, it's better to use another way to exchange data, such as an API or other protocols such as protobuf or thrift.

Learn More

View in Datadog  Leave us feedback  Documentation

return
print("ꙮ Remote dispatch: {args}", flush=True)
response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch", data=pickle.dumps(args))
result = pickle.loads(response.read())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Vulnerability

pickle.loads is not safe for deserializing unstrusted data (...read more)

Do not deserialize untrusted data. Make sure you use alternatives to check that the data can be deserialized safely. There is no workaround around this: unless you really trust the data source, it's better to use another way to exchange data, such as an API or other protocols such as protobuf or thrift.

Learn More

View in Datadog  Leave us feedback  Documentation

def do_POST(self):
length = int(self.headers.get('content-length'))
body = self.rfile.read(length)
args = pickle.loads(body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Vulnerability

pickle.loads is not safe for deserializing unstrusted data (...read more)

Do not deserialize untrusted data. Make sure you use alternatives to check that the data can be deserialized safely. There is no workaround around this: unless you really trust the data source, it's better to use another way to exchange data, such as an API or other protocols such as protobuf or thrift.

Learn More

View in Datadog  Leave us feedback  Documentation

elif self.path == "/dispatch_with_results":
method = core.dispatch_with_results
else:
raise Exception("unknown core method")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Exception is too generic (...read more)

Do not raise Exception and BaseException. These are too generic. Having generic exceptions makes it difficult to differentiate errors in a program. Use a specific exception, for example, ValueError, or create your own instead of using generic ones.

Learn More

View in Datadog  Leave us feedback  Documentation

body = pickle.dumps(result)

self.wfile.write(b"HTTP/1.0 200 OK\r\n")
self.wfile.write(b"content-length: " + (str(len(body)).encode('utf-8')) + b"\r\n\r\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Code Quality Violation

use f-string or .format to format strings (...read more)

Concatenation of multiple strings is not efficient and make the code hard to read and understand.

Instead of concatenating multiple strings, use an f-string or a format string.

Learn More

View in Datadog  Leave us feedback  Documentation

def do_POST(self):
length = int(self.headers.get('content-length'))
body = self.rfile.read(length)
args = pickle.loads(body)

Check failure

Code scanning / CodeQL

Deserialization of user-controlled data Critical

Unsafe deserialization depends on a
user-provided value
.

Copilot Autofix

AI 19 days ago

To fix the issue, we should replace the use of pickle.loads with a safer alternative for deserialization. The recommended approach is to use json.loads, which does not allow arbitrary code execution during deserialization. This requires ensuring that the data being sent to the server is serialized as JSON instead of using pickle.dumps.

The following changes are needed:

  1. Replace pickle.loads with json.loads on line 166.
  2. Replace pickle.dumps with json.dumps on line 180 and in the remote_dispatch and remote_dispatch_with_results functions.
  3. Update the EventResultDict handling to ensure compatibility with JSON serialization/deserialization.

Suggested changeset 1
ddtrace/internal/core/event_hub.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/ddtrace/internal/core/event_hub.py b/ddtrace/internal/core/event_hub.py
--- a/ddtrace/internal/core/event_hub.py
+++ b/ddtrace/internal/core/event_hub.py
@@ -10,3 +10,3 @@
 from typing import Tuple
-
+import json
 from ddtrace.settings._config import config
@@ -165,3 +165,3 @@
         body = self.rfile.read(length)
-        args = pickle.loads(body)
+        args = json.loads(body)
 
@@ -179,3 +179,3 @@
         #print("RESULT:", result, flush=True)
-        body = pickle.dumps(result)
+        body = json.dumps(result).encode('utf-8')
 
@@ -211,4 +211,4 @@
     print("ꙮ Remote dispatch: {args}", flush=True)
-    response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch", data=pickle.dumps(args))
-    result = pickle.loads(response.read())
+    response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch", data=json.dumps(args).encode('utf-8'))
+    result = json.loads(response.read().decode('utf-8'))
     print("ꙮ Remote dispatch result: {result}")
@@ -218,4 +218,4 @@
     print("ꙮ Remote dispatch_with_results: {args}", flush=True)
-    response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch_with_results", data=pickle.dumps(args))
-    result = EventResultDict(pickle.loads(response.read()))
+    response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch_with_results", data=json.dumps(args).encode('utf-8'))
+    result = EventResultDict(json.loads(response.read().decode('utf-8')))
     print("ꙮ Remote dispatch_with_results result: {result}")
EOF
@@ -10,3 +10,3 @@
from typing import Tuple

import json
from ddtrace.settings._config import config
@@ -165,3 +165,3 @@
body = self.rfile.read(length)
args = pickle.loads(body)
args = json.loads(body)

@@ -179,3 +179,3 @@
#print("RESULT:", result, flush=True)
body = pickle.dumps(result)
body = json.dumps(result).encode('utf-8')

@@ -211,4 +211,4 @@
print("ꙮ Remote dispatch: {args}", flush=True)
response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch", data=pickle.dumps(args))
result = pickle.loads(response.read())
response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch", data=json.dumps(args).encode('utf-8'))
result = json.loads(response.read().decode('utf-8'))
print("ꙮ Remote dispatch result: {result}")
@@ -218,4 +218,4 @@
print("ꙮ Remote dispatch_with_results: {args}", flush=True)
response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch_with_results", data=pickle.dumps(args))
result = EventResultDict(pickle.loads(response.read()))
response = urlopen(f"http://{DD_CORE_HOST}:{DD_CORE_PORT}/dispatch_with_results", data=json.dumps(args).encode('utf-8'))
result = EventResultDict(json.loads(response.read().decode('utf-8')))
print("ꙮ Remote dispatch_with_results result: {result}")
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Contributor

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 238 ± 2 ms.

The average import time from base is: 234 ± 2 ms.

The import time difference between this PR and base is: 4.06 ± 0.08 ms.

Import time breakdown

The following import paths have appeared:

ddtrace.auto 7.671 ms (3.23%)
ddtrace 7.671 ms (3.23%)
ddtrace._monkey 7.671 ms (3.23%)
ddtrace.appsec._listeners 7.671 ms (3.23%)
ddtrace.internal.core 7.671 ms (3.23%)
ddtrace.internal.core.event_hub 7.671 ms (3.23%)
http.server 3.716 ms (1.56%)
html 1.777 ms (0.75%)
html.entities 1.199 ms (0.50%)
socketserver 0.683 ms (0.29%)
mimetypes 0.604 ms (0.25%)
_winapi 0.117 ms (0.05%)
winreg 0.100 ms (0.04%)
urllib.request 2.160 ms (0.91%)
urllib.error 0.644 ms (0.27%)
urllib.response 0.313 ms (0.13%)
pickle 1.795 ms (0.76%)
_pickle 0.390 ms (0.16%)
_compat_pickle 0.353 ms (0.15%)

The following import paths have disappeared:

ddtrace.auto 1.775 ms (0.75%)
ddtrace.bootstrap.sitecustomize 1.775 ms (0.75%)
ddtrace.bootstrap.preload 1.775 ms (0.75%)
multiprocessing 1.775 ms (0.75%)
multiprocessing.context 1.775 ms (0.75%)
multiprocessing.reduction 1.775 ms (0.75%)
pickle 1.775 ms (0.75%)
_compat_pickle 0.429 ms (0.18%)
_pickle 0.384 ms (0.16%)

The following import paths have grown:

ddtrace.auto 2.179 ms (0.92%)
ddtrace.bootstrap.sitecustomize 1.706 ms (0.72%)
ddtrace.bootstrap.preload 1.571 ms (0.66%)
ddtrace.internal.products 0.866 ms (0.36%)
importlib.metadata 0.783 ms (0.33%)
zipfile 0.638 ms (0.27%)
importlib.abc 0.077 ms (0.03%)
importlib.resources 0.077 ms (0.03%)
importlib.resources._common 0.077 ms (0.03%)
importlib.resources._adapters 0.077 ms (0.03%)
importlib.metadata._adapters 0.068 ms (0.03%)
multiprocessing.sharedctypes 0.547 ms (0.23%)
multiprocessing.heap 0.547 ms (0.23%)
mmap 0.547 ms (0.23%)
ddtrace.settings.profiling 0.158 ms (0.07%)
ddtrace.vendor.psutil 0.077 ms (0.03%)
ddtrace._trace.trace_handlers 0.035 ms (0.01%)
ddtrace 0.473 ms (0.20%)
ddtrace.trace 0.366 ms (0.15%)
ddtrace._trace.filters 0.278 ms (0.12%)
ddtrace._trace.processor 0.278 ms (0.12%)
ddtrace._trace.sampler 0.189 ms (0.08%)
ddtrace._trace.span 0.189 ms (0.08%)
pprint 0.115 ms (0.05%)
ddtrace.internal.sampling 0.075 ms (0.03%)
ddtrace.internal.rate_limiter 0.075 ms (0.03%)
ddtrace.internal.dogstatsd 0.089 ms (0.04%)
ddtrace.vendor.dogstatsd 0.089 ms (0.04%)
ddtrace.vendor.dogstatsd.base 0.089 ms (0.04%)
queue 0.089 ms (0.04%)
heapq 0.089 ms (0.04%)
ddtrace._trace.tracer 0.088 ms (0.04%)
ddtrace._monkey 0.107 ms (0.05%)
ddtrace.appsec._listeners 0.107 ms (0.05%)
ddtrace.internal.core 0.107 ms (0.05%)
ddtrace.internal.core.event_hub 0.107 ms (0.05%)

The following import paths have shrunk:

ddtrace.auto 4.072 ms (1.71%)
ddtrace.bootstrap.sitecustomize 2.907 ms (1.22%)
ddtrace.bootstrap.preload 2.907 ms (1.22%)
ddtrace.internal.remoteconfig.client 1.112 ms (0.47%)
ddtrace.internal.runtime.runtime_metrics 0.737 ms (0.31%)
ddtrace.internal.runtime.metric_collectors 0.737 ms (0.31%)
ddtrace.internal.runtime.collector 0.715 ms (0.30%)
ddtrace.internal.products 0.243 ms (0.10%)
importlib.metadata 0.243 ms (0.10%)
importlib.metadata._meta 0.070 ms (0.03%)
importlib.abc 0.063 ms (0.03%)
importlib.resources 0.063 ms (0.03%)
importlib.resources._common 0.063 ms (0.03%)
ddtrace.settings.profiling 0.135 ms (0.06%)
ddtrace.internal.datadog.profiling.ddup 0.076 ms (0.03%)
ddtrace.internal.datadog.profiling.ddup._ddup 0.076 ms (0.03%)
ddtrace.vendor.psutil 0.058 ms (0.02%)
pwd 0.058 ms (0.02%)
ddtrace 1.164 ms (0.49%)
ddtrace.trace 0.370 ms (0.16%)
ddtrace._trace.filters 0.272 ms (0.11%)
ddtrace._trace.processor 0.272 ms (0.11%)
ddtrace._trace.sampler 0.167 ms (0.07%)
ddtrace._trace.span 0.167 ms (0.07%)
ddtrace.internal.sampling 0.095 ms (0.04%)
ddtrace._trace.sampling_rule 0.095 ms (0.04%)
ddtrace.internal.glob_matching 0.095 ms (0.04%)
ddtrace.internal.dogstatsd 0.104 ms (0.04%)
ddtrace.vendor.dogstatsd 0.104 ms (0.04%)
ddtrace.vendor.dogstatsd.base 0.104 ms (0.04%)
queue 0.104 ms (0.04%)
heapq 0.104 ms (0.04%)
_heapq 0.104 ms (0.04%)
ddtrace._trace.tracer 0.098 ms (0.04%)
ddtrace.internal.debug 0.098 ms (0.04%)
ddtrace._monkey 0.083 ms (0.04%)
ddtrace.appsec._listeners 0.083 ms (0.04%)
ddtrace.internal.core 0.083 ms (0.04%)
ddtrace._logger 0.033 ms (0.01%)
logging 0.033 ms (0.01%)
traceback 0.033 ms (0.01%)
contextlib 0.033 ms (0.01%)
ddtrace.internal._unpatched 0.024 ms (0.01%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant