feat: add embedding hiding configuration and align spec with instrumentation #2162

codefromthecrypt · 2025-09-02T01:53:32Z

Impact

This PR enhances data privacy for embedding operations and brings consistency across all embedding instrumentation providers, including BeeAI and Haystack.

Key Features

🔒 Privacy Controls for Embeddings

New configuration options to protect sensitive embedding data:
- OPENINFERENCE_HIDE_EMBEDDINGS_VECTORS: Redacts embedding vectors with "__REDACTED__"
- OPENINFERENCE_HIDE_EMBEDDINGS_TEXT: Redacts embedding text content
- Consistent redaction behavior across all instrumentation packages
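
For reviewers, the redaction described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the helper names are hypothetical, it assumes the options are enabled with the string `true`, and it assumes text is replaced with the same `"__REDACTED__"` placeholder as vectors.

```python
import os

REDACTED = "__REDACTED__"


def _flag(name, env=None):
    """Return True when the named option is set to "true" (assumed convention)."""
    env = os.environ if env is None else env
    return str(env.get(name, "")).strip().lower() == "true"


def redact_embedding_attributes(attributes, env=None):
    """Return a copy of span attributes with embedding data redacted per config."""
    hide_vectors = _flag("OPENINFERENCE_HIDE_EMBEDDINGS_VECTORS", env)
    hide_text = _flag("OPENINFERENCE_HIDE_EMBEDDINGS_TEXT", env)
    redacted = {}
    for key, value in attributes.items():
        if hide_vectors and key.endswith(".embedding.vector"):
            redacted[key] = REDACTED
        elif hide_text and key.endswith(".embedding.text"):
            # Assumption: text uses the same placeholder as vectors.
            redacted[key] = REDACTED
        else:
            redacted[key] = value
    return redacted
```

Passing `env` explicitly keeps the sketch testable without touching the process environment.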

🎯 Standardized Embedding Instrumentation

All providers now use:

Consistent span name: "CreateEmbeddings"
Standardized attribute structure: embedding.embeddings.N.embedding.{text|vector}
Unified invocation parameter tracking: embedding.invocation_parameters
Proper llm.system attribute for provider identification
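
To make the attribute layout concrete, here is a small sketch of how a batch could be flattened into the indexed keys listed above. The function name is hypothetical, and serializing `embedding.invocation_parameters` as a JSON string is an assumption made for illustration.

```python
import json


def embedding_span_attributes(system, texts, vectors, invocation_parameters):
    """Flatten a batch of embeddings into indexed span attributes (illustrative)."""
    attributes = {
        "llm.system": system,
        # Assumption: parameters are recorded as a JSON string.
        "embedding.invocation_parameters": json.dumps(invocation_parameters),
    }
    for i, text in enumerate(texts):
        if text is not None:  # no text attribute when the input had no text
            attributes[f"embedding.embeddings.{i}.embedding.text"] = text
    for i, vector in enumerate(vectors):
        attributes[f"embedding.embeddings.{i}.embedding.vector"] = list(vector)
    return attributes
```

Each item in the batch gets its own index `N`, so text and vector for the same input share a prefix.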

Provider-specific improvements:

BeeAI:

Added embedding.invocation_parameters extraction from input events
Ensured embedding vectors are always lists (not tuples)
Added proper llm.system: "beeai" identification
Removed unnecessary child spans for cleaner traces

Haystack:

Added dynamic llm.system based on component class name
Improved embedding vector decoding (supports base64 and raw arrays)
Added invocation parameter extraction (excludes input data)
Enhanced numpy array compatibility
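
The vector decoding can be sketched as below. This is not the Haystack instrumentation code: `decode_embedding` is a hypothetical helper, and it assumes a base64 payload holds little-endian float32 values.

```python
import base64
import struct


def decode_embedding(value):
    """Normalize an embedding to a list of floats (illustrative sketch)."""
    if isinstance(value, str):
        # Assumption: base64 payload is packed little-endian float32.
        raw = base64.b64decode(value)
        count = len(raw) // 4
        return list(struct.unpack(f"<{count}f", raw[: count * 4]))
    if hasattr(value, "tolist"):
        # numpy arrays expose tolist(); duck typing avoids importing numpy here.
        return [float(x) for x in value.tolist()]
    return [float(x) for x in value]
```

Returning a plain list in every branch keeps the recorded attribute type the same regardless of how the provider encoded the vector.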

OpenAI/LiteLLM:

Full batch embedding support with indexed attributes
Separated invocation parameters from input data
Improved handling of token IDs vs text inputs
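
Separating invocation parameters from input data could look roughly like this. It is a sketch only: `split_embedding_request` is a hypothetical name, and it assumes the request arrives as keyword arguments with the payload under an `input` key.

```python
import json


def split_embedding_request(kwargs):
    """Return (input_data, invocation_parameters_json) from request kwargs."""
    # Everything except the input payload counts as an invocation parameter.
    parameters = {key: value for key, value in kwargs.items() if key != "input"}
    return kwargs.get("input"), json.dumps(parameters, sort_keys=True)
```

Keeping the input out of the parameters means the redaction options only need to cover the embedding text and vector attributes.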

📋 Specification Alignment

Updated spec to clarify when text attributes are populated (only for string inputs, not token IDs)
Documented why token IDs aren't decoded (cross-provider incompatibility, runtime constraints)
Clarified redaction behavior for each configuration option
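
The text-attribute rule above can be illustrated with a short sketch. The helper name is hypothetical; the input shapes assumed are a single string, a list of strings, a list of token IDs, or a list of token ID lists.

```python
def input_texts(embedding_input):
    """Return the texts to record, or an empty list for token ID inputs."""
    if isinstance(embedding_input, str):
        return [embedding_input]
    if (
        isinstance(embedding_input, (list, tuple))
        and embedding_input
        and all(isinstance(item, str) for item in embedding_input)
    ):
        return list(embedding_input)
    # Token IDs (ints or nested lists of ints) are not decoded back to text.
    return []
```

With this shape, a token ID request still produces vector attributes but no `embedding.text` attributes.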

codefromthecrypt · 2025-09-02T01:59:06Z

I will pull out of draft once I unbreak the build

codefromthecrypt · 2025-09-14T06:57:26Z

@mikeldking this should be ready to review because the code that does embedding is up to date. There are a couple of red packages that are unrelated to this change and need some work to get green.

Agno: It essentially doesn't function because it's only partially ported to version 2.0. We need to bump the minimum version requirement to 2.0, as 1.5 is no longer compatible. Additionally, the tests aren't correct because they only catch the LLM spans, not the HTTP requests made by the agent. This results in inconsistent runs. If you'd like, I can share my partially completed branch, but I won't be able to dedicate time to iterating on it further.

smolagents: Fixing the mypy errors in smolagents might require less effort overall.

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt · 2025-09-14T07:04:43Z

Here's the branch. I can't complete it, but I got pretty far: https://github.com/Arize-ai/openinference/compare/main...codefromthecrypt:openinference:refactor-agno-tests?expand=1

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt · 2025-09-14T08:29:21Z

added the smolagents fix. crewai is new but I can't reproduce that locally. agno is the biggie, but not related to this change either

mikeldking · 2025-09-18T00:03:05Z

python/instrumentation/openinference-instrumentation-agno/pyproject.toml

  "pytest-recording",
  "openai",
-  "ddgs",
+  "duckduckgo-search",


we need this to be ddgs as it got renamed for agno

codefromthecrypt · 2025-09-18T01:21:13Z

FYI I plan to break this PR apart into pieces to make it easier to review. I'll carry any comments here over to the partitioned PRs and keep this one in draft until only the spec/substrate remains.

codefromthecrypt · 2025-09-18T01:22:31Z

part 1: openai #2210

Once that's in, I'll make the same changes for litellm and pull both off this PR.

This PR aligns litellm with the specification changes around embeddings in Arize-ai#2162

**Spec changes from 2162**
- Consistent span name: `"CreateEmbeddings"`
- Standardized attribute structure: `embedding.embeddings.N.embedding.{text|vector}`
- Unified invocation parameter tracking: `embedding.invocation_parameters`
- Proper `llm.system` attribute for provider identification

**Code improvements:**
- Full batch embedding support with indexed attributes
- Separated invocation parameters from input data
- Improved handling of token IDs vs text inputs
- Vectors stored as tuples instead of JSON strings

This is the same as Arize-ai#2210, except litellm.

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt · 2025-09-28T01:53:46Z

sorry about delay, crushed.. here's the next to pull off this: litellm #2238

codefromthecrypt requested a review from a team as a code owner September 2, 2025 01:53

github-project-automation bot added this to Instrumentation Sep 2, 2025

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Sep 2, 2025

codefromthecrypt marked this pull request as draft September 2, 2025 01:55

codefromthecrypt mentioned this pull request Sep 2, 2025

add tracing support for the embeddings router filter envoyproxy/ai-gateway#1085

Closed

codefromthecrypt force-pushed the spec-embeddings branch 4 times, most recently from 7c023fa to 5a25a52 Compare September 10, 2025 01:34

feat: add embedding hiding configuration and align spec with instrume…

8cfa839

…ntation

codefromthecrypt force-pushed the spec-embeddings branch from 5a25a52 to 8cfa839 Compare September 14, 2025 06:37

codefromthecrypt marked this pull request as ready for review September 14, 2025 06:54

fix the easy fuzz

c26bfc2

Signed-off-by: Adrian Cole <[email protected]>

store_as_tuple

49b6f49

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt mentioned this pull request Sep 17, 2025

feat(openai): align embedding instrumentation with pending spec #2210

Merged

mikeldking reviewed Sep 18, 2025

codefromthecrypt marked this pull request as draft September 18, 2025 01:21

codefromthecrypt mentioned this pull request Sep 28, 2025

feat(litellm): align embedding instrumentation with pending spec #2238

Open
