feat: add embedding hiding configuration and align spec with instrumentation #2162
I will take this out of draft once I unbreak the build.
7c023fa to 5a25a52
5a25a52 to 8cfa839
@mikeldking this should be ready to review because the code that does embedding is up to date. A couple of packages are red for reasons unrelated to this change and need some work to get green.
Agno: It essentially doesn't function because it's only partially ported to version 2.0. We need to bump the minimum version requirement to 2.0, as 1.5 is no longer compatible. Additionally, the tests aren't correct because they only catch the LLM spans, not the HTTP requests made by the agent, which results in inconsistent runs. If you'd like, I can share my partially completed branch, but I won't be able to dedicate time to iterating on it further.
smolagents: Fixing the mypy errors in smolagents might require less effort overall.
Signed-off-by: Adrian Cole <[email protected]>
here's the branch; I can't complete it, but I got pretty far: https://github.com/Arize-ai/openinference/compare/main...codefromthecrypt:openinference:refactor-agno-tests?expand=1
Signed-off-by: Adrian Cole <[email protected]>
added the smolagents fix. crewai is new but I can't reproduce that locally. agno is the biggie, but not related to this change either
"pytest-recording",
"openai",
"ddgs",
"duckduckgo-search",
we need this to be ddgs as it got renamed for agno
cool.
FYI I plan to break this PR apart into pieces to make it easier to review, so I'll carry over any comments here to the partitioned PRs and keep this one in draft until only the spec/substrate remains.
part 1: openai #2210. Once it's in, I'll make the same changes for litellm and pull both off this PR.
This PR aligns litellm with the specification changes around embeddings in Arize-ai#2162

**Spec changes from 2162**
- Consistent span name: `"CreateEmbeddings"`
- Standardized attribute structure: `embedding.embeddings.N.embedding.{text|vector}`
- Unified invocation parameter tracking: `embedding.invocation_parameters`
- Proper `llm.system` attribute for provider identification

**Code improvements:**
- Full batch embedding support with indexed attributes
- Separated invocation parameters from input data
- Improved handling of token IDs vs text inputs
- Vectors stored as tuples instead of JSON strings

This is the same as Arize-ai#2210, except litellm.

Signed-off-by: Adrian Cole <[email protected]>
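As a rough illustration of the attribute layout described in the commit message above, here is a minimal sketch of flattening a batch embedding call into indexed span attributes. The function name `embedding_attributes`, its parameters, and the choice to serialize invocation parameters as a JSON string are assumptions made for illustration; this is not the actual instrumentation code from either PR.

```python
import json
from typing import Any, Dict, Mapping, Sequence, Union

# Span name taken from the spec changes listed above.
SPAN_NAME = "CreateEmbeddings"


def embedding_attributes(
    inputs: Sequence[Union[str, Sequence[int]]],
    vectors: Sequence[Sequence[float]],
    invocation_parameters: Mapping[str, Any],
    system: str,
) -> Dict[str, Any]:
    """Hypothetical helper: build flat span attributes for a batch of embeddings."""
    attrs: Dict[str, Any] = {
        "llm.system": system,
        # Assumption: parameters are kept apart from input data and JSON-encoded.
        "embedding.invocation_parameters": json.dumps(dict(invocation_parameters)),
    }
    for i, vector in enumerate(vectors):
        prefix = f"embedding.embeddings.{i}.embedding"
        # Assumption: only string inputs are recorded as text; token-ID inputs are skipped.
        if i < len(inputs) and isinstance(inputs[i], str):
            attrs[f"{prefix}.text"] = inputs[i]
        # Vectors are stored as tuples rather than JSON strings.
        attrs[f"{prefix}.vector"] = tuple(vector)
    return attrs
```

Calling it with one text input and one token-ID input yields a `.text` attribute only for index 0, while both indices get a `.vector` tuple.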
sorry about the delay, crushed.. here's the next to pull off this: litellm #2238
Impact
This PR enhances data privacy for embedding operations and brings consistency across all embedding instrumentation providers, including BeeAI and Haystack.
Key Features
🔒 Privacy Controls for Embeddings
- `OPENINFERENCE_HIDE_EMBEDDINGS_VECTORS`: Redacts embedding vectors with `"__REDACTED__"`
- `OPENINFERENCE_HIDE_EMBEDDINGS_TEXT`: Redacts embedding text content

🎯 Standardized Embedding Instrumentation
All providers now use:
- Span name: `"CreateEmbeddings"`
- Attribute structure: `embedding.embeddings.N.embedding.{text|vector}`
- Invocation parameters: `embedding.invocation_parameters`
- `llm.system` attribute for provider identification

Provider-specific improvements:
- BeeAI:
  - `embedding.invocation_parameters` extraction from input events
  - `llm.system: "beeai"` identification
- Haystack:
  - `llm.system` based on component class name
- OpenAI/LiteLLM:
📋 Specification Alignment