Implement OpenShift AI integration for chat completion, embeddings, and reranking #136624

Jan-Kazlouski-elastic · 2025-10-15T13:56:43Z

This PR adds a new OpenShift AI inference provider integration that allows the following tasks to be executed through the inference API with the openshift_ai provider:

text_embedding
completion (both streaming and non-streaming)
chat_completion (streaming only)
rerank

Changes were tested locally against the following models:

gritlm-7b (text_embedding)
llama-31-8b-instruct (completion and chat_completion)
bge-reranker-v2-m3 (rerank)

Test results:

EMBEDDINGS

Create Embeddings Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-embeddings-url}}",
        "api_key": "{{openshift-ai-embeddings-token}}",
        "model_id": "gritlm-7b"
    }
}
RS
{
    "inference_id": "openshift-ai-text-embedding",
    "task_type": "text_embedding",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "gritlm-7b",
        "url": "{{openshift-ai-embeddings-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        },
        "dimensions": 4096,
        "similarity": "dot_product",
        "dimensions_set_by_user": false
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
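
For reference, the create request above can be assembled as in the sketch below. It assumes the usual inference API route of the form `PUT _inference/<task_type>/<inference_id>`; the helper name `build_create_request` is hypothetical and used for illustration only, and the URL and token are left as the same placeholders used in the test. Nothing is sent over the network.

```python
import json


def build_create_request(task_type, inference_id, url, api_key, model_id):
    """Return (method, path, body) for creating an openshift_ai endpoint.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    body = {
        "service": "openshift_ai",
        "service_settings": {
            "url": url,
            "api_key": api_key,
            "model_id": model_id,
        },
    }
    path = f"_inference/{task_type}/{inference_id}"
    return "PUT", path, json.dumps(body, indent=4)


method, path, body = build_create_request(
    "text_embedding",
    "openshift-ai-text-embedding",
    "{{openshift-ai-embeddings-url}}",    # placeholder, as in the test above
    "{{openshift-ai-embeddings-token}}",  # placeholder, as in the test above
    "gritlm-7b",
)
print(method, path)
print(body)
```

The same helper would cover the completion, chat_completion and rerank endpoints by changing the task type, inference ID and model ID.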

Perform Embeddings

RQ
{
    "input": [
        "The sky above the port was the color of television tuned to a dead channel.",
        "The sky above the port was the color of television tuned to a dead channel."
    ]
}
RS
{
    "text_embedding": [
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        },
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        }
    ]
}
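
Since both inputs are the same sentence, the two returned vectors should match. A minimal sketch of that sanity check, using only the values shown in the response above (the endpoint itself reports 4096 dimensions; the helper `dot` is illustrative):

```python
import json

# Response body copied from the test result above.
response = json.loads("""
{
    "text_embedding": [
        {"embedding": [-0.001739502, -0.0077819824]},
        {"embedding": [-0.001739502, -0.0077819824]}
    ]
}
""")

vectors = [item["embedding"] for item in response["text_embedding"]]


def dot(a, b):
    """Dot product, the similarity configured on the endpoint."""
    return sum(x * y for x, y in zip(a, b))


identical = vectors[0] == vectors[1]
print("one embedding per input:", len(vectors) == 2)
print("identical inputs give identical vectors:", identical)
print("dot product of the two vectors:", dot(vectors[0], vectors[1]))
```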

COMPLETION

Create Completion Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-completion",
    "task_type": "completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}

Perform Non-Streaming Completion

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "That's a famous opening line from George Orwell's novel \"1984\". The full quote is:\n\n\"He gazed up at the grey sky, which was like the colour of television tuned to a dead channel.\"\n\nIn the novel, the sky is a perpetual grey, which is a metaphor for the bleak and oppressive atmosphere of the totalitarian society that Orwell describes. The comparison to a dead TV channel is also significant, as it suggests a lack of signal, a lack of information, and a lack of life.\n\nOrwell wrote \"1984\" in 1948-49, as a warning about the dangers of totalitarianism and the erosion of individual freedom. The novel has become a classic of dystopian literature and a powerful commentary on the human condition."
        }
    ]
}

Perform Streaming Completion

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"The"},{"delta":" quote"}]}

event: message
data: {"completion":[{"delta":" \""},{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sky"},{"delta":" above"}]}

event: message
data: [DONE]
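
The streamed response above is a sequence of server-sent events whose `data` payloads carry `delta` fragments. A minimal sketch of how a client might reassemble the text, assuming the stream has already been read into a string (the function names are illustrative, not part of this PR):

```python
import json

# Events copied from the streaming result above (raw string keeps the \" escape).
RAW_STREAM = r'''event: message
data: {"completion":[{"delta":"The"},{"delta":" quote"}]}

event: message
data: {"completion":[{"delta":" \""},{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sky"},{"delta":" above"}]}

event: message
data: [DONE]
'''


def iter_sse_data(raw):
    """Yield the payload of every `data:` line in the stream."""
    for line in raw.splitlines():
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def collect_completion(raw):
    """Concatenate all delta fragments until the [DONE] marker."""
    parts = []
    for payload in iter_sse_data(raw):
        if payload == "[DONE]":
            break
        for item in json.loads(payload)["completion"]:
            parts.append(item["delta"])
    return "".join(parts)


text = collect_completion(RAW_STREAM)
print(text)
```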

CHAT COMPLETION

Create Chat Completion Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-chat-completion",
    "task_type": "chat_completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}

Perform Basic Chat Completion

RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 10
}
RS
event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"**"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":40,"total_tokens":50}}

event: message
data: [DONE]
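
The chat completion stream follows the `chat.completion.chunk` layout: content arrives in `choices[].delta.content`, and the final chunk has an empty `choices` list plus a `usage` object. A minimal sketch of consuming the chunks shown above, assuming the stream is available as a string (`summarize_chat_stream` is an illustrative name):

```python
import json

# Chunks copied from the basic chat completion result above.
RAW_STREAM = r'''event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"**"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":40,"total_tokens":50}}

event: message
data: [DONE]
'''


def summarize_chat_stream(raw):
    """Return (assembled content, usage dict or None) from a chunk stream."""
    content, usage = [], None
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk["choices"]:
            content.append(choice["delta"].get("content", ""))
        usage = chunk.get("usage", usage)  # only the last chunk carries usage
    return "".join(content), usage


content, usage = summarize_chat_stream(RAW_STREAM)
print(repr(content))
print(usage)
```

Only the chunks listed above are included, so the assembled content is shorter than the 10 completion tokens reported in `usage`.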

Perform Tool Call Chat Completion

RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's the price of a scarf?"
                }
            ]
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of a item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_price"
        }
    }
}
RS
event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-e425f3a8f702434a80d3896bbe5cb36c","function":{"arguments":"{\"","name":"get_current_price"},"type":"function"}]},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":172,"total_tokens":182}}

event: message
data: [DONE]
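
In the tool call stream, the function name and a fragment of the JSON arguments arrive in `delta.tool_calls`, keyed by `index`. A minimal sketch of accumulating those fragments from the chunks shown above (`collect_tool_calls` is an illustrative name; only the chunks listed in the result are used):

```python
import json

# Chunks copied from the tool call result above (raw string keeps the \" escape).
RAW_STREAM = r'''event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-e425f3a8f702434a80d3896bbe5cb36c","function":{"arguments":"{\"","name":"get_current_price"},"type":"function"}]},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":172,"total_tokens":182}}

event: message
data: [DONE]
'''


def collect_tool_calls(raw):
    """Merge streamed tool call fragments into one entry per tool call index."""
    calls = {}
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        for choice in json.loads(payload)["choices"]:
            for fragment in choice["delta"].get("tool_calls", []):
                entry = calls.setdefault(
                    fragment["index"], {"id": None, "name": None, "arguments": ""}
                )
                function = fragment.get("function", {})
                entry["id"] = fragment.get("id") or entry["id"]
                entry["name"] = function.get("name") or entry["name"]
                entry["arguments"] += function.get("arguments", "")
    return calls


calls = collect_tool_calls(RAW_STREAM)
print(calls)
```

The arguments string is only complete once every fragment has been received, so it should be parsed as JSON after the stream ends rather than per chunk.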

RERANK

Create Rerank Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-rerank-url}}",
        "api_key": "{{openshift-ai-rerank-token}}",
        "model_id": "bge-reranker-v2-m3"
    }
}
RS
{
    "inference_id": "openshift-ai-rerank",
    "task_type": "rerank",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "bge-reranker-v2-m3",
        "url": "{{openshift-ai-rerank-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}

Perform Rerank

RQ
{
    "input": [
        "luke",
        "like",
        "leia",
        "chewy",
        "r2d2",
        "star",
        "wars"
    ],
    "query": "star wars main character",
    "top_n": 2
}
RS
{
    "rerank": [
        {
            "index": 0,
            "relevance_score": 0.28466797,
            "text": "luke"
        },
        {
            "index": 3,
            "relevance_score": 0.23522949,
            "text": "chewy"
        }
    ]
}
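
The rerank response can be cross-checked against the request: each `index` should point at the matching input, at most `top_n` items should be returned, and scores should be in descending order. A minimal sketch of those checks using the values above (`check_rerank` is an illustrative name):

```python
# Request inputs and response copied from the rerank test above.
inputs = ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"]
top_n = 2
response = {
    "rerank": [
        {"index": 0, "relevance_score": 0.28466797, "text": "luke"},
        {"index": 3, "relevance_score": 0.23522949, "text": "chewy"},
    ]
}


def check_rerank(inputs, response, top_n):
    """Return a dict of named consistency checks for a rerank response."""
    results = response["rerank"]
    scores = [r["relevance_score"] for r in results]
    return {
        "at_most_top_n": len(results) <= top_n,
        "sorted_descending": scores == sorted(scores, reverse=True),
        "text_matches_index": all(inputs[r["index"]] == r["text"] for r in results),
    }


checks = check_rerank(inputs, response, top_n)
print(checks)
```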

- Have you signed the contributor license agreement?
- Have you followed the contributor guidelines?
- If submitting code, have you built your formula locally prior to submission with gradle check?
- If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
- If submitting code, have you checked that your submission is for an OS and architecture that we support?
- If you are submitting this code for a class then read our policy for that.


Implement OpenShift AI integration for chat completion, embeddings, and reranking

3c67123

elasticsearchmachine added the v9.3.0 and external-contributor labels Oct 15, 2025

Jan-Kazlouski-elastic and others added 12 commits October 15, 2025 17:13

Refactor OpenShift AI service settings to use underscores in constant names and add changelog

fdb22ff

Merge remote-tracking branch 'origin/main' into openshift-ai-integration

8ce569e

Add constructor to OpenShiftAiChatCompletionServiceSettings for URL handling

b268e08

Add unit tests

9cae6b1

[CI] Auto commit changes from spotless

f804331

Add tests for UnifiedCompletionRequest model ID overrides in OpenShift AI chat completion

af2fcd6

Add unit tests for OpenShiftAiChatCompletionResponseHandler

b98d8d6

Add unit tests for OpenShiftAiChatCompletionServiceSettings

b19342f

Update request type description in OpenShiftAiCompletionResponseHandler

6af168c

Refactor OpenShiftAiEmbeddingsServiceSettings to improve validation logic and update dimensionsSetByUser handling

aadbfde

Update OpenShiftAiChatCompletionRequestEntity to use new method for max tokens and add unit tests for request creation and validation

fc5c182

Add unit tests for OpenShiftAiChatCompletionRequestEntity serialization

d664644
