Jan-Kazlouski-elastic (Contributor) commented Oct 15, 2025

Adds a new OpenShift AI inference provider integration that allows the following tasks to be executed through the inference API with the openshift_ai service:

  • text_embedding
  • completion (both streaming and non-streaming)
  • chat_completion (streaming only)
  • rerank

The changes were tested locally against the following models:

  • gritlm-7b (text_embedding)
  • llama-31-8b-instruct (completion and chat_completion)
  • bge-reranker-v2-m3 (rerank)

Test results:

EMBEDDINGS
Create Embeddings Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-embeddings-url}}",
        "api_key": "{{openshift-ai-embeddings-token}}",
        "model_id": "gritlm-7b"
    }
}
RS
{
    "inference_id": "openshift-ai-text-embedding",
    "task_type": "text_embedding",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "gritlm-7b",
        "url": "{{openshift-ai-embeddings-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        },
        "dimensions": 4096,
        "similarity": "dot_product",
        "dimensions_set_by_user": false
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
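For anyone reproducing these calls, the create requests in this description all share the same shape. The snippet below is a minimal sketch of assembling one. It assumes the standard inference API path layout of PUT /_inference/{task_type}/{inference_id}, which is not shown in the captured requests, and the helper name build_create_request is hypothetical.

```python
import json


def build_create_request(task_type, inference_id, url, api_key, model_id):
    """Return (method, path, body) for creating an openshift_ai inference endpoint.

    The path layout is an assumption based on the general inference API;
    the body mirrors the RQ payloads captured in this description.
    """
    path = f"/_inference/{task_type}/{inference_id}"
    body = {
        "service": "openshift_ai",
        "service_settings": {
            "url": url,
            "api_key": api_key,
            "model_id": model_id,
        },
    }
    return "PUT", path, json.dumps(body, indent=4)


# Placeholders are kept exactly as they appear in the captured request.
method, path, body = build_create_request(
    task_type="text_embedding",
    inference_id="openshift-ai-text-embedding",
    url="{{openshift-ai-embeddings-url}}",
    api_key="{{openshift-ai-embeddings-token}}",
    model_id="gritlm-7b",
)
print(method, path)
print(body)
```

The same helper would cover the completion, chat_completion and rerank endpoints by swapping the task type, inference ID and model.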
Perform Embeddings
RQ
{
    "input": [
        "The sky above the port was the color of television tuned to a dead channel.",
        "The sky above the port was the color of television tuned to a dead channel."
    ]
}
RS
{
    "text_embedding": [
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        },
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        }
    ]
}
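Since the endpoint reports dot_product similarity, a quick sanity check on the response is to compare the two returned vectors. The sketch below parses the response body shown above and computes their dot product. It uses only the two values displayed per vector (the endpoint itself reports 4096 dimensions), and the helper names are illustrative rather than part of this change.

```python
import json

# Response body as captured above, limited to the values shown there.
RESPONSE_BODY = """
{
    "text_embedding": [
        {"embedding": [-0.001739502, -0.0077819824]},
        {"embedding": [-0.001739502, -0.0077819824]}
    ]
}
"""


def extract_embeddings(body):
    """Return the list of embedding vectors from a text_embedding response."""
    parsed = json.loads(body)
    return [item["embedding"] for item in parsed["text_embedding"]]


def dot_product(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


vectors = extract_embeddings(RESPONSE_BODY)
score = dot_product(vectors[0], vectors[1])
print(len(vectors), score)
```

Identical inputs should yield identical vectors, which is what the captured response shows.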
COMPLETION
Create Completion Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-completion",
    "task_type": "completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}
Perform Non-Streaming Completion
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "That's a famous opening line from George Orwell's novel \"1984\". The full quote is:\n\n\"He gazed up at the grey sky, which was like the colour of television tuned to a dead channel.\"\n\nIn the novel, the sky is a perpetual grey, which is a metaphor for the bleak and oppressive atmosphere of the totalitarian society that Orwell describes. The comparison to a dead TV channel is also significant, as it suggests a lack of signal, a lack of information, and a lack of life.\n\nOrwell wrote \"1984\" in 1948-49, as a warning about the dangers of totalitarianism and the erosion of individual freedom. The novel has become a classic of dystopian literature and a powerful commentary on the human condition."
        }
    ]
}
Perform Streaming Completion
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"The"},{"delta":" quote"}]}

event: message
data: {"completion":[{"delta":" \""},{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sky"},{"delta":" above"}]}

event: message
data: [DONE]
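The streamed completion arrives as server-sent events whose data lines carry a list of deltas, terminated by a [DONE] marker. A minimal sketch of reassembling the text from such a stream follows. It works on the captured events as a string, does not open a network connection, and the function names are hypothetical.

```python
import json

# Events as captured above; a raw string keeps the JSON escapes intact.
STREAM_TEXT = r'''event: message
data: {"completion":[{"delta":"The"},{"delta":" quote"}]}

event: message
data: {"completion":[{"delta":" \""},{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sky"},{"delta":" above"}]}

event: message
data: [DONE]
'''


def iter_sse_data(stream_text):
    """Yield the payload of every 'data:' line in an SSE stream."""
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def collect_completion(stream_text):
    """Concatenate all completion deltas until the [DONE] marker."""
    parts = []
    for payload in iter_sse_data(stream_text):
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        parts.extend(chunk["delta"] for chunk in event["completion"])
    return "".join(parts)


text = collect_completion(STREAM_TEXT)
print(text)
```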
CHAT COMPLETION
Create Chat Completion Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-chat-completion",
    "task_type": "chat_completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}
Perform Basic Chat Completion
RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 10
}
RS
event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"**"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":40,"total_tokens":50}}

event: message
data: [DONE]
Perform Tool Call Chat Completion
RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's the price of a scarf?"
                }
            ]
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of a item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_price"
        }
    }
}
RS
event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-e425f3a8f702434a80d3896bbe5cb36c","function":{"arguments":"{\"","name":"get_current_price"},"type":"function"}]},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":172,"total_tokens":182}}

event: message
data: [DONE]
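The chat completion streams above mix content deltas, tool call fragments and a final usage chunk with an empty choices list. The sketch below shows one way a client could fold such a stream into a single summary. It is fed the captured tool call events as a string, and the function name summarize_chat_stream is an assumption made for illustration.

```python
import json

# Tool call events as captured above; a raw string keeps the JSON escapes intact.
STREAM_TEXT = r'''event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-e425f3a8f702434a80d3896bbe5cb36c","function":{"arguments":"{\"","name":"get_current_price"},"type":"function"}]},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":172,"total_tokens":182}}

event: message
data: [DONE]
'''


def summarize_chat_stream(stream_text):
    """Accumulate content, tool call fragments and usage from chat completion chunks."""
    content = []
    tool_calls = {}  # keyed by the tool call index within the message
    usage = None
    for line in stream_text.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage", usage)
        for choice in chunk["choices"]:
            delta = choice.get("delta", {})
            content.append(delta.get("content") or "")
            for call in delta.get("tool_calls", []):
                entry = tool_calls.setdefault(call["index"], {"name": None, "arguments": ""})
                function = call.get("function", {})
                entry["name"] = function.get("name") or entry["name"]
                # Argument JSON arrives in fragments and is only valid once complete.
                entry["arguments"] += function.get("arguments", "")
    return {"content": "".join(content), "tool_calls": tool_calls, "usage": usage}


summary = summarize_chat_stream(STREAM_TEXT)
print(summary)
```

The same function applies to the basic chat completion stream, where tool_calls stays empty and the content deltas are joined instead.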
RERANK
Create Rerank Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-rerank-url}}",
        "api_key": "{{openshift-ai-rerank-token}}",
        "model_id": "bge-reranker-v2-m3"
    }
}
RS
{
    "inference_id": "openshift-ai-rerank",
    "task_type": "rerank",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "bge-reranker-v2-m3",
        "url": "{{openshift-ai-rerank-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}
Perform Rerank
RQ
{
    "input": [
        "luke",
        "like",
        "leia",
        "chewy",
        "r2d2",
        "star",
        "wars"
    ],
    "query": "star wars main character",
    "top_n": 2
}
RS
{
    "rerank": [
        {
            "index": 0,
            "relevance_score": 0.28466797,
            "text": "luke"
        },
        {
            "index": 3,
            "relevance_score": 0.23522949,
            "text": "chewy"
        }
    ]
}
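The rerank response refers back to the request by index, so it can be checked mechanically. The following sketch validates the captured response against the captured request: at most top_n results, each index pointing at the matching input text, and scores in descending order. The function name check_rerank is hypothetical, and the descending order is an assumption about how results are returned that the captured response happens to satisfy.

```python
import json

# Request and response bodies as captured above.
REQUEST_BODY = """
{
    "input": ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"],
    "query": "star wars main character",
    "top_n": 2
}
"""

RESPONSE_BODY = """
{
    "rerank": [
        {"index": 0, "relevance_score": 0.28466797, "text": "luke"},
        {"index": 3, "relevance_score": 0.23522949, "text": "chewy"}
    ]
}
"""


def check_rerank(request_body, response_body):
    """Return a list of problems found in a rerank response (empty if none)."""
    request = json.loads(request_body)
    results = json.loads(response_body)["rerank"]
    inputs = request["input"]
    problems = []
    if len(results) > request["top_n"]:
        problems.append("more results than top_n")
    for item in results:
        index = item["index"]
        if not 0 <= index < len(inputs):
            problems.append(f"index {index} out of range")
        elif item["text"] != inputs[index]:
            problems.append(f"text mismatch at index {index}")
    scores = [item["relevance_score"] for item in results]
    if scores != sorted(scores, reverse=True):
        problems.append("scores are not in descending order")
    return problems


problems = check_rerank(REQUEST_BODY, RESPONSE_BODY)
print(problems)
```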
  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

@elasticsearchmachine added the v9.3.0 and external-contributor (Pull request authored by a developer outside the Elasticsearch team) labels on Oct 15, 2025