[Agentic Search] Convert agentic query translator processor to system-generated processor #1568

owaiskazi19 · 2025-09-16T08:39:03Z

Description

Convert agentic query translator processor to system-generated processor

{
    "query": {
        "agentic": {
            "query_text": "List all species",
            "agent_id": "jf-WUZkBl9tG0YB4F8-A",
            "query_fields": ["species", "petal_length_in_cm"]
        }
    }
}

Related Issues

Part of #1525

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Owais <[email protected]>

src/main/java/org/opensearch/neuralsearch/processor/AgenticQueryTranslatorProcessor.java

src/main/java/org/opensearch/neuralsearch/query/AgenticSearchQueryBuilder.java

bzhangam · 2025-09-16T17:32:15Z

src/main/java/org/opensearch/neuralsearch/query/AgenticSearchQueryBuilder.java

+        }
        throw new IllegalStateException(
-            "Agentic search query must be used as top-level query, not nested inside other queries. Should be used with agentic_query_translator search processor"
+            "Agentic search query must be processed by the agentic_query_translator system processor before query execution. "


I think we can add a validation in the AgenticSearchQueryBuilder to ensure the processor is enabled in the cluster setting cluster.search.enabled_system_generated_factories.

I have added a new generic method to NeuralSearchSettingsAccessor that all system-generated processors can use to verify whether they are enabled in the factories setting.

owaiskazi19 · 2025-09-16T18:21:09Z

Integ tests failing because of opensearch-project/ml-commons#4172

Signed-off-by: Owais <[email protected]>

bzhangam · 2025-09-16T22:12:23Z

src/main/java/org/opensearch/neuralsearch/settings/NeuralSearchSettingsAccessor.java

+     * @return true if the processor is enabled in cluster settings
+     */
+    public boolean isSystemGenerateProcessorEnabled(String processor) {
+        String enabledFactories = String.valueOf(clusterService.getClusterSettings().get(SYSTEM_GENERATED_PIPELINE_SETTINGS));


ENABLED_SYSTEM_GENERATED_FACTORIES_SETTING is a list of strings. I think it's better to get the value through ENABLED_SYSTEM_GENERATED_FACTORIES_SETTING.get(clusterService.getSettings()), which returns a list of strings.

We also need to check whether the list contains "*", which enables all the system-generated processors.
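A minimal sketch of the check suggested above, assuming the setting value has already been read as a list of strings. The class name `SystemGeneratedFactoryCheck` and the method `isEnabled` are hypothetical and not part of this PR; the sketch deliberately avoids any OpenSearch classes so that it stays self-contained.

```java
import java.util.List;

// Hypothetical helper, not part of the PR: decides whether a system-generated
// processor is enabled, given the already-resolved list of enabled factories.
final class SystemGeneratedFactoryCheck {

    // Wildcard entry that, per the review comment, enables every system-generated processor.
    private static final String WILDCARD = "*";

    private SystemGeneratedFactoryCheck() {}

    static boolean isEnabled(List<String> enabledFactories, String processorType) {
        // Treat a missing list or a missing processor type as "not enabled".
        if (enabledFactories == null || processorType == null) {
            return false;
        }
        // Enabled when the wildcard is present or the type is listed explicitly.
        return enabledFactories.contains(WILDCARD) || enabledFactories.contains(processorType);
    }

    public static void main(String[] args) {
        String type = "agentic_query_translator";
        System.out.println(isEnabled(List.of("*"), type));
        System.out.println(isEnabled(List.of(type), type));
        System.out.println(isEnabled(List.of(), type));
    }
}
```

Comparing list entries rather than the result of `String.valueOf(...)` avoids substring matches against the list's string form, and the wildcard branch keeps "*" working without naming each processor.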

owaiskazi19 · 2025-09-16T22:44:00Z

Based on the discussion with the team again, we have decided to go with the current approach of user-defined pipeline to avoid overloading the request with agent id. Will keep this PR open and will add another processor to this PR later. Keeping this on hold until then

heemin32 · 2025-09-16T23:35:01Z

Based on the discussion with the team again, we have decided to go with the current approach of user-defined pipeline to avoid overloading the request with agent id. Will keep this PR open and will add another processor to this PR later. Keeping this on hold until then

Could you tell more about the concern regarding agent id overloading?

owaiskazi19 · 2025-09-17T03:15:36Z

Could you tell more about the concern regarding agent id overloading?

For every search request, we need to pass an agent_id:

{
    "query": {
        "agentic": {
            "query_text": "List all species",
            "agent_id": "jf-WUZkBl9tG0YB4F8-A",
            "query_fields": ["species", "petal_length_in_cm"]
        }
    }
}

While we can define a pipeline as a one-time setup, we can have a system-generated pipeline if users want to use an agentic query for small purposes. However, overall, having a user-defined one-time pipeline would be a better choice here.

heemin32 · 2025-09-17T06:23:46Z

While we can define a pipeline as a one-time setup, we can have a system-generated pipeline if users want to use an agentic query for small purposes. However, overall, having a user-defined one-time pipeline would be a better choice here.

That’s debatable. If putting agent-id in the query isn’t a good idea, then the same could be said about query_field? Both can just be set in the pipeline. Personally, I feel passing agent-id in the query is simpler than creating and attaching a processor.
Since the API is called by programs rather than by people, passing agent-id every time is not that bad imo.

owaiskazi19 · 2025-09-17T06:54:32Z

I feel passing agent-id in the query is simpler than creating and attaching a processor. As the api is called by program but not by person, passing agent-id every time is not that bad imo.

Agreed and we can provide both the options of system generated processor and user defined processor. Will add both the types in this same PR. Also, one other reason to keep it in a separate pipeline is we can have other processors like RAG or ml inference attached in the same pipeline for summarization of the result.

then the same could be said about query_field?

it's an optional field for better context.

owaiskazi19 requested review from VijayanB, bzhangam, heemin32, jmazanec15, junqiu-lei, martin-gaievski, minalsha, model-collapse, naveentatikonda, navneet1v, sean-zheng-amazon, vamshin, vibrantvarun, yuye-aws, zane-neo and zhichao-aws as code owners September 16, 2025 08:39

Convert agentic query translator processor to system-generated processor

4af6137

Signed-off-by: Owais <[email protected]>

owaiskazi19 force-pushed the system-search-pipeline branch from aaa9d44 to 4af6137 Compare September 16, 2025 08:42

Added test for NeuralSearch

9a2152f

Signed-off-by: Owais <[email protected]>

owaiskazi19 force-pushed the system-search-pipeline branch from 2b44284 to 9a2152f Compare September 16, 2025 08:55

owaiskazi19 closed this Sep 16, 2025

owaiskazi19 reopened this Sep 16, 2025

owaiskazi19 closed this Sep 16, 2025

owaiskazi19 reopened this Sep 16, 2025