
Conversation

@inFocus7
Contributor

@inFocus7 inFocus7 commented Aug 27, 2025

leftovers (passing to peter):

  • look into why the watch + reconciliation isn't happening aa2c5f1
    • it was working during this commit, which was right before making changes to simplify the CRD. maybe just need to re-spin up cluster, or forgot a change.
    • found out reason why i believed it wasn't working (an invalid url causes a reconciliation error on the remote agent, which means we don't kick off reconciliation for its callers. expected)
  • the ui remote agent creation is stating it's not a valid type. started occurring after the crd simplification, so maybe missed something there.
  • Eitan's comment regarding managing a Task manually in our OnSendMessageStream, so that we 100% have a finalized Task to store. A stream could end with StatusUpdate final:true without a Task object, meaning we'd have to have built the Task on our end during the stream.
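
A minimal sketch of that last leftover, building the Task on our side while the stream is consumed. The `StatusUpdate`, `Task`, and `TaskBuilder` types here are hypothetical, simplified stand-ins and not the actual a2a-go types; the point is only that every update is folded into a local Task, so there is something to store even if the stream ends on a final status update.

```go
package main

import "fmt"

// StatusUpdate and Task are hypothetical, simplified stand-ins for the A2A
// stream event and task types; the real a2a-go types differ.
type StatusUpdate struct {
	TaskID    string
	ContextID string
	State     string
	Final     bool
}

type Task struct {
	ID        string
	ContextID string
	State     string
	History   []string // states seen so far, oldest first
}

// TaskBuilder accumulates status updates so a Task exists even when the
// stream never sends a full Task object.
type TaskBuilder struct {
	task *Task
}

// Apply folds one update into the task and reports whether it was final.
func (b *TaskBuilder) Apply(u StatusUpdate) bool {
	if b.task == nil {
		b.task = &Task{ID: u.TaskID, ContextID: u.ContextID}
	}
	b.task.State = u.State
	b.task.History = append(b.task.History, u.State)
	return u.Final
}

// Task returns the accumulated task, or nil if no update was seen.
func (b *TaskBuilder) Task() *Task { return b.task }

func main() {
	b := &TaskBuilder{}
	updates := []StatusUpdate{
		{TaskID: "t1", ContextID: "ctx-1", State: "working"},
		{TaskID: "t1", ContextID: "ctx-1", State: "completed", Final: true},
	}
	for _, u := range updates {
		if b.Apply(u) {
			break // final update: the built task is ready to be stored
		}
	}
	fmt.Printf("%s %s %v\n", b.Task().ID, b.Task().State, b.Task().History)
}
```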

Context

Adding support for a2a with agents hosted on a remote server.

My goal is for this work to encompass the simple bare necessities 🐻 of remote agent support, which is usable (although it requires manual reconciliation for agent card updates). Then immediately work on follow-ups (polling).

Changes

TODO: Update description based on new simpler CRD (discoveryUrl)

  • Adds a Remote type for Agents. This remote agent type allows for a2a communications with agents served elsewhere.
    • There are two fields, the agent card url (required) and the server url (optional, to override the agent card's url for a2a).
    • Status is only based on reconciliation.
  • Adds remote agent creation + editing functionality in UI.
    • When working on this, i noticed we check for the .error field to catch server errors on creation, but our error wrapper only set a message, so I ensured it also sets error.
  • Storing new details in the database
    • remoteConfig: similar to existing config, but for remote agents. reused existing remote config to store the main remote agent information.
    • agentCard: only using for remote agents. used for UI/displaying purposes if a user wants to see the agent card details.
      • I stored this to allow users to preview the agent card. it makes most sense for them to view it based on the latest fetch stored versus dynamically fetching with agent url whenever they want to preview. what they see (agent card) should be what they have (latest fetched state).
      • Open to other ideas (i'm listing a few alternatives below)
        1. storing only agentCard instead of remoteConfig in the db -- they hold similar information (name, description, url), the agent card just holds more.
        2. storing agent card data on the agent status instead of the db -- this is similar to something done in gloo portal where it stores the api discovery information on the status of some(?) custom resource def.
        3. not allowing agentCard preview, so not storing the fetched data -- good for saving storage, not ideal UX.
        4. New custom remoteConfig which includes agent card as []byte data. I think this makes most sense for simplicity's sake.

TODO/Unknown

  • session/task storage
    • Eitan brought up a different approach for remote agent storage. I'll follow up on that during the review cycle, or at least after resolving most of these todos.
    • Currently, I opted to hope(?) that the remote agents themselves are implemented with their own task storage/setup, and we'd simply be storing the tasks (with remote agent-provided information) in the database for future fetching.
    • We created a new manager acting as a middleware for remote agent communication. This would handle storing tasks/sessions to our storage.
    • note: i need to implement a new a2a server with task handling, it looks like an update to https://github.com/kagent-dev/a2a-go is required - as this is where we handle our a2a server logic. unless we want this stateful_a2a logic to live here, but this may be a bit weird to split. we wouldn't make updates to the upstream repo our a2a-go is forked from since these changes are kagent-specific. crossing off for now while looking into this to avoid a net-new service + cross-repo changes. maybe by creating a separate task manager for remote services, this can be handled? else, we'll need to expand on the a2a-go to implement a net-new a2a server.

Follow-Ups

This is a list of work I believe would work best as a follow-up to this. This would be in order to keep this PR at a reasonable size (diffs). I'm planning on implementing this/these after this gets merged, or is close-to-merge.

  • agent card re-fetch poll
    • Question: Should polling be configured on a per-remote agent basis? Or a global polling that batch polls/updates all remote agents?
  • allow for remote agent choosing in tools & agents selection.
    • currently unsupported. this would be implemented after the polling implementation to better understand how the information fetching + updating would work through agent-as-a-subagent.
    • I tried a setup locally which half worked, which did the following:
      1. updated the manifests (secrets + deployment hash) for declarative agents using remote agents as tools, so they query the db for information.
      2. added a watcher so when remote agents reconcile, we also do the same for agents calling them as tools (to ensure they get updated configs)
    • An issue with the above is how we set up agents as tools in our kagent/adk code. When we create the agent as a tool, we assume that the url (server) is hosted in the same path as the agent card. This is a bad assumption for remote agents, as they can differ. It also then uses the server url configured on the agent card. This is also a bad assumption, as it would use the original server's agent card, which would not hold the server override (if set). We would want to expose our own agent card handler that displays the agent card/config stored in the db (which holds the overridden url).
  • auth for secured remote agents(?)
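
The URL-resolution problem in the agent-as-tool item above can be sketched as follows. This is only an illustration under assumptions: `RemoteSpec`, `AgentCard`, and `EffectiveURL` are hypothetical names, and the two-field shape follows the description in this PR rather than the final CRD. The endpoint comes from the override when set, otherwise from the card's advertised url, and is never derived from where the card itself is hosted.

```go
package main

import "fmt"

// RemoteSpec is a hypothetical, simplified view of the remote agent spec.
type RemoteSpec struct {
	AgentCardURL string // where the agent card is fetched from (required)
	ServerURL    string // optional override for the A2A endpoint
}

// AgentCard holds only the field this sketch needs.
type AgentCard struct {
	URL string // A2A endpoint advertised by the remote agent
}

// EffectiveURL returns the endpoint a caller should use for A2A traffic:
// the override when set, otherwise the URL advertised on the card.
// It never derives the endpoint from the card's own location.
func EffectiveURL(spec RemoteSpec, card AgentCard) string {
	if spec.ServerURL != "" {
		return spec.ServerURL
	}
	return card.URL
}

func main() {
	card := AgentCard{URL: "http://agent.example.invalid/a2a"}
	noOverride := RemoteSpec{AgentCardURL: "http://cards.example.invalid/agent.json"}
	withOverride := RemoteSpec{
		AgentCardURL: "http://cards.example.invalid/agent.json",
		ServerURL:    "http://proxy.example.invalid/a2a",
	}
	fmt.Println(EffectiveURL(noOverride, card))
	fmt.Println(EffectiveURL(withOverride, card))
}
```

Serving the stored card (with the override already applied) from our own handler would let callers keep reading a single url field instead of repeating this logic.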

Updates

  • i'll ask if we can implement polling in a fast-follow (well, as fast as i can implement). There's a decent amount of additions in this PR for its initial support. It would be easier to get this reviewed and merged, then implement polling as a separate PR imo.
    • if i do this polling in a separate PR, i'll also look into the remote-agent-as-called-agent implementation since polling would affect it.

resolves: #820

@inFocus7 inFocus7 force-pushed the feat/remote-agent-type branch from 168d68c to 74cf718 Compare August 28, 2025 14:48
@inFocus7
Contributor Author

inFocus7 commented Sep 10, 2025

holy moly, there's a lot of merge conflicts. not sure when/if picking this up again, but note to self: it may be easier to begin a new branch and manually pick over changes.

update: resolved merge conflicts locally. unsure if it's all still working. need to test it again + do a self code review to make sure I didn't remove any work from main during merging.

need to find my reproduction setup again :rip:

@inFocus7 inFocus7 marked this pull request as ready for review September 16, 2025 16:14
@inFocus7 inFocus7 requested a review from EItanya as a code owner September 16, 2025 16:14
Copilot AI review requested due to automatic review settings September 16, 2025 16:14
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements support for remote agents in the kagent system, allowing for A2A (Agent-to-Agent) communication with agents hosted on external servers. The implementation includes the creation of a new "Remote" agent type that uses agent cards to discover and communicate with external agents.

Key changes include:

  • Added "Remote" agent type with agent card URL and optional server URL configuration
  • Implemented UI components for creating, editing, and previewing remote agents
  • Created middleware for recording remote agent interactions in the database
  • Extended the A2A protocol handling to support remote agent task management

Reviewed Changes

Copilot reviewed 35 out of 37 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| ui/src/types/index.ts | Added Remote agent type and RemoteAgentSpec interface |
| ui/src/lib/messageHandlers.ts | Enhanced token usage extraction for remote agents and improved artifact streaming |
| ui/src/components/sidebars/AgentDetailsSidebar.tsx | Updated to handle remote agents without model display |
| ui/src/components/create/SelectToolsDialog.tsx | Added null safety for search terms and descriptions |
| ui/src/components/AgentsProvider.tsx | Added remote agent validation and form data types |
| ui/src/components/AgentCardPreview.tsx | New component for previewing remote agent cards |
| ui/src/components/AgentCard.tsx | Added preview functionality for remote agents |
| ui/src/app/agents/new/page.tsx | Extended agent creation form with remote agent fields |
| ui/src/app/actions/utils.ts | Fixed error response to include error field |
| ui/src/app/actions/agents.ts | Added getAgentCard API call and remote agent form handling |
| go/api/v1alpha2/agent_types.go | Added Remote agent type and RemoteAgentSpec to CRD |
| go/internal/controller/translator/adk_api_translator.go | Added remote agent card fetching and translation |
| go/internal/controller/reconciler/reconciler.go | Enhanced reconciler to handle remote agents and store agent cards |
| go/internal/a2a/recording_manager.go | New recording manager for remote agent task/session storage |
| go/internal/a2a/a2a_handler_mux.go | Updated A2A handler to use recording manager for remote agents |
| go/internal/httpserver/handlers/agents.go | Added agent card retrieval endpoint |
| go/internal/database/models.go | Extended Agent model with remote config and agent card storage |


Signed-off-by: Fabian Gonzalez <[email protected]>
Contributor

@EItanya EItanya left a comment

The overall design makes sense, but there are some details that need to be worked out before we can merge this.

Comment on lines 82 to 84
if err := m.dbClient.StoreTask(task); err != nil {
logger.Error(err, "Failed to store sync task", "taskID", task.ID, "contextID", task.ContextID)
}
Contributor

Why are we failing silently here? The whole point of this functionality is to record this

Contributor Author

Done, I did a more explicit get -> if not exist, create for the Session.

If a session did not exist (or get created successfully), then we do not store a task. This is assuming a session -> tasks relation. If we allow for Tasks to exist without a Session, I could change this to always create a Task.
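
A rough sketch of the get -> create-if-missing flow described in this reply, with errors returned to the caller instead of only being logged. The `Store` interface, `ErrNotFound`, and `memStore` are hypothetical stand-ins for the database client, not its real API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel; the real client may signal a
// missing row differently.
var ErrNotFound = errors.New("not found")

type Session struct{ ID string }
type Task struct{ ID, ContextID string }

// Store is a hypothetical subset of the database client.
type Store interface {
	GetSession(id string) (*Session, error)
	CreateSession(s *Session) error
	StoreTask(t *Task) error
}

// RecordTask ensures the session exists, then stores the task. Any failure
// is returned so the caller can surface it rather than fail silently.
func RecordTask(db Store, t *Task) error {
	_, err := db.GetSession(t.ContextID)
	if errors.Is(err, ErrNotFound) {
		err = db.CreateSession(&Session{ID: t.ContextID})
	}
	if err != nil {
		return fmt.Errorf("ensure session %q: %w", t.ContextID, err)
	}
	if err := db.StoreTask(t); err != nil {
		return fmt.Errorf("store task %q: %w", t.ID, err)
	}
	return nil
}

// memStore is an in-memory Store used only to exercise the sketch.
type memStore struct {
	sessions  map[string]*Session
	tasks     map[string]*Task
	failTasks bool // when true, StoreTask returns an error
}

func newMemStore() *memStore {
	return &memStore{sessions: map[string]*Session{}, tasks: map[string]*Task{}}
}

func (m *memStore) GetSession(id string) (*Session, error) {
	s, ok := m.sessions[id]
	if !ok {
		return nil, ErrNotFound
	}
	return s, nil
}

func (m *memStore) CreateSession(s *Session) error {
	m.sessions[s.ID] = s
	return nil
}

func (m *memStore) StoreTask(t *Task) error {
	if m.failTasks {
		return errors.New("write failed")
	}
	m.tasks[t.ID] = t
	return nil
}

func main() {
	db := newMemStore()
	err := RecordTask(db, &Task{ID: "t1", ContextID: "ctx-1"})
	fmt.Println(err, len(db.sessions), len(db.tasks))
}
```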


// Marshal remote agent's AgentCard to store in DB
var serializedCard string
if agent.Spec.Type == v1alpha2.AgentType_Remote && agentOutputs != nil {
Contributor

How would a remote agent have a card stored from the translation? If we're going to store the agent card we should do it in a separate controller from this one since there's this required async logic of the card.

Description: toolAgent.Spec.Description,
})
case v1alpha2.AgentType_Remote:
/* TODO: Add support for remote agents.
Contributor

The translator is a pure function, if we are going to make a network call we should do it in the reconciler
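
One way to read this suggestion, sketched under assumptions: the reconciler performs the fetch and hands the result to a translator that does no I/O. `FetchAgentCard`, `Translate`, and the trimmed `AgentCard` / `RemoteAgentConfig` types are hypothetical and do not reflect the actual kagent signatures.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AgentCard holds only the fields this sketch needs.
type AgentCard struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

// RemoteAgentConfig is a hypothetical translator output.
type RemoteAgentConfig struct {
	Name string
	URL  string
}

// FetchAgentCard performs the network call; in this split it would live in
// the reconciler, not the translator.
func FetchAgentCard(ctx context.Context, c *http.Client, discoveryURL string) (*AgentCard, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, discoveryURL, nil)
	if err != nil {
		return nil, err
	}
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch agent card: unexpected status %d", resp.StatusCode)
	}
	var card AgentCard
	if err := json.NewDecoder(resp.Body).Decode(&card); err != nil {
		return nil, fmt.Errorf("decode agent card: %w", err)
	}
	return &card, nil
}

// Translate is pure: the same card always yields the same config, no I/O.
func Translate(card AgentCard) RemoteAgentConfig {
	return RemoteAgentConfig{Name: card.Name, URL: card.URL}
}

func main() {
	// A local test server stands in for the remote agent's card endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_ = json.NewEncoder(w).Encode(AgentCard{Name: "kebab", URL: "http://agent.example.invalid/a2a"})
	}))
	defer srv.Close()

	card, err := FetchAgentCard(context.Background(), srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", Translate(*card))
}
```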

Contributor Author

@inFocus7 inFocus7 Sep 18, 2025

Makes sense. Just need to figure out how the polling of agent card ties into this a bit better. I was/am planning on handling polling logic in a follow-up to avoid blowing up this PR with diffs.

Self-thought/note:
I think we do something along these lines for MCP tools, where there's a watcher that updates agents when mcp tools update. So something like that would be done for remote agent-as-a-tool, where updates would occur when they get re-polled (or when the remote agent reconciles).

Additionally, doing something like storing the agent card hashed in the remote agent annotation could kick off its reconciliation, which should then update the usages of it as a tool from agents watching it.


Would probably be worth huddling (unless this is an easy yes/no). For using it as a tool we can:

  1. Add a watcher, so when remote agents reconcile, agents using them as a tool do so as well
  2. During reconciliation, remote agent urls/configs are updated based on latest data (from the database)
    This way it stays up to date.

When we implement polling, we'll need to see how to handle it (in my opinion adding an annotation on remote agents that is a hash of the config should be good. this way it forces a reconciliation + caller agents update). I'm trying this locally. It works in theory, but getting a 503 adk error when the caller agent tries calling the remote agent 🤔
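
The hash-annotation idea could look roughly like this. The annotation key and the helper names are assumptions made for illustration. Note that hashing the raw card bytes treats whitespace or key-order changes as real changes, so a canonical encoding may be preferable in practice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// annotationKey is a hypothetical annotation name, not an existing kagent one.
const annotationKey = "kagent.dev/agent-card-hash"

// CardHash returns the hex-encoded SHA-256 of the raw agent card bytes.
func CardHash(rawCard []byte) string {
	sum := sha256.Sum256(rawCard)
	return hex.EncodeToString(sum[:])
}

// SetCardHash writes the hash into annotations and reports whether the value
// changed. A change is what would trigger reconciliation of the remote agent
// and, through a watch, of its callers. The map must be non-nil.
func SetCardHash(annotations map[string]string, rawCard []byte) bool {
	h := CardHash(rawCard)
	if annotations[annotationKey] == h {
		return false
	}
	annotations[annotationKey] = h
	return true
}

func main() {
	ann := map[string]string{}
	fmt.Println(SetCardHash(ann, []byte(`{"name":"kebab","url":"http://a.example.invalid"}`)))
	fmt.Println(SetCardHash(ann, []byte(`{"name":"kebab","url":"http://a.example.invalid"}`)))
	fmt.Println(SetCardHash(ann, []byte(`{"name":"kebab","url":"http://b.example.invalid"}`)))
}
```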

Contributor

Why do we need a whole separate agent for this? Can't we just treat it as remote in the tests?

Contributor Author

I'm going to play around with this idea a bit more.

Initially I was against it because it would mean we need the BYO kebab Agent to be created first, so that its deployment + service exists and the Remote Agent can reference the service URL. But should be simple to deal with.

Now, after trying it out: the remote agent test passes, but when I try to chat with the remote agent...

Response in chat:

Client error '403 Forbidden' for url 'http://kagent-controller.kagent:8083/api/sessions/ctx-a6f000bf-7e85-4465-ab06-f08dd9be04d7/[email protected]' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403

Kebab agent's deployment:

INFO:     10.244.0.51:43074 - "POST / HTTP/1.1" 200 OK
ERROR:google_adk.kagent.adk._agent_executor:Error handling A2A request: Client error '403 Forbidden' for url 'http://kagent-controller.kagent:8083/api/sessions/ctx-a6f000bf-7e85-4465-ab06-f08dd9be04d7/[email protected]'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
Traceback (most recent call last):
  File "/.kagent/python/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 125, in execute
    await self._handle_request(context, event_queue, runner)
  File "/.kagent/python/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 206, in _handle_request
    async for adk_event in runner.run_async(**run_args):
    ...<4 lines>...
            await event_queue.enqueue_event(a2a_event)
  File "/.kagent/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 250, in run_async
    async for event in agen:
      yield event
  File "/.kagent/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 228, in _run_with_trace
    await self._append_new_message_to_session(
    ...<5 lines>...
    )
  File "/.kagent/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 357, in _append_new_message_to_session
    await self.session_service.append_event(session=session, event=event)
  File "/.kagent/python/packages/kagent-adk/src/kagent/adk/_session_service.py", line 168, in append_event
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/.kagent/python/.venv/lib/python3.13/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '403 Forbidden' for url 'http://kagent-controller.kagent:8083/api/sessions/ctx-a6f000bf-7e85-4465-ab06-f08dd9be04d7/[email protected]'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
INFO:     10.244.0.1:51740 - "GET /health HTTP/1.1" 200 OK
INFO:     10.244.0.1:46628 - "GET /health HTTP/1.1" 200 OK

I'll need to dig into this, but it's probable that it's due to the remote agent a2a handler storing information in the DB and the other agent doing so as well (since byo + declarative agents also store in the db themselves). I'm not even sure if this would be a "real world" issue, since a user likely wouldn't be defining an agent created by kagent as a remote agent. (unless there are valid cross-cluster use cases for this, but in that case, it would be good to configure the remote agent with a bool field to not store in the db)

Contributor Author

@inFocus7 inFocus7 Sep 17, 2025

Yeah, so while the test (somehow) passes, the session storage is a big issue if the remote agent writes to the database as well.

kagent >> get session
+---+------------------------------------------+--------------------+--------------------------------+----------------------+
| # | ID                                       | NAME               | AGENT                          | CREATED              |
+---+------------------------------------------+--------------------+--------------------------------+----------------------+
| 1 | ctx-5b33e4b2-d2d6-4854-955c-0f6dea1a282f | remote-kebab-agent | kagent__NS__remote_kebab_agent | 2025-09-17T22:54:21Z |
+---+------------------------------------------+--------------------+--------------------------------+----------------------+

the remote agent "owns" this session id, so when the agent it communicates with tries accessing/writing it, it fails. which makes sense.

So what happens is that

  1. The remote Agent handler creates a Session (as expected)
  2. It communicates with the other Agent
  3. The other agent was created by us, meaning it has DB access
  4. The other agent tries creating/getting the Session
  5. The other agent fails and returns an error, because it already exists and is owned by the Remote Agent.

If we want to allow the case where a remote agent is one which already writes its sessions to the DB, then we'll want to add a spec field to disable the database writing -- or do something else where it passes a different context/id to the remote agent. The main issue here would be figuring out how to handle their own sessions/display.

[...]
    type: Remote
    remote:
      [urls...]
      persistData: false # or persistTask/Session, persist, store, etc..
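
A small sketch of how such a flag might be read on the Go side, assuming a hypothetical `PersistData` field on the remote spec. A pointer is used so an unset field can default to true and keep today's recording behaviour.

```go
package main

import "fmt"

// RemoteSpec is a hypothetical slice of the remote agent spec; PersistData
// mirrors the proposed field and does not exist in the CRD today.
type RemoteSpec struct {
	PersistData *bool
}

// ShouldPersist reports whether tasks/sessions for this remote agent should
// be written to the database. Unset means true (current behaviour).
func ShouldPersist(spec RemoteSpec) bool {
	if spec.PersistData == nil {
		return true
	}
	return *spec.PersistData
}

func main() {
	off := false
	fmt.Println(ShouldPersist(RemoteSpec{}))
	fmt.Println(ShouldPersist(RemoteSpec{PersistData: &off}))
}
```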

inFocus7 and others added 7 commits September 17, 2025 18:14
Signed-off-by: Fabian Gonzalez <[email protected]>
Signed-off-by: Peter Jausovec <[email protected]>
* main:
  Fix UI/streaming timeouts for long running LLM requests (kagent-dev#907)
  fix helm value for env (kagent-dev#910)
  feat: allow per-agent header configuration for tools (kagent-dev#884)
  feat: Set system message from ConfigMap or Secrets (kagent-dev#894)

Signed-off-by: Peter Jausovec <[email protected]>
@peterj peterj force-pushed the feat/remote-agent-type branch from 51d1d2e to 15801b4 Compare September 18, 2025 23:39
Signed-off-by: Peter Jausovec <[email protected]>
peterj and others added 2 commits September 19, 2025 13:58
Comment on lines +523 to +529
case v1alpha2.AgentType_Remote:
cfg.RemoteAgents = append(cfg.RemoteAgents, adk.RemoteAgentConfig{
Name: utils.ConvertToPythonIdentifier(utils.GetObjectRef(toolAgent)),
Url: agent.Spec.Remote.DiscoveryURL,
Headers: headers,
Description: toolAgent.Spec.Description,
})
Contributor Author

Q: Since we're not using Watches, does this mean that if a remote agent's discovery URL is updated, any Agent referencing it as a tool won't get it until they manually reconcile?

(iirc, the deployments for declarative agents have their agent-as-tools urls "baked in" their secrets, which update on reconciliation.)
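
If a watch were added, its mapping step could be as small as the sketch below: given the remote agent that changed, return the agents that reference it as a tool so they can be requeued. The `Agent` type and `CallersOf` are hypothetical and deliberately independent of controller-runtime.

```go
package main

import (
	"fmt"
	"sort"
)

// Agent is a hypothetical, flattened view of an Agent resource: its name and
// the names of the agents it references as tools.
type Agent struct {
	Name       string
	ToolAgents []string
}

// CallersOf returns the sorted names of agents that use remoteName as a tool.
// A watch's map function would turn each of these into a reconcile request.
func CallersOf(agents []Agent, remoteName string) []string {
	var callers []string
	for _, a := range agents {
		for _, ref := range a.ToolAgents {
			if ref == remoteName {
				callers = append(callers, a.Name)
				break // count each caller once
			}
		}
	}
	sort.Strings(callers)
	return callers
}

func main() {
	agents := []Agent{
		{Name: "planner", ToolAgents: []string{"remote-kebab-agent"}},
		{Name: "helper", ToolAgents: []string{"other-agent"}},
		{Name: "chef", ToolAgents: []string{"other-agent", "remote-kebab-agent"}},
	}
	fmt.Println(CallersOf(agents, "remote-kebab-agent"))
}
```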

@ae12345678910

When will this be merged into kagent? This is a feature that I would find very valuable

Development

Successfully merging this pull request may close these issues.

[FEATURE] Remote Agent Type

4 participants