Refactor/graph resume #465
Open
shentongmartin wants to merge 16 commits into main from refactor/graph_resume
+2,130
−267
c26f251 to 8882e60
d264744 to 838550a
Change-Id: I30b5fe3a8122b86fdd2818bf66dec18bc155e33f
…ions Change-Id: I2e6357acec95433e648fbdc8509ef44e6eb22dce
Change-Id: I67d05742e079b1fe7ee11dbadec9ac5c4c3ffd8c
This change exports the function to , making it a public part of the API. This is essential for developers who want to build their own custom composite nodes. To correctly implement the and interfaces, a custom node needs the ability to create distinct execution contexts for its internal, interruptible sub-components. Exporting this function provides the necessary hook for that advanced use case. All internal call sites within the package have been updated to use the new exported function name. Change-Id: Ibbc651131333860064454f691b346ab87149824c
Change-Id: I54b60cd3ee79e20e92ec1fe1b2e870f18763f5b6
Change-Id: Ifad728c6c6ea59cb21b2e622ebad51f249b5fc91
Change-Id: Ia7f0b548fae95f610e4e59333c9ec9e60e13b827
…f clear it Change-Id: I9ab6290573b1faf6bda39a282e7986df61f2307d
…ass in PathStep Change-Id: I43c40b77ba74c6408dd3e930ea330bf4099e4f24
…and GetResumeContext Change-Id: Ia97f4245db7edce7e784443cb36c14cc211043ce
Change-Id: Icd7dc35e924b1a9b58a933d6f5a7aaf0ab51c57a
Change-Id: Ifae23e99c2392a4e1f1e938cb725e76c5a71f886
Change-Id: I689c9ef7799020f0e988fd724620641efb71ef80
Change-Id: I5cf93f1a00ea0aaa07cde62ac5d0749a4a3912b1
838550a to 20a24e1
Change-Id: If26cc015084802d7509ef9a8a46f17902a374e2a
20a24e1 to d3f1060
Change-Id: I3d845adc10698f67443d11543e6cd6269dc69877
Feat: Add Robust Graph Interrupt and Resume Mechanism
Summary & Motivation
This pull request introduces a robust and developer-friendly mechanism for interrupting and resuming graph execution. The previous system had several limitations that made handling complex interruptions difficult, including a lack of local state persistence, no deep addressing for nested components, no way to provide targeted resume data, and an ambiguous execution context for components.
This new mechanism is a ground-up redesign that addresses all of these issues by introducing a stable path-based addressing system, a clear and powerful set of APIs for signaling and handling interrupts, and a well-defined workflow for end-users to resume execution.
Key Features
New: Hierarchical Path-Based Addressing:
- An `Address` system (`[]AddressSegment`) that gives every component in a graph a unique, hierarchical, and stable stringified ID (e.g., `runnable:root;node:A;tool:tool_123`).
- An address is composed of `AddressSegment`s. The `AddressSegmentType` is an extensible `string`, allowing developers to define custom types (like `process` or `agent`) for their own composite components, in addition to the built-in `AddressSegmentNode`, `AddressSegmentTool`, and `AddressSegmentRunnable`.
- A `Runnable` such as a sub-graph invoked from within a lambda node automatically inherits the full address of its parent, making its internal interrupt points addressable from the outside.
New: Modernized, Address-Aware Interrupt API:
- `Interrupt(ctx, info)` & `StatefulInterrupt(ctx, info, state)`: These are the new primary, context-aware functions for creating interrupts. They automatically capture the component's full address from the context, so every interrupt is uniquely addressable.
- `CompositeInterrupt(ctx, info, state, ...errs)`: A new function designed for composite nodes. It accepts a variadic list of sub-errors and bundles them into a single, hierarchical interrupt that the framework can deconstruct.
- The `InterruptAndRerun` and `NewInterruptAndRerunErr` functions are now deprecated, as they do not carry address information.
- `WrapInterruptAndRerunIfNeeded(ctx, step, err)`: To handle legacy components or simple sub-processes that still use the deprecated errors, this new helper function wraps an error with an `AddressSegment`, making it compatible with the new `CompositeInterrupt` API.
New: User-Facing Resume Workflow:
- `InterruptInfo`: The user can call `interruptInfo.InterruptContexts` to get a flat list of all available resumable points.
- Each point is an `InterruptCtx`, which contains the user-facing `Info` and, most importantly, the unique, stable `ID` of the interrupt point.
- The user passes this `ID` to target a specific resumption by calling `Resume(ctx, id)` or `ResumeWithData(ctx, id, data)` to create a new context for the next `Invoke` call.
New: Component-Facing API (`resume.go`):
- `GetInterruptState[T](ctx)`: Allows a component to check whether it was previously interrupted and retrieve its persisted state. The return order `(wasInterrupted, hasState, state)` follows a natural logical flow.
- `GetResumeContext[T](ctx)`: Allows a component to determine whether it is the specific target of a `Resume` operation and retrieve any associated data. The return order `(isResumeFlow, hasData, data)` is similarly intuitive.
- `GetCurrentAddress(ctx)`: Returns the full address of the current component.
Guaranteed One-Time State Consumption: The framework ensures that interrupt state and resume data are consumed only once per address per checkpoint. This is a critical correctness guarantee that prevents bugs from accidental state reuse.
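To make the addressing scheme under Key Features concrete, here is a minimal, self-contained sketch of how an address could be built and stringified. It is not the code from this PR: the field names `Type` and `ID`, the `Append` helper, and the constant string values are assumptions inferred from the example ID `runnable:root;node:A;tool:tool_123`.

```go
package main

import (
	"fmt"
	"strings"
)

// AddressSegmentType is a plain string so callers can define their own kinds.
type AddressSegmentType string

// Built-in kinds named in the PR description. The string values are assumed
// from the example ID "runnable:root;node:A;tool:tool_123".
const (
	AddressSegmentRunnable AddressSegmentType = "runnable"
	AddressSegmentNode     AddressSegmentType = "node"
	AddressSegmentTool     AddressSegmentType = "tool"
)

// AddressSegment is one hop in a component's hierarchical address.
// The field names are hypothetical.
type AddressSegment struct {
	Type AddressSegmentType
	ID   string
}

// Address is the full path from the root runnable down to a component.
type Address []AddressSegment

// String joins the segments as "type:id" pairs separated by ';'.
func (a Address) String() string {
	parts := make([]string, len(a))
	for i, s := range a {
		parts[i] = string(s.Type) + ":" + s.ID
	}
	return strings.Join(parts, ";")
}

// Append (hypothetical helper) returns a child address without mutating the parent.
func (a Address) Append(seg AddressSegment) Address {
	child := make(Address, len(a), len(a)+1)
	copy(child, a)
	return append(child, seg)
}

func main() {
	root := Address{{Type: AddressSegmentRunnable, ID: "root"}}
	nodeA := root.Append(AddressSegment{Type: AddressSegmentNode, ID: "A"})
	tool := nodeA.Append(AddressSegment{Type: AddressSegmentTool, ID: "tool_123"})
	fmt.Println(tool) // runnable:root;node:A;tool:tool_123

	// A custom segment type for a developer-defined composite component.
	const segAgent AddressSegmentType = "agent"
	fmt.Println(nodeA.Append(AddressSegment{Type: segAgent, ID: "planner"}))
}
```

Copying on `Append` keeps a parent's address stable while children extend it, which matches the idea that a nested component inherits its parent's full address.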
How to Implement a Resumable Component
Simple Component
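The original example for this section did not survive extraction. As a rough illustration only, the sketch below models the pattern with local stand-ins: `statefulInterrupt`, `getInterruptState`, and `getResumeContext` mimic the shapes of `StatefulInterrupt`, `GetInterruptState[T]`, and `GetResumeContext[T]` described above, but they are toy implementations, and `runWithResume` plays the role of both the framework and the end user.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// interruptErr is a local stand-in for the error a component returns to pause.
type interruptErr struct {
	Info  any
	State any
}

func (e *interruptErr) Error() string { return fmt.Sprintf("interrupted: %v", e.Info) }

// Context keys and payloads used by this toy runtime only.
type stateKey struct{}
type resumeKey struct{}
type savedState struct{ state any }
type resumeData struct{ data any }

// statefulInterrupt mimics the shape of StatefulInterrupt(ctx, info, state).
func statefulInterrupt(_ context.Context, info, state any) error {
	return &interruptErr{Info: info, State: state}
}

// getInterruptState mimics GetInterruptState[T]: (wasInterrupted, hasState, state).
func getInterruptState[T any](ctx context.Context) (bool, bool, T) {
	var zero T
	saved, ok := ctx.Value(stateKey{}).(savedState)
	if !ok {
		return false, false, zero
	}
	s, hasState := saved.state.(T)
	return true, hasState, s
}

// getResumeContext mimics GetResumeContext[T]: (isResumeFlow, hasData, data).
func getResumeContext[T any](ctx context.Context) (bool, bool, T) {
	var zero T
	r, ok := ctx.Value(resumeKey{}).(resumeData)
	if !ok {
		return false, false, zero
	}
	d, hasData := r.data.(T)
	return true, hasData, d
}

type draft struct{ Text string }

// approvalStep is a simple resumable component: it pauses on first run and
// finishes once it is resumed with a boolean decision.
func approvalStep(ctx context.Context, input string) (string, error) {
	wasInterrupted, hasState, st := getInterruptState[draft](ctx)
	if !wasInterrupted || !hasState {
		// First run: persist the draft and pause for a human decision.
		return "", statefulInterrupt(ctx, "approve draft?", draft{Text: "draft of " + input})
	}
	isResume, hasData, approved := getResumeContext[bool](ctx)
	if !isResume || !hasData {
		// Re-run without being the resume target: pause again, keeping the state.
		return "", statefulInterrupt(ctx, "approve draft?", st)
	}
	if !approved {
		return "rejected: " + st.Text, nil
	}
	return "approved: " + st.Text, nil
}

// runWithResume runs the step, captures the interrupt, then re-runs the step
// with the saved state and the user's decision attached to the context.
func runWithResume(input string, approved bool) (string, error) {
	ctx := context.Background()
	_, err := approvalStep(ctx, input)
	var ie *interruptErr
	if !errors.As(err, &ie) {
		return "", fmt.Errorf("expected an interrupt, got %v", err)
	}
	ctx = context.WithValue(ctx, stateKey{}, savedState{state: ie.State})
	ctx = context.WithValue(ctx, resumeKey{}, resumeData{data: approved})
	return approvalStep(ctx, input)
}

func main() {
	out, err := runWithResume("report", true)
	fmt.Println(out, err)
}
```

The component checks its own interrupt state first and its resume data second, mirroring the return orders `(wasInterrupted, hasState, state)` and `(isResumeFlow, hasData, data)` listed above.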
Composite Component
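The original example for this section is also missing. The sketch below only illustrates the idea of bundling sub-interrupts under a composite's address. `compositeInterrupt` and `wrapIfNeeded` are local stand-ins for the roles of `CompositeInterrupt` and `WrapInterruptAndRerunIfNeeded`; their signatures, the string-based addresses, and the `process:N` segment names are simplifying assumptions, not the API in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// interruptErr is a local stand-in for an address-aware interrupt error.
type interruptErr struct {
	Addr string          // stringified address of the interrupted component
	Info any             // user-facing description
	Subs []*interruptErr // interrupts raised by internal sub-components
}

func (e *interruptErr) Error() string { return "interrupt at " + e.Addr }

// errLegacyRerun stands in for a deprecated, address-less interrupt error.
var errLegacyRerun = errors.New("interrupt and rerun")

// wrapIfNeeded gives an address-less legacy error an address by appending one
// segment to the parent's address; other errors pass through unchanged.
func wrapIfNeeded(parent, segment string, err error) error {
	if errors.Is(err, errLegacyRerun) {
		return &interruptErr{Addr: parent + ";" + segment, Info: "legacy interrupt"}
	}
	return err
}

// compositeInterrupt bundles the interrupt errors of sub-components under the
// composite's own address; non-interrupt errors are ignored in this sketch.
func compositeInterrupt(addr string, info any, errs ...error) error {
	parent := &interruptErr{Addr: addr, Info: info}
	for _, err := range errs {
		var sub *interruptErr
		if errors.As(err, &sub) {
			parent.Subs = append(parent.Subs, sub)
		}
	}
	return parent
}

// leaves flattens the tree into the addresses of the innermost interrupt points.
func leaves(e *interruptErr) []string {
	if len(e.Subs) == 0 {
		return []string{e.Addr}
	}
	var out []string
	for _, s := range e.Subs {
		out = append(out, leaves(s)...)
	}
	return out
}

// runBatch is a composite node with three sub-processes; two of them pause.
func runBatch() error {
	const addr = "runnable:root;node:batch"
	var subErrs []error
	for i := 0; i < 3; i++ {
		seg := fmt.Sprintf("process:%d", i)
		var err error
		switch i {
		case 0: // an address-aware interrupt
			err = &interruptErr{Addr: addr + ";" + seg, Info: "needs input"}
		case 2: // a legacy interrupt that must be wrapped first
			err = wrapIfNeeded(addr, seg, errLegacyRerun)
		}
		if err != nil {
			subErrs = append(subErrs, err)
		}
	}
	if len(subErrs) == 0 {
		return nil
	}
	return compositeInterrupt(addr, "batch paused", subErrs...)
}

// pausedAddresses lists the resumable points carried by err, if any.
func pausedAddresses(err error) []string {
	var ie *interruptErr
	if !errors.As(err, &ie) {
		return nil
	}
	return leaves(ie)
}

func main() {
	fmt.Println(pausedAddresses(runBatch()))
}
```

Flattening the tree into leaf addresses corresponds to the flat list of resumable points that the user-facing workflow exposes.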
User-Facing Interaction Pattern
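The original example for this section is missing as well. The following sketch shows the general loop a caller might follow: invoke, inspect the list of interrupt contexts, pick an `ID`, attach resume data to a new context, and invoke again. `toyGraph`, `interruptInfo`, `interruptCtx`, and `resumeWithData` are local stand-ins written for this illustration; only their overall shapes follow the description above.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// interruptCtx is a local stand-in for one resumable point shown to the user.
type interruptCtx struct {
	ID   string // stable, address-based identifier
	Info any    // user-facing description
}

// interruptInfo stands in for what a caller extracts from an interrupted run.
type interruptInfo struct{ InterruptContexts []interruptCtx }

type resumeKey struct{}

// resumeWithData mimics the shape of ResumeWithData(ctx, id, data): it returns
// a context that targets exactly one interrupt point on the next invocation.
func resumeWithData(ctx context.Context, id string, data any) context.Context {
	return context.WithValue(ctx, resumeKey{}, map[string]any{id: data})
}

// toyGraph tracks which interrupt points are still waiting for an answer.
type toyGraph struct {
	pending map[string]string // ID -> question
	answers map[string]any    // ID -> resume data received
}

func newToyGraph() *toyGraph {
	return &toyGraph{
		pending: map[string]string{
			"runnable:root;node:A": "approve step A?",
			"runnable:root;node:B": "approve step B?",
		},
		answers: map[string]any{},
	}
}

// invoke applies any resume data found in ctx, then either returns the final
// answers or reports the interrupt points that are still pending.
func (g *toyGraph) invoke(ctx context.Context) (map[string]any, *interruptInfo) {
	targets, _ := ctx.Value(resumeKey{}).(map[string]any)
	for id, data := range targets {
		if _, ok := g.pending[id]; ok {
			g.answers[id] = data
			delete(g.pending, id)
		}
	}
	if len(g.pending) == 0 {
		return g.answers, nil
	}
	info := &interruptInfo{}
	for id, q := range g.pending {
		info.InterruptContexts = append(info.InterruptContexts, interruptCtx{ID: id, Info: q})
	}
	sort.Slice(info.InterruptContexts, func(i, j int) bool {
		return info.InterruptContexts[i].ID < info.InterruptContexts[j].ID
	})
	return nil, info
}

// runToCompletion resumes one interrupt point per round until none remain and
// returns the collected answers plus the number of invocations.
func runToCompletion(g *toyGraph) (map[string]any, int) {
	ctx := context.Background()
	rounds := 0
	for {
		out, info := g.invoke(ctx)
		rounds++
		if info == nil {
			return out, rounds
		}
		// A real caller would present info.InterruptContexts to a user; here
		// the first point is answered with a canned reply.
		target := info.InterruptContexts[0]
		ctx = resumeWithData(context.Background(), target.ID, "ok:"+target.ID)
	}
}

func main() {
	out, rounds := runToCompletion(newToyGraph())
	fmt.Println(rounds, out)
}
```

Each round builds a fresh context for a single target, so resume data from an earlier round is never delivered twice in this toy model.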