diff --git a/docs/user-manuals/cloneset.md b/docs/user-manuals/cloneset.md
index 250eecdc61..65e6629023 100644
--- a/docs/user-manuals/cloneset.md
+++ b/docs/user-manuals/cloneset.md
@@ -529,6 +529,112 @@ spec:
     paused: true
 ```
 
+### Progress Deadline Seconds
+
+**FEATURE STATE:** Kruise v1.9.0
+
+The `.spec.progressDeadlineSeconds` field is optional. It defines the maximum time, in seconds, that the CloneSet controller waits for a rollout to make progress before it considers the rollout failed. When this deadline is exceeded without progress, the controller records the following condition in the CloneSet's `.status.conditions`:
+```yaml
+type: Progressing
+status: "False"
+reason: ProgressDeadlineExceeded
+```
+
+This field is unset by default, in which case the CloneSet controller does not record the Progressing condition in `.status.conditions` during a rollout.
+
+Once the field is set, the CloneSet controller continuously checks rollout progress against the deadline. Higher-level orchestration systems can use the resulting condition to trigger their own actions, such as rolling back the CloneSet. The condition is informational only: even after the deadline is exceeded, the CloneSet controller keeps rolling out Pods.
+> **Note:**
+>
+> If specified, this field must be greater than `.spec.minReadySeconds`.
+
+With `.spec.progressDeadlineSeconds` configured, a CloneSet goes through the following states during its lifecycle:
+- Progressing: the rollout is in progress.
+- Available: the partitioned update or the full rollout has completed successfully.
+- Failed: the rollout has exceeded its progress deadline.
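+
+The manifest below is a minimal sketch showing where `.spec.progressDeadlineSeconds` sits alongside `.spec.minReadySeconds` and a partitioned update strategy. The name `sample`, the `nginx:alpine` image, and all numeric values are illustrative assumptions rather than recommended settings.
+```yaml
+apiVersion: apps.kruise.io/v1alpha1
+kind: CloneSet
+metadata:
+  name: sample  # hypothetical name, for illustration only
+spec:
+  replicas: 5
+  # A Pod counts as available once it has been ready for 10 seconds.
+  minReadySeconds: 10
+  # Must be greater than minReadySeconds. If the rollout makes no progress
+  # within 600 seconds, the Progressing condition turns "False" with reason
+  # ProgressDeadlineExceeded; the controller keeps rolling out Pods regardless.
+  progressDeadlineSeconds: 600
+  selector:
+    matchLabels:
+      app: sample
+  updateStrategy:
+    # Keep 3 of the 5 Pods on the old revision, so only 2 Pods are updated.
+    # Once those 2 Pods are available, the CloneSet is in the partition paused state.
+    partition: 3
+  template:
+    metadata:
+      labels:
+        app: sample
+    spec:
+      containers:
+      - name: main
+        image: nginx:alpine  # illustrative image
+```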
+
+#### Progressing Condition Reasons
+
+The Progressing condition has status `"True"` in the following cases:
+
+| Reason                             | Message                                           | Description                        |
+|------------------------------------|---------------------------------------------------|------------------------------------|
+| CloneSetUpdated                    | CloneSet is progressing or CloneSet is resumed    | Rollout is in progress             |
+| CloneSetAvailable                  | CloneSet is available                             | Rollout has completed successfully |
+| CloneSetProgressPaused             | CloneSet is paused                                | Rollout is paused                  |
+| CloneSetProgressPartitionAvailable | CloneSet has been paused due to partition ready   | Partitioned update has completed   |
+
+#### Progressing CloneSet
+A CloneSet is marked as Progressing when any of the following happens:
+
+- A new revision is being rolled out.
+- The newest revision is scaled up during an upgrade.
+- Older revisions are scaled down during an upgrade.
+- New Pods become ready or available (that is, they satisfy `.spec.minReadySeconds`).
+
+When the rollout enters the Progressing state, the CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
+```yaml
+type: Progressing
+status: "True"
+reason: CloneSetUpdated
+```
+
+#### Available CloneSet
+**Partition Paused:**
+
+A CloneSet enters the partition paused state when:
+- All replicas that the partition allows to be updated have been updated to the latest specified revision.
+- All of those updated replicas are available.
+
+The CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
+```yaml
+type: Progressing
+status: "True"
+reason: CloneSetProgressPartitionAvailable
+```
+
+**Available:**
+
+A CloneSet is marked as available when:
+
+- All replicas have been updated to the latest specified revision.
+- All replicas are available.
+- No replicas of old revisions are running.
+
+The CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
+```yaml
+type: Progressing
+status: "True"
+reason: CloneSetAvailable
+```
+
+The Progressing condition keeps the status "True" until a new rollout is started. The condition does not change when replica availability changes.
+
+#### Failed CloneSet
+A CloneSet enters the Failed state when it cannot successfully roll out the latest revision. Common causes include:
+
+- Insufficient quota
+- Readiness probe failures
+- Image pull errors
+- Insufficient permissions
+- LimitRange constraints
+- Application runtime misconfiguration
+
+This state can be detected by setting `.spec.progressDeadlineSeconds`. Once the deadline is exceeded, the CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
+```yaml
+type: Progressing
+status: "False"
+reason: ProgressDeadlineExceeded
+```
+
+> **Note:**
+>
+> While a CloneSet rollout is paused, the controller does not check progress against the deadline. Users can safely pause and resume a CloneSet in the middle of a rollout without triggering the deadline exceeded condition.
+
+#### Operations on a Failed CloneSet
+All operations applicable to an Available CloneSet can also be applied to a Failed CloneSet, including:
+- Rolling back to a previous revision.
+- Pausing the rollout to make multiple adjustments to the Pod template.
+
 ### In-Place Update Support for Modifying Resources
 
 **FEATURE STATE:** Kruise v1.8.0
@@ -762,4 +868,3 @@ Currently, both status and metadata changes of Pods will trigger the reconcile o
 However, for larger clusters or scenarios with frequent Pod update events, these unnecessary reconciles will block the real CloneSet reconciles, resulting in delayed rolling updates and other changes.
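 
 To solve this problem, you can enable the **feature-gate CloneSetEventHandlerOptimization** to reduce some unnecessary reconcile enqueues.
-
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/cloneset.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/cloneset.md
index 1356543dd2..1228da0050 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/cloneset.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/cloneset.md
@@ -506,6 +506,119 @@ spec:
     paused: true
 ```
 
+### Progress Deadline Mechanism
+
+**FEATURE STATE:** Kruise v1.9.0
+
+`.spec.progressDeadlineSeconds` is an optional setting that defines the maximum time (in seconds) the CloneSet controller waits before judging a rollout as failed. If no progress has been made when this deadline passes, the CloneSet controller records the corresponding condition in the resource status:
+```yaml
+type: Progressing
+status: "False"
+reason: ProgressDeadlineExceeded
+```
+
+CloneSet does not set this value by default, so by default the CloneSet controller does not record the corresponding condition in `.status.conditions`.
+
+Once the value is set, the CloneSet controller continuously checks the rollout against the configured time. Higher-level orchestration systems can use this status to trigger corresponding actions, such as rolling back the CloneSet (even if this status reports a timeout, the underlying CloneSet controller continues the rolling update of Pods).
+
+> **Note:**
+>
+> If specified, this field must be greater than `.spec.minReadySeconds`.
+
+Therefore, once `.spec.progressDeadlineSeconds` is configured, a CloneSet goes through several states during its lifecycle:
+- Progressing: the rollout is in progress.
+- Available: a partitioned rollout has completed or the whole rollout has succeeded.
+- Failed: the rollout has timed out without making further progress.
+
+#### Progressing State Reasons
+
+The following are the cases in which the Progressing condition is True:
+
+| Reason                             | Message                                         | Description                             |
+|------------------------------------|-------------------------------------------------|-----------------------------------------|
+| CloneSetUpdated                    | CloneSet is progressing or CloneSet is resumed  | Rollout is in progress                  |
+| CloneSetAvailable                  | CloneSet is available                           | Rollout has completed                   |
+| CloneSetProgressPaused             | CloneSet is paused                              | Rollout is paused                       |
+| CloneSetProgressPartitionAvailable | CloneSet has been paused due to partition ready | Rollout has reached the specified partition |
+
+
+#### Progressing CloneSet
+A CloneSet is marked as Progressing when any of the following occurs:
+- A rolling update is performed.
+- The latest revision is scaled up during an upgrade.
+- Old revisions are scaled down during an upgrade.
+- Newly created Pods become ready or available (satisfying the MinReadySeconds condition).
+
+At this point, the CloneSet controller adds the following condition to `.status.conditions`:
+
+```yaml
+type: Progressing
+status: "True"
+reason: CloneSetUpdated
+```
+
+#### Available CloneSet
+The Available state has two sub-states:
+
+**Partition paused state:**
+
+A CloneSet enters the partition paused state when the following conditions are met:
+
+- The replicas covered by the specified partition have been updated to the latest revision.
+- The replicas covered by the specified partition are all available.
+
+The CloneSet controller adds a condition with the following attributes to the CloneSet's `.status.conditions`:
+
+```yaml
+type: Progressing
+status: "True"
+reason: CloneSetProgressPartitionAvailable
+```
+
+**Available state:**
+Kruise marks the CloneSet as available when all of the following hold:
+
+- All replicas have been updated to the latest revision.
+- All replicas are available.
+- No replicas of old revisions are running.
+
+The CloneSet controller adds a condition with the following attributes to the CloneSet's `.status.conditions`:
+
+```yaml
+type: Progressing
+status: "True"
+reason: CloneSetAvailable
+```
+
+The Progressing condition remains "True" until a new rollout is triggered. Its value does not change even if replica availability changes.
+
+#### Failed CloneSet
+When a CloneSet cannot successfully roll out the latest revision, it enters the Failed state. Common causes include:
+
+- Insufficient resource quota
+- Readiness probe failures
+- Image pull failures
+- Insufficient permissions
+- LimitRange misconfiguration
+- Application runtime misconfiguration
+
+This state can be detected by configuring the `.spec.progressDeadlineSeconds` parameter. Once the deadline has passed, the CloneSet controller adds the following condition to `.status.conditions`:
+
+```yaml
+type: Progressing
+status: "False"
+reason: ProgressDeadlineExceeded
+```
+
+> **Note:**
+>
+> When a user pauses a CloneSet rollout, the controller stops checking progress. Users can safely pause and resume during a rollout without triggering the timeout.
+
+#### Operations on a Failed CloneSet
+A CloneSet in the Failed state supports the same management operations as one in the Available state, including:
+- Rolling back to a previous revision.
+- Pausing the rollout to make multiple adjustments to the Pod template.
+
 ### In-Place Update Supports Modifying Resources
 
 **FEATURE STATE:** Kruise v1.8.0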