proper graceful shutdown settings #381

Open
@guzzijones

Description

We have all the proper graceful shutdown settings for actionrunner and workflow, but we are still seeing action executions get stuck in a running state. At a minimum they should be abandoned, per the actionrunner code.

I do notice that we are performing this query on each pass through the loop:
coordinator.get_members(service.encode("utf-8")).get()

and then adding to a counter to determine if we are past the expiration.
What may be happening is that if the action_runner.graceful_shutdown config is set to, say, 100 seconds, more than 100 seconds need to actually pass before the logic to abandon executions is called, presumably because the counter does not account for the time each query takes.
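To illustrate the suspected drift, here is a minimal simulation, not the actual actionrunner code. It assumes the loop adds only the sleep interval to its counter while each poll also spends time on the query. The names `counter_based_wait`, `sleep_delay`, and `query_cost` are hypothetical, and no real sleeping happens.

```python
def counter_based_wait(timeout, sleep_delay, query_cost, still_busy):
    """Simulate a counter-driven wait loop and return (counted, elapsed).

    Hypothetical model: `counted` is what the loop believes has passed,
    `elapsed` is the simulated real time, including the query round trip.
    """
    counted = 0.0
    elapsed = 0.0
    while still_busy() and counted < timeout:
        elapsed += query_cost   # time spent in the membership query
        elapsed += sleep_delay  # time spent sleeping between polls
        counted += sleep_delay  # assumption: only the sleep is counted
    return counted, elapsed


# 100 second timeout, 2 second sleep, 1 second per query, work never drains.
counted, elapsed = counter_based_wait(
    timeout=100, sleep_delay=2, query_cost=1, still_busy=lambda: True
)
print(counted, elapsed)
```

Under these assumed numbers the loop exits once it has counted 100 seconds, but 150 simulated seconds have passed, so a pod whose termination grace period is close to the configured timeout could be killed before the abandon logic runs.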

This would mean that we could set terminationGracePeriodSeconds to, say, 300 seconds longer (or some other generous margin) than action_runner.graceful_shutdown to ensure that the pod stays alive long enough for the abandon process to finish.

I have made this change so we can monitor our prod cluster.

A code fix would be to use the action_runner.graceful_shutdown config to calculate an end datetime and then check against that in the while loop.
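A rough sketch of what a deadline-based loop could look like, not a patch against the real actionrunner. The function name `wait_until_deadline` and its parameters are hypothetical, and `still_busy` stands in for whatever check the loop performs (for example the membership query). It uses `time.monotonic()` rather than a wall-clock datetime so that system clock adjustments do not affect the deadline.

```python
import time


def wait_until_deadline(still_busy, timeout, poll_interval=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll still_busy() until it returns False or `timeout` seconds pass.

    Returns True if the work drained in time, False if the deadline passed.
    `clock` and `sleep` are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout  # fixed end time, computed once
    while still_busy():
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # deadline reached: caller should abandon work
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
    return True
```

Because the deadline is computed once up front, time spent inside `still_busy()` counts against the timeout, so the loop ends at roughly the configured number of seconds regardless of how slow each query is.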
