CSPL-3354: REBASED Add Lifecycle Hooks and Configurable Termination Grace Period to Splunk Operator #1450
base: develop
Conversation
Pull Request Test Coverage Report for Build 13568057412 (Coveralls)
Pull Request Test Coverage Report for Build 17272554594 (Coveralls)
vivekr-splunk left a comment:
What happened to the old PR? Also, we need a description in the PR.
Signed-off-by: Vivek Reddy <[email protected]>
This reverts commit 5653ba3.
@rlieberman-splunk or @vivekr-splunk or @Igor-splunk, is this something that can get merged and appear in the next release? We have an issue with automatic node patching in AKS where the Splunk search heads don't appear to have a good way to handle in-transit jobs, causing searches to cancel/terminate without an explanation. The preStop hook here may help with this situation, unless you have other thoughts or recommendations.
Overview
This pull request enhances the Splunk Operator by integrating lifecycle hooks and allowing customers to configure the termination grace period via the Custom Resource (Common Spec). These changes ensure graceful shutdowns of Splunk pods, maintaining data integrity and improving the reliability of Splunk deployments on Kubernetes.

Problem Statement

Customers running Splunk on Kubernetes have reported abrupt pod terminations, especially during node recycling or maintenance operations. Without proper shutdown procedures, Splunk instances may not decommission gracefully, leading to potential data loss and increased operational churn. Additionally, the lack of a configurable grace period limits customers' ability to tailor shutdown behavior to their specific environments and requirements.
Proposed Solution

Integrate Lifecycle Hooks:
- `preStop` hook: executes the `splunk offline` and `splunk stop` commands before the pod is terminated, ensuring that Splunk instances decommission gracefully and preventing data corruption and loss.

Configurable Termination Grace Period:
- Extend the Common Spec of the Splunk Operator's Custom Resource to allow customers to specify `terminationGracePeriodSeconds`.

Changes Made
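For illustration, the new field might be set like this in a Custom Resource (a sketch: the `Standalone` kind, API version, and `spec`-level placement are assumptions; only the `terminationGracePeriodSeconds` field name comes from this PR):

```yaml
apiVersion: enterprise.splunk.com/v4   # assumed API group/version
kind: Standalone                       # any Splunk CR using the Common Spec
metadata:
  name: example
spec:
  # Assumed placement of the new Common Spec field.
  # Pods get up to 10 minutes to finish the preStop shutdown
  # sequence before Kubernetes sends SIGKILL.
  terminationGracePeriodSeconds: 600
```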
Custom Resource Definition:
- Added `terminationGracePeriodSeconds` under the `commonSpec` section to allow customization.

StatefulSet Template Update:
- Added a `lifecycle` section with the `preStop` hook.
- Applied the `terminationGracePeriodSeconds` value from the Common Spec.

Benefits
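Concretely, the pod template rendered into the StatefulSet would then carry something like the following (a sketch; the container name, binary path, and exact command form are assumptions):

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 600   # propagated from the Common Spec
      containers:
        - name: splunk
          lifecycle:
            preStop:
              exec:
                # Assumed invocation: decommission the instance, then stop
                # splunkd. The hook runs before the container receives
                # SIGTERM and must finish within the grace period.
                command:
                  - /bin/sh
                  - -c
                  - /opt/splunk/bin/splunk offline && /opt/splunk/bin/splunk stop
```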
Related Issues
Testing Performed
Unit Tests:
- Verified that `terminationGracePeriodSeconds` from the Custom Resource is correctly applied to the StatefulSet.
- Verified that the `preStop` lifecycle hook executes the appropriate Splunk commands.

Integration Tests:
- Verified that the `splunk offline` and `splunk stop` commands were executed before termination.
- Exercised different `terminationGracePeriodSeconds` values to ensure flexibility and correctness.

Manual Testing:
Documentation Updates
Operator README:
- Documented the new `terminationGracePeriodSeconds` field in the Custom Resource.

Configuration Guides:
- Added guidance on setting `terminationGracePeriodSeconds` based on different deployment scenarios.

How to Test
Update Custom Resource:
- Set the desired `terminationGracePeriodSeconds` in your Splunk Operator Custom Resource.

Deploy or Update Splunk Cluster:
Verify StatefulSet Configuration:
- Check that the StatefulSet reflects the `preStop` lifecycle hook and the correct `terminationGracePeriodSeconds`.

Simulate Pod Termination:
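The verification and simulation steps can be driven with kubectl against a live cluster (a sketch; the StatefulSet and pod names here are hypothetical):

```shell
# Inspect the rendered StatefulSet: grace period and preStop hook
kubectl get statefulset splunk-example-standalone -o yaml \
  | grep -A6 -E 'terminationGracePeriodSeconds|preStop'

# Simulate a node-drain style termination of one pod
kubectl delete pod splunk-example-standalone-0

# Watch the shutdown sequence in the pod's events
kubectl get events \
  --field-selector involvedObject.name=splunk-example-standalone-0
```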
- Terminate a pod and verify that it shuts down gracefully via the `preStop` hook.

Future Considerations
- Consider using `splunk decommission` if it provides more comprehensive shutdown procedures than `splunk offline` and `splunk stop`.
- Explore supporting dynamic updates to `terminationGracePeriodSeconds` without requiring full cluster redeployments.

Reviewer Notes
- Existing deployments that do not set the `terminationGracePeriodSeconds` field continue to operate with the default grace period.

Pull Request Checklist: