Configure retries/backoff for gitlab-runner k8s API requests #1176
Docs for these two settings: https://docs.gitlab.com/runner/executors/kubernetes/#configure-the-number-of-request-attempts-to-the-kubernetes-api
Job system failures like this one, i.e. an error that looks like
error dialing backend: remote error: tls: internal error
indicate that the pipeline pod failed to receive a response from the k8s/EKS API server. It's still unclear why this is happening, but one potential explanation is that requests are exceeding the default timeout for EKS API requests (2 seconds).

Long term, I would like to set up https://docs.aws.amazon.com/eks/latest/best-practices/control_plane_monitoring.html so we can get more insight into what's going on with the control plane.
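
For reviewers who want a feel for what a retry limit plus a backoff ceiling does to request timing, here is a rough Python sketch. It is not gitlab-runner's implementation: the function names, the 500 ms starting delay, and the doubling schedule are all assumptions made for illustration, and only the idea of "retry up to a limit, with waits capped at a maximum" comes from this PR.

```python
import time


def backoff_delays(retry_limit, backoff_max_ms, floor_ms=500):
    """Return the wait in ms after each failed attempt.

    Assumption: delays start at floor_ms and double each time,
    but never exceed backoff_max_ms.
    """
    delays = []
    delay = floor_ms
    for _ in range(retry_limit):
        delays.append(min(delay, backoff_max_ms))
        delay *= 2
    return delays


def call_with_retries(request, retry_limit, backoff_max_ms, sleep=time.sleep):
    """Call request() until it succeeds or retry_limit attempts have failed.

    ConnectionError stands in for a transient API failure such as
    "tls: internal error"; the last failure is re-raised to the caller.
    """
    delays = backoff_delays(retry_limit, backoff_max_ms)
    for attempt, delay_ms in enumerate(delays, start=1):
        try:
            return request()
        except ConnectionError:
            if attempt == retry_limit:
                raise  # out of attempts, surface the error
            sleep(delay_ms / 1000)  # wait before the next attempt
```

Under these assumptions, `backoff_delays(5, 2000)` gives `[500, 1000, 2000, 2000, 2000]`, so raising the ceiling only changes the later waits, while raising the limit adds more attempts at the capped delay.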