
Conversation

mvandenburgh
Member

Docs for these two settings: https://docs.gitlab.com/runner/executors/kubernetes/#configure-the-number-of-request-attempts-to-the-kubernetes-api

Job system failures like this one, i.e. an error that looks like "error dialing backend: remote error: tls: internal error", indicate that the pipeline pod failed to receive a response from the k8s/EKS API server. It's still unclear why this is happening, but one potential explanation is that the default timeout for EKS API requests (2 seconds) is being exceeded.

Long term, I would like to set up https://docs.aws.amazon.com/eks/latest/best-practices/control_plane_monitoring.html so we can get more insight into what's going on with the control plane.

Comment on lines +104 to +115
retry_backoff_max = 30000
# This is the default retry limit. We override this for specific classes of
# errors below.
retry_limit = 5
[runners.kubernetes.retry_limits]
# Retry this type of error 10 times instead of 5.
# This error usually occurs when the EKS API server times out or
# is unreachable. Presumably the server will eventually become
# available again, so we want to give the pod plenty of time to retry.
"tls: internal error" = 10
Collaborator


retry_backoff_max seems to just control the maximum value the retry interval can reach. Do you know what value the retry interval starts at? And how the backoff is incremented? Is it doubled each time, etc.?
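
For reference, my working assumption (not verified against the runner source) is that this is a standard capped exponential backoff, along the lines of the sketch below. The 500 ms starting interval and the doubling are guesses; only the 30 s cap corresponds to retry_backoff_max in this config.

package main

import (
	"fmt"
	"time"
)

// backoff returns the wait before the given retry attempt, doubling an
// assumed 500 ms initial interval and capping it at retry_backoff_max.
func backoff(attempt int, initial, max time.Duration) time.Duration {
	d := initial
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	return d
}

func main() {
	// With retry_backoff_max = 30000 (30 s), the assumed doubling would hit
	// the cap around the 6th retry: 0.5s, 1s, 2s, 4s, 8s, 16s, 30s, 30s, ...
	for attempt := 0; attempt < 10; attempt++ {
		fmt.Printf("attempt %d: wait %v\n", attempt,
			backoff(attempt, 500*time.Millisecond, 30*time.Second))
	}
}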
