Skip to content

Conversation

DannyLiCom
Copy link
Collaborator

@DannyLiCom DannyLiCom commented Aug 12, 2025

Fixes / Features

  • Add credential test with DNS retry logic
  • Added a run_command_and_capture_output function to capture errors

Testing / Documentation

To verify this feature, run the standard xpk cluster create-pathways command. The new credential test with DNS retry logic will be executed as part of the cluster creation process.
https://screenshot.googleplex.com/cTjV7ABQr2omQZd

If a DNS endpoint-related error occurs, the following message will be displayed, followed by an automatic retry:
https://screenshot.googleplex.com/BunMupVA8Yp998o
However, because the --enable-dns-access flag is present by default, the error mentioned above will not appear.

  • [ y/n ] Tests pass
  • [ y/n ] Appropriate changes to documentation are included in the PR

@@ -354,3 +354,29 @@ def run_kubectl_apply(yml_string: str, task: str, args: Namespace) -> int:
command = f'kubectl apply -f {str(tmp.file.name)}'
err_code = run_command_with_updates(command, task, args)
return err_code

def run_command_and_capture_output(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason I added this function is that I need the output string along with a 0 or 1 return code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return_code is usually the second parameter across entire codebase. tuple[<result>, <error_code>]. Could you swap the order of values?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -64,6 +65,7 @@ def set_jobset_on_cluster(args) -> int:
command = (
'kubectl apply --server-side -f'
f' https://github.com/kubernetes-sigs/jobset/releases/download/{JOBSET_VERSION}/manifests.yaml'
' --force-conflicts'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might require @pawloch00 to confirm whether this flag is needed. The details are described in this ticket.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scaliby do you happen to have an opinion about this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SujeethJinesh I read the ticket. IMO this is fine.

@DannyLiCom
Copy link
Collaborator Author

@SujeethJinesh Please review this PR. My current default for get-credentials is --dns-endpoint, because I see that the create cluster command also defaults to --enable-dns-access.

)
xpk_print(kubectl_output)

if kubectl_return_code != 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if dns_endpoint_error in kubectl_output: can be inversed to reduce nesting.

if dns_endpoint_error not in kubectl_output:
  xpk_print(f'kubectl failed with an unhandled error: {kubectl_output}')
  xpk_exit(kubectl_return_code)
# The rest of the logic

Also if kubectl_return_code != 0: could be inversed further reducing nesting:

if kubectl_return_code == 0:
  xpk_print("Credentials test suceeded")
  return 0

# The rest of the logic

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -354,3 +354,29 @@ def run_kubectl_apply(yml_string: str, task: str, args: Namespace) -> int:
command = f'kubectl apply -f {str(tmp.file.name)}'
err_code = run_command_with_updates(command, task, args)
return err_code

def run_command_and_capture_output(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return_code is usually the second parameter across entire codebase. tuple[<result>, <error_code>]. Could you swap the order of values?

@DannyLiCom
Copy link
Collaborator Author

@scaliby I modified the if statements to reduce nesting. Please review it again.

Copy link
Collaborator

@SujeethJinesh SujeethJinesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, just needed confirmation about --force-conflicts

@@ -64,6 +65,7 @@ def set_jobset_on_cluster(args) -> int:
command = (
'kubectl apply --server-side -f'
f' https://github.com/kubernetes-sigs/jobset/releases/download/{JOBSET_VERSION}/manifests.yaml'
' --force-conflicts'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SujeethJinesh I read the ticket. IMO this is fine.

@scaliby
Copy link
Collaborator

scaliby commented Sep 3, 2025

@DannyLiCom any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants