Skip to content

Conversation

aaronfern
Copy link
Contributor

What this PR does / why we need it:
This PR adds the following

  1. Adapts the retry period for machines in creation so that machines are always reconciled before the creation deadline
  2. Adds a context.WithDeadline to the CreateMachine() call so that this call can time out before the creation deadline expires

Which issue(s) this PR fixes:
Fixes #790

Special notes for your reviewer:

Release note:


@aaronfern aaronfern requested a review from a team as a code owner September 12, 2023 07:54
@gardener-robot gardener-robot added needs/review Needs review size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Sep 12, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 12, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 14, 2023
@himanshu-kun himanshu-kun self-assigned this Sep 14, 2023
@himanshu-kun himanshu-kun added this to the 2023-Q4 milestone Sep 21, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 21, 2023
Copy link
Contributor

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that at this moment when mcm-provider Create() calls don't respect the passed context, we have solved point 2 of the issue

I think the only neat way to solve point 1) is to enable ctx awareness in Create() calls.

But that aside, I have one concern that with CrashLoopBackOff machines getting marked Failed soon could lead to more chances of the race observed in issue gardener/autoscaler#118

Also if we solve the above issue, by considering Failed as Terminating that would basically mean that most of the times MCM is quicker to mark the machine Failed / Terminating and create a new replacement machine(have the same CLBF fate) which means CA will never actually be able to remove this kind of machine (though it'll backoff from node-grp)

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Oct 11, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 11, 2023
Copy link
Contributor

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit

@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 25, 2023
@himanshu-kun himanshu-kun removed their assignment Oct 30, 2023
@gardener-robot gardener-robot added the needs/rebase Needs git rebase label Jan 5, 2024
@gardener-robot
Copy link

@aaronfern You need rebase this pull request with latest master branch. Please check.

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Sep 13, 2024
@ashwani2k
Copy link
Contributor

/hold
Don't merge as it is risky to merge this PR in its current state.
We might consider it during the overhaul instead.

@gardener-robot gardener-robot added the reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies label Oct 16, 2024
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Nobody worked on this for 12 months (final aging stage) needs/changes Needs (more) changes needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/rebase Needs git rebase needs/review Needs review reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies size/m Size of pull request is medium (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update CrashLoopBackOff condition as soon as timeout expires
6 participants