export swap behavior via label #2192
base: master
Conversation
✅ Deploy Preview for kubernetes-sigs-nfd ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: fmuyassarov. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Force-pushed from 2a0f109 to 12aecfe
/test pull-node-feature-discovery-build-image-cross-generic
@fmuyassarov thank you for the patch and for working on this. Let's get this finalized.
My main comment is that in topology-updater we already read the kubelet config (from the kubelet configz endpoint by default); the helpers are in pkg/utils/kubeconf/. Could we do the same in nfd-worker?
I will check that.
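For reference, a minimal sketch of what reading the kubelet configuration from the node's configz endpoint could look like with client-go. This is an assumption-laden illustration, not the actual pkg/utils/kubeconf helpers; only the nodes/proxy configz request pattern and the "kubeletconfig" response wrapper are standard.

```go
// Hypothetical sketch, not the actual pkg/utils/kubeconf code: fetch the
// kubelet configuration from the node's /configz proxy endpoint.
package kubeletconf

import (
	"context"
	"encoding/json"
	"fmt"

	"k8s.io/client-go/kubernetes"
	kubeletconfigv1beta1 "k8s.io/kubelet/config/v1beta1"
)

func getKubeletConfigz(ctx context.Context, cs kubernetes.Interface, nodeName string) (*kubeletconfigv1beta1.KubeletConfiguration, error) {
	// GET /api/v1/nodes/<node>/proxy/configz (requires RBAC access to nodes/proxy).
	raw, err := cs.CoreV1().RESTClient().Get().
		Resource("nodes").Name(nodeName).
		SubResource("proxy").Suffix("configz").
		DoRaw(ctx)
	if err != nil {
		return nil, fmt.Errorf("failed to read configz: %w", err)
	}
	// The configz endpoint wraps the config under a "kubeletconfig" key.
	var resp struct {
		KubeletConfig kubeletconfigv1beta1.KubeletConfiguration `json:"kubeletconfig"`
	}
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	return &resp.KubeletConfig, nil
}
```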
Force-pushed from 12aecfe to 2d60fa9
Force-pushed from 2d60fa9 to e993fd8
Force-pushed from e993fd8 to 91a79d5
Hi @marquiz. I've reworked the patch to use the configz endpoint. The current state is missing the documentation update, which I thought of adding once I get a green light that this is the right approach. Please take a look.
Force-pushed from 30fc56f to a75f334
Thanks for the update @fmuyassarov. A few quick comments below.
The bigger question to me is how we should place the new feature hierarchically. The feature is related to swap, but specifically to the Kubernetes/kubelet config setting. I could think of e.g. a new system.kubelet or system.kubernetes feature. Then the kubeletconfig options would not be command-line flags for nfd-worker but config options for the source. And we would gain robustness: nfd-worker would not refuse to start if it was not able to get the kubelet config.
All in all, I'm not questioning the need for the feature. I just need to think about what would be the most logical way to expose and implement it. Need to sleep over this.
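To illustrate the robustness point, a hedged sketch of a feature source that treats the kubelet config as optional instead of failing nfd-worker startup. All names here are hypothetical and not taken from the patch.

```go
// Illustrative only: a feature source that skips swap-behavior detection,
// rather than aborting the worker, when the kubelet config is unavailable.
package kubeletsource

import (
	"k8s.io/klog/v2"
	kubeletconfigv1beta1 "k8s.io/kubelet/config/v1beta1"
)

type kubeletSource struct {
	getKubeletConfig func() (*kubeletconfigv1beta1.KubeletConfiguration, error)
	features         map[string]string
}

func (s *kubeletSource) Discover() error {
	cfg, err := s.getKubeletConfig() // e.g. from configz or a config file
	if err != nil {
		// Robustness: log and skip this feature; the worker keeps running.
		klog.ErrorS(err, "failed to read kubelet config, skipping swap behavior detection")
		return nil
	}
	if cfg.MemorySwap.SwapBehavior != "" {
		s.features["swap.behavior"] = cfg.MemorySwap.SwapBehavior
	}
	return nil
}
```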
pkg/nfd-worker/nfd-worker.go (Outdated)
 // Add pod owner reference if it exists
 if podName != "" {
-	if selfPod, err := w.k8sClient.CoreV1().Pods(w.kubernetesNamespace).Get(context.TODO(), podName, metav1.GetOptions{}); err != nil {
+	if selfPod, err := w.k8sClient.CoreV1().Pods(podNamespace).Get(context.TODO(), podName, metav1.GetOptions{}); err != nil {
What's wrong with w.kubernetesNamespace?
If I recall correctly from my test, if kubernetesNamespace is empty the request will be refused because we are not using the ClusterRole. But if I follow your earlier comment, keeping the Role as is and creating the ClusterRole only for nodes/proxy, then we can keep w.kubernetesNamespace.
So we can expect this to be reverted?
> So we can expect this to be reverted?

Yes.
@marquiz It looks like we still need this. Following your earlier comment about using a ClusterRole only for nodes/proxy, I updated the code to create a ClusterRole and ClusterRoleBinding in addition to the existing Role and RoleBinding. However, during my test runs, since I'm not providing the KUBERNETES_NAMESPACE variable, it defaults to an empty value for w.kubernetesNamespace, which causes the following issue:
E0918 10:19:43.598034 1 nfd-worker.go:282] "failed to get self pod, cannot inherit ownerReference for NodeFeature" err="the server could not find the requested resource"
Tracking new pod rollout (nfd-worker-5m584):
┊ Scheduled - <1s
┊ Initialized - <1s
┃ Not Ready - (ContainersNotReady): containers with unready status: [nfd-worker]
[event: pod node-feature-discovery/nfd-worker-5m584] Container image "ttl.sh/gcr.io_k8s-staging-nfd_node-feature-discovery:tilt-bc71b0b17aea3fb5" already present on machine
I0918 10:19:41.895350 1 nfd-worker.go:315] "Node Feature Discovery Worker" version="" nodeName="eseldb12u01" namespace=""
E0918 10:19:41.904694 1 nfd-worker.go:282] "failed to get self pod, cannot inherit ownerReference for NodeFeature" err="the server could not find the requested resource"
E0918 10:19:41.904738 1 main.go:73] "error while running" err="the server could not find the requested resource"
[event: pod node-feature-discovery/nfd-worker-5m584] Container image "ttl.sh/gcr.io_k8s-staging-nfd_node-feature-discovery:tilt-bc71b0b17aea3fb5" already present on machine
[event: pod node-feature-discovery/nfd-worker-5m584] Back-off restarting failed container nfd-worker in pod nfd-worker-5m584_node-feature-discovery(784862c3-3fbe-445a-8cc2-e7bee0675246)
Detected container restart. Pod: nfd-worker-5m584. Container: nfd-worker.
I0918 10:19:43.506399 1 nfd-worker.go:315] "Node Feature Discovery Worker" version="" nodeName="eseldb12u01" namespace=""
E0918 10:19:43.598034 1 nfd-worker.go:282] "failed to get self pod, cannot inherit ownerReference for NodeFeature" err="the server could not find the requested resource"
E0918 10:19:43.598101 1 main.go:73] "error while running" err="the server could not find the requested resource"
[event: pod node-feature-discovery/nfd-worker-5m584] Back-off restarting failed container nfd-worker in pod nfd-worker-5m584_node-feature-discovery(784862c3-3fbe-445a-8cc2-e7bee0675246)
[event: pod node-feature-discovery/nfd-worker-5m584] Back-off restarting failed container nfd-worker in pod nfd-worker-5m584_node-feature-discovery(784862c3-3fbe-445a-8cc2-e7bee0675246)
[event: pod node-feature-discovery/nfd-worker-5m584] Back-off restarting failed container nfd-worker in pod nfd-worker-5m584_node-feature-discovery(784862c3-3fbe-445a-8cc2-e7bee0675246)
[event: pod node-feature-discovery/nfd-worker-5m584] Container image "ttl.sh/gcr.io_k8s-staging-nfd_node-feature-discovery:tilt-bc71b0b17aea3fb5" already present on machine
[event: pod node-feature-discovery/nfd-worker-5m584] Back-off restarting failed container nfd-worker in pod nfd-worker-5m584_node-feature-discovery(784862c3-3fbe-445a-8cc2-e7bee0675246)
Detected container restart. Pod: nfd-worker-5m584. Container: nfd-worker.
But if we rely on the downward API to fetch the pod's own namespace, we can avoid leaving w.kubernetesNamespace empty. Perhaps there are issues with this approach that I'm unaware of, but I'm happy to discuss it.
@fmuyassarov w.kubernetesNamespace is initialized with utils.GetKubernetesNamespace(). That, in turn, takes the value from /var/run/secrets/kubernetes.io/serviceaccount/namespace or, if that is not found, falls back to the KUBERNETES_NAMESPACE environment variable. That should work. In normal deployments there should be no need to set the environment variable, but in Tilt you might need to.
/retest
Pull Request Overview
This PR exports a label indicating the kubelet's configured swap behavior for memory swap feature discovery. The change allows NFD to report whether Kubernetes workloads are allowed to use swap memory based on kubelet configuration.
- Adds functionality to read kubelet configuration and extract swap behavior settings
- Implements support for both file-based and API-based kubelet config access
- Updates RBAC permissions to support accessing kubelet configuration endpoints
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| source/memory/memory.go | Adds swap behavior detection and labeling functionality |
| pkg/nfd-worker/nfd-worker.go | Implements kubelet config reading and swap behavior initialization |
| deployment/components/common/env.yaml | Adds required environment variables for node address and pod namespace |
| deployment/base/rbac/worker-rolebinding.yaml | Changes from Role to ClusterRole binding for broader permissions |
| deployment/base/rbac/worker-role.yaml | Updates permissions to include nodes/proxy access for kubelet API |
| cmd/nfd-worker/main.go | Adds command-line flags and default kubelet config URI construction |
nfd = &nfdWorker{
	kubeletConfigFunc: kubeletConfigFunc,
}

for _, o := range opts {
	o.apply(nfd)
}
The nfd variable is being reassigned to a new nfdWorker struct, which discards all the previous initialization including the stop channel and other fields set earlier in the function. This will cause the worker to lose its previous configuration.
Suggested change:
-nfd = &nfdWorker{
-	kubeletConfigFunc: kubeletConfigFunc,
-}
-for _, o := range opts {
-	o.apply(nfd)
-}
+nfd.kubeletConfigFunc = kubeletConfigFunc
@fmuyassarov independent of where we logically put the new feature, I think the detection should be outside the nfd-worker core package. If we keep it in memory.swap, then the feature name should probably somehow reflect that it is kubelet behavior.
The reason for splitting the detection between the memory and worker packages is that the memory package doesn't have access to the kubeconfig and the kubelet. Technically speaking, there is no need to do anything within the memory package. But the reason I used the memory package is that we are detecting a memory-related feature. Sure, it is via the kubelet, but the main thing is not the kubelet but the memory. Since we already had logic for detecting swap memory within the memory package, I thought of extending it for this swap-related setting too.
Signed-off-by: Feruzjon Muyassarov <[email protected]>
Force-pushed from a75f334 to f5adaa8
@fmuyassarov: The following tests failed.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This patch exports a label indicating the kubelet's configured swap behavior. By default, it reads from /var/lib/kubelet/config.yaml. If a custom kubelet config path is used, it can be set via the -kubelet-config-path flag. If the swap behavior is not specified, NoSwap is assumed (matching the kubelet's default).

Note: feature.node.kubernetes.io/memory-swap.behavior (or, in general, the kubelet's memorySwap.swapBehavior) reflects whether Kubernetes workloads are allowed to use swap. A node may have swap enabled, but if the kubelet is set to NoSwap, pods cannot use it. As such, the behavior label will be exported only if node-level swap is enabled.

Fixes: #2178