[Core] Add default Ray Node labels at Node init #53360
cc: @MengjinYan
The logic change looks good to me! Just one comment about testing.
CI test failures, @ryanaoleary
@edoakes I think I fixed the CI failure with bd51036.
The solution looks worrisome/hacky. What causes the accelerator manager calls to fail on startup? We might be breaking some kind of assumption there. @jjyao PTAL at the usage of accelerator manager here.
It'd raise an exception here due to missing dependencies such as |
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Jiajun Yao <[email protected]> Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Mengjin Yan <[email protected]> Signed-off-by: Ryan O'Leary <[email protected]>
…sourceAndLabelSpec Signed-off-by: Ryan O'Leary <[email protected]>
…nd move record hardware usage to node.py Signed-off-by: Ryan O'Leary <[email protected]>
8bb604b to 4148cc0
@MengjinYan rebased and re-pushed since there were some changes to |
Signed-off-by: Ryan O'Leary <[email protected]> Signed-off-by: Ryan O'Leary <[email protected]> Co-authored-by: Jiajun Yao <[email protected]> Co-authored-by: Mengjin Yan <[email protected]> Signed-off-by: Krishna Kalyan <[email protected]>
Why are these changes needed?
This PR adds support for populating several default Ray node labels (described here) in the Ray runtime environment when a node is initialized. This change will help support autoscaling with the Label Selector API. This PR is related to ray-project/kuberay#3699, which passes several environment variables from the K8s stack that are used to set ray.io/ labels. I'll leave a comment on this PR with manual tests showing the ray.io/accelerator-type and ray.io/availability-zone labels getting set.
Related issue number
#51564
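As a rough illustration of the idea described under "Why are these changes needed?", the sketch below maps environment variables to ray.io/ labels at node startup. It is not the code from this PR: the environment variable names (EXAMPLE_ACCELERATOR_TYPE, EXAMPLE_AVAILABILITY_ZONE) and the helper default_node_labels are hypothetical, and only the two label keys come from the description above.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names, chosen for illustration only.
# The label keys on the right are the ones named in the PR description.
ENV_TO_LABEL = {
    "EXAMPLE_ACCELERATOR_TYPE": "ray.io/accelerator-type",
    "EXAMPLE_AVAILABILITY_ZONE": "ray.io/availability-zone",
}


def default_node_labels(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build default node labels from environment variables.

    Variables that are unset or blank are skipped, so a node without
    the corresponding information simply gets no label for it.
    """
    env = os.environ if env is None else env
    labels: Dict[str, str] = {}
    for var, label_key in ENV_TO_LABEL.items():
        value = env.get(var, "").strip()
        if value:
            labels[label_key] = value
    return labels


if __name__ == "__main__":
    # Example with an explicit mapping instead of the real process environment.
    print(default_node_labels({"EXAMPLE_ACCELERATOR_TYPE": "A100"}))
```

Passing the environment as a parameter keeps the helper easy to test without modifying os.environ.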
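Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.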