Closed
Labels
kind/feature (new function)
Description
This is the issue for the HAMi roadmap. You can simply reply to this issue to submit your ideas.
Next Version: v2.7.0
Estimated release: Aug-Sep/2025
Tasks
- Incubating preparations (Incubating Preparations #1057)
- Support kunlunxin sGPU & Scheduling /assign @ouyangluwei163
- Support kunlunxin Lite (scheduling only) /assign @archlitchi
- Support enflame GPU topology /assign @zhaikangqi331
- Support enflame sGPU new plan
- Optimize scheduler events so they can be accessed via :31994 @Wangmin362
- Website Optimization /assign @ouyangluwei163 @Nimbus318
- DRA design /assign @Shouren
- HAMi for KAI scheduler
- Support multiple cambricon types (370, 590, etc.)
- Optimize project governance on sub-projects (website, volcano-vgpu-device-plugin, etc.) /assign @archlitchi
Bugs:
- An error occurred while create Iluvatar pod #933 @ouyangluwei163
- cuda 13.0 device memory count error #1328 @archlitchi
Version v2.6.1 / v2.5.3
Tasks
Fix Bugs:
- [BUG] nvidia-smi shows abnormal GPU memory usage (17TB+) in HAMi containers causing false OOM errors #1181
- libvgpu.so segfault for cuDeviceGet #1055
- seg fault with vllm + PP #1219
- Cannot support vllm= 0.9.0 nvidia-nccl-cu12==2.26.2, tp>1 #1230
- HAMI 2.6.0 can't run vllm 0.9.0-cuda12.8.1 when tp=8 #1191
- Device 0 OOM when multiple call cuMem*Async and cuMemFreeAsync HAMi-core#96
- The app in the container uses more GPU memory than the allocated GPU memory, but the container does not crash #1221
Below is the roadmap history
Next Version: v2.6.0
Estimated release: May-June/2025
Tasks:
- Support kunlunxin sGPU /assign @ouyangluwei163
- Support metax sGPU
- Support enflame sGPU
- Optimize scheduler events
- Website Optimization /assign @ouyangluwei163 @Nimbus318
- Add metax to website
- Support driver 570+cuda 12.8 /assign @archlitchi
- DRA design /assign @Shouren
- Support multiple cambricon types (370, 590, etc.)
Bugs:
- Scheduled by exclusive GPU, no pod memory usage indicator is included in the 31992 monitoring indicators #931
- An error occurred while create Iluvatar pod #933 @ouyangluwei163
- When using a Cambricon MLU370 with multiple cards, Hami can only schedule them to one physical card. #946 @archlitchi
- Upgrade business pod room after Hami 2.5 #940 @archlitchi