[CI failed]: V1 Test Failed due to "No available memory for the cache blocks" in GitHub Actions #14574

### Anything you want to discuss about vllm.

I hit a V1 test failure in CI after opening a pull request.
Pull request: https://github.com/vllm-project/vllm/pull/14377
The error is as follows:

[2025-03-10T15:07:20Z] INFO 03-10 08:07:20 [gpu_model_runner.py:1067] Model loading took 0.2389 GB and 0.977005 seconds
--
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302] EngineCore hit an exception: Traceback (most recent call last):
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 294, in run_engine_core
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]     engine_core = EngineCoreProc(*args, **kwargs)
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 249, in __init__
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]     super().__init__(vllm_config, executor_class, log_stats)
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 59, in __init__
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]     num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 102, in _initialize_kv_caches
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]     kv_cache_configs = get_kv_cache_configs(vllm_config, kv_cache_specs,
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/core/kv_cache_utils.py", line 576, in get_kv_cache_configs
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]     check_enough_kv_cache_memory(vllm_config, kv_cache_spec,
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/core/kv_cache_utils.py", line 468, in check_enough_kv_cache_memory
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]     raise ValueError("No available memory for the cache blocks. "
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302] **ValueError: No available memory for the cache blocks. Try increasing `gpu_memory_utilization` when initializing the engine.**
  | [2025-03-10T15:07:20Z] ERROR 03-10 08:07:20 [core.py:302]
  | [2025-03-10T15:07:20Z] CRITICAL 03-10 08:07:20 [core_client.py:259] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
  | [2025-03-10T15:07:20Z] bash: line 1:   247 Killed                  VLLM_USE_V1=1 pytest -v -s v1/engine
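
To make the failure mode concrete, here is a minimal sketch of the kind of budget check that appears to be behind this error, under my own assumptions. The helper names `kv_cache_budget_bytes` and `check_budget`, the 24 GiB GPU size, and the 2 GiB of non-KV usage are hypothetical and are not taken from vLLM or from this CI run. Only the error message comes from the log above. The idea is that the KV cache gets whatever remains of `total * gpu_memory_utilization` after weights, activations, and any other memory on the GPU are subtracted, so a small model can still fail if that remainder is not positive.

```python
GiB = 1024 ** 3


def kv_cache_budget_bytes(total_gpu_bytes: int,
                          gpu_memory_utilization: float,
                          non_kv_bytes: int) -> int:
    """Hypothetical helper: bytes left for KV cache blocks.

    Assumes the budget is the allowed fraction of the GPU minus everything
    that is not KV cache (weights, activations, other processes).
    """
    return int(total_gpu_bytes * gpu_memory_utilization) - non_kv_bytes


def check_budget(available_bytes: int) -> int:
    """Hypothetical stand-in for the check that raised in the traceback."""
    if available_bytes <= 0:
        raise ValueError(
            "No available memory for the cache blocks. "
            "Try increasing `gpu_memory_utilization` when "
            "initializing the engine.")
    return available_bytes


# Illustrative numbers only (assumptions, not measurements from the CI run).
low = kv_cache_budget_bytes(24 * GiB, 0.05, 2 * GiB)   # budget is negative
high = kv_cache_budget_bytes(24 * GiB, 0.30, 2 * GiB)  # budget is positive
```

If this picture is right, the failure would depend on how much GPU memory is already in use when the test starts (for example by another test on the same GPU) and on the `gpu_memory_utilization` value the test passes, rather than on the 0.2389 GB model itself.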



### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

How should I solve it?
