
vllm does not support batching inference #1925

@bstr9

Description


System Info

xinference v0.13.2
The bundled vLLM backend does not support batching inference: sending batched prompts through the OpenAI client returns a 500 error.

Why not follow the implementation in
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_completion.py
https://github.com/vllm-project/vllm/blob/461089a21a5b00d6c6712e3bf371ce2d9cfa0860/vllm/entrypoints/openai/serving_completion.py#L110
to support batching?
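
For reference, below is a minimal sketch (not Xinference's or vLLM's actual code; generate_one is a hypothetical single-prompt backend call) of the fan-out that serving_completion.py performs: accept prompt as either a string or a list of strings, generate one completion per prompt, and return a single response whose choices map back to prompts by index.

# Minimal sketch of server-side fan-out for batched prompts.
# `generate_one` is a hypothetical callable standing in for a single-prompt
# backend generation; the response layout mirrors the OpenAI completions API.
from typing import Callable, Dict, List, Union


def complete_batched(
    prompt: Union[str, List[str]],
    generate_one: Callable[[str], str],
) -> Dict:
    # The OpenAI completions API allows `prompt` to be a string or a list of strings.
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)

    choices = []
    for i, p in enumerate(prompts):
        text = generate_one(p)  # one generation per prompt
        choices.append({"index": i, "text": text, "finish_reason": "stop"})

    # One response object covering all prompts, with choices indexed like OpenAI's API.
    return {"object": "text_completion", "choices": choices}


if __name__ == "__main__":
    # Toy backend so the sketch runs standalone.
    fake_backend = lambda p: p + " ... (generated)"
    resp = complete_batched(["Once upon a time,", "In a galaxy far away,"], fake_backend)
    for c in resp["choices"]:
        print(c["index"], c["text"])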

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xinference v0.13.2

The command used to start Xinference

/opt/conda/bin/python /opt/conda/bin/xinference-worker --metrics-exporter-port 9998 -e http://10.6.208.95:9997 -H 10.6.208.95

Reproduction

from openai import OpenAI

# Point the client at the Xinference OpenAI-compatible endpoint
# (the supervisor address from the startup command above); any api_key works.
client = OpenAI(base_url="http://10.6.208.95:9997/v1", api_key="not-needed")
 
num_stories = 10
prompts = ["Once upon a time,"] * num_stories
 
# batched example, with 10 story completions per request
response = client.completions.create(
    model="qwen2",
    prompt=prompts,
    max_tokens=20,
)
 
# match completions to prompts by index
stories = [""] * len(prompts)
for choice in response.choices:
    stories[choice.index] = prompts[choice.index] + choice.text
 
# print stories
for story in stories:
    print(story)
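
Until batched prompts are supported server-side, a possible client-side workaround is to send one request per prompt and reassemble the results by position. This is a sketch; the base_url and api_key values are assumptions based on the worker command above.

# Workaround sketch: one request per prompt instead of a batched prompt list.
from openai import OpenAI

client = OpenAI(base_url="http://10.6.208.95:9997/v1", api_key="not-needed")

prompts = ["Once upon a time,"] * 10
stories = []
for p in prompts:
    # Single-prompt request; results stay aligned with `prompts` by position.
    r = client.completions.create(model="qwen2", prompt=p, max_tokens=20)
    stories.append(p + r.choices[0].text)

for story in stories:
    print(story)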

Expected behavior

As described at https://platform.openai.com/docs/guides/rate-limits/error-mitigation: a single completions request with a list of prompts should return one choice per prompt, matched by index.
