Problem description
Embedding/rerank models launched from the UI expose no concurrency-related settings.
Sending requests from the client with asyncio or concurrent.futures is actually slower than a synchronous for loop (a sketch of the comparison follows below).
How can I get the model to run inference concurrently?
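For reference, the comparison looked roughly like the sketch below (the server URL, model UID, and workload are placeholders; the actual timing harness is not reproduced here):

```python
import time
from concurrent.futures import ThreadPoolExecutor

from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # placeholder server URL
model = client.get_model("bge-m3")        # the embedding model launched from the UI

texts = ["some test sentence"] * 100      # illustrative workload

# Synchronous baseline: one request at a time.
start = time.perf_counter()
for text in texts:
    model.create_embedding(text)
print("for loop:", time.perf_counter() - start)

# Concurrent version: the same requests through a thread pool.
# In my tests this came out slower than the loop above.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(model.create_embedding, texts))
print("thread pool:", time.perf_counter() - start)
```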
Models launched on the xinference side:
Test results
Embedding API test: model.create_embedding(text)
Rerank API test: model.rerank(corpus, query) (an asyncio variant of this call is sketched after the code below)
LangChain XinferenceEmbeddings API test
Code:

```python
from langchain_community.embeddings import XinferenceEmbeddings
from langchain_community.vectorstores import FAISS

# xinference_url and query are defined elsewhere in the test script
xinference = XinferenceEmbeddings(
    server_url=xinference_url, model_uid="bge-m3"
)
local_kb_xin = FAISS.load_local(
    "../data/vector_store/vector-bge-m3",
    embeddings=xinference,
    allow_dangerous_deserialization=True,
)
local_kb_xin.similarity_search(query=query, include_metadata=True, k=30)
```
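The asyncio variant mentioned above followed the same pattern; a minimal sketch, assuming the blocking client calls are handed off to threads with asyncio.to_thread (server URL, model UID, and inputs are placeholders):

```python
import asyncio

from xinference.client import Client

client = Client("http://127.0.0.1:9997")       # placeholder server URL
model = client.get_model("bge-reranker-base")  # placeholder rerank model UID

query = "short query"                          # illustrative inputs
corpus = ["a long document ..."] * 10

async def main():
    # Fan the blocking rerank calls out to the default thread pool.
    tasks = [asyncio.to_thread(model.rerank, corpus, query) for _ in range(20)]
    await asyncio.gather(*tasks)

asyncio.run(main())
```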
Additionally, when testing the embedding API, the for loop is fastest regardless of whether the texts are long or short.
Rerank is strongly affected by input length: the results above use a very short query with a long corpus (the actual shape of the data in my application). If I instead use the example from https://inference.readthedocs.io/zh-cn/latest/user_guide/client_api.html#rerank (sketched below for context), the test results are as follows:
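For context, that docs example uses a short query against a handful of short sentences, along these lines (reconstructed sketch, not a verbatim copy of the page):

```python
from xinference.client import Client

client = Client("http://127.0.0.1:9997")        # placeholder server URL
model = client.get_model("<rerank_model_uid>")  # placeholder UID

# Short query, short corpus -- the opposite shape of my application's data.
query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
]
print(model.rerank(corpus, query))
```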
