- Hong Kong (UTC +08:00)

Highlights
- Pro

Pinned
- fastllm (C++; fork of ztxz16/fastllm)
  fastllm is a high-performance LLM inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models, and any GPU with 10 GB+ of VRAM can run the full-size DeepSeek model. A dual-socket 9004/9005-series server plus a single GPU can serve the original full-precision, full-size DeepSeek model at 20 tps for a single request; the INT4-quantized model reaches 30 tps single-request and 60+ tps under concurrency (see the OpenAI-style client sketch after this list).
- kvcache-ai/ktransformers
  A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
- chitu (Python; fork of thu-pacman/chitu)
  High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
- sglang (Python; fork of sgl-project/sglang)
  SGLang is a fast serving framework for large language models and vision language models (see the frontend sketch after this list).
- vllm (Python; fork of vllm-project/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs (see the offline-inference sketch after this list).
- vllm-ascend (Python; fork of vllm-project/vllm-ascend)
  Community-maintained hardware plugin for running vLLM on Huawei Ascend NPUs.
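
The vllm entry above advertises high-throughput offline inference; the sketch below shows that workflow. The model ID, prompt, and sampling settings are illustrative assumptions, not values from this profile.

```python
# Minimal vLLM offline-inference sketch. The model ID, prompt, and
# sampling parameters below are illustrative placeholders.
from vllm import LLM, SamplingParams

prompts = ["Explain tensor parallelism in one sentence."]
sampling = SamplingParams(temperature=0.7, max_tokens=64)

# tensor_parallel_size shards the weights across GPUs; 1 means a single GPU.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=1)

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```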
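
The sglang entry mentions fast serving of language and vision-language models; SGLang's frontend language composes generation calls as ordinary Python functions. A minimal sketch, assuming a server is already running locally on port 30000; the prompt and variable names are illustrative.

```python
# SGLang frontend sketch; assumes a server is already running at port 30000,
# e.g. `python -m sglang.launch_server --model-path <model> --port 30000`.
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    # gen() asks the backend to generate a span and binds it to "answer".
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is a KV cache?")
print(state["answer"])
```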
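
Several of these engines, vLLM and SGLang included, also expose OpenAI-compatible HTTP endpoints, so one client can drive whichever backend is deployed; check each project's docs for its exact launch command. A hedged sketch, assuming a server is already listening on localhost:8000:

```python
# OpenAI-compatible client sketch. Assumes an engine (e.g. `vllm serve <model>`)
# is already listening on localhost:8000; the URL, model name, and API key
# are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Summarize INT4 quantization tradeoffs."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```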