@tiannuo-yang tiannuo-yang commented Jun 21, 2025

This PR introduces a new benchmark script, serving_bench.py, to evaluate the engine's performance under a continuous load of incoming requests, simulating a real-world serving scenario.

Note: This PR is purely additive. No core files have been modified.

Key Features of serving_bench.py

  • Simulates Online Serving: Models request arrivals as a Poisson process.
  • Comprehensive Metrics: Measures throughput, Time To First Token (TTFT), Time Per Output Token (TPOT), and end-to-end latency.
  • Live Progress: Uses tqdm to display real-time progress and average latency.
  • Configurable: Allows setting the request rate and total number of requests via command-line arguments.
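As an illustration of the Poisson request stream mentioned above, here is a minimal sketch of how arrival times could be generated. It assumes arrivals follow a Poisson process, so the gaps between requests are exponentially distributed with mean 1/rate. The function name `poisson_arrival_times` and the fixed seed are hypothetical and are not taken from serving_bench.py.

```python
import random


def poisson_arrival_times(request_rate, num_requests, seed=0):
    """Return cumulative arrival times in seconds for a Poisson process.

    Hypothetical helper for illustration; not the script's actual code.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    for _ in range(num_requests):
        # Gaps between events of a Poisson process are exponential
        # with mean 1 / request_rate.
        t += rng.expovariate(request_rate)
        arrivals.append(t)
    return arrivals


arrivals = poisson_arrival_times(request_rate=16, num_requests=256)
mean_gap = arrivals[-1] / len(arrivals)
print(f"mean inter-arrival gap: {mean_gap:.4f} s (expected about {1 / 16:.4f} s)")
```

A load generator would then sleep until each arrival time before submitting the next request, so the offered load does not depend on how quickly the engine responds.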

Benchmark Results

The following results show performance at different request rates on a single NVIDIA L20 (48 GB) GPU with Qwen3-0.6B.

| Request Rate (req/s) | Throughput (tok/s) | Avg TTFT (ms) | Avg TPOT (ms/tok) | Avg Latency (s) |
|---|---|---|---|---|
| 4 | 2046.27 | 87.56 | 5.74 | 3.06 |
| 8 | 3636.29 | 102.46 | 10.99 | 5.85 |
| 16 | 4205.13 | 142.40 | 18.07 | 9.56 |
| 32 | 4631.52 | 353.51 | 27.45 | 14.20 |
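For readers unfamiliar with these metrics, the sketch below shows one common way they can be derived from per-request timestamps. The `RequestTrace` fields, the `summarize` function, and the choice to count only output tokens in throughput are assumptions made for illustration; serving_bench.py may define them differently.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    # Hypothetical per-request record; all times are in seconds.
    sent_at: float
    first_token_at: float
    finished_at: float
    num_output_tokens: int


def summarize(traces):
    """Aggregate throughput, TTFT, TPOT, and end-to-end latency."""
    n = len(traces)
    ttft = [t.first_token_at - t.sent_at for t in traces]
    latency = [t.finished_at - t.sent_at for t in traces]
    # TPOT: decode time divided by the tokens produced after the first one.
    tpot = [
        (t.finished_at - t.first_token_at) / (t.num_output_tokens - 1)
        for t in traces
        if t.num_output_tokens > 1
    ]
    wall = max(t.finished_at for t in traces) - min(t.sent_at for t in traces)
    total_tokens = sum(t.num_output_tokens for t in traces)
    return {
        "throughput_tok_s": total_tokens / wall,
        "avg_ttft_ms": 1000 * sum(ttft) / n,
        "avg_tpot_ms": 1000 * sum(tpot) / len(tpot) if tpot else 0.0,
        "avg_latency_s": sum(latency) / n,
    }


# Two made-up traces, only to exercise the arithmetic.
traces = [
    RequestTrace(sent_at=0.0, first_token_at=0.1, finished_at=1.1, num_output_tokens=101),
    RequestTrace(sent_at=0.5, first_token_at=0.7, finished_at=2.7, num_output_tokens=101),
]
summary = summarize(traces)
print(summary)
```

Under these definitions, average latency is roughly TTFT plus TPOT times the number of remaining output tokens, which is why TPOT and latency rise together in the table.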

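The results show that throughput rises with the request rate, which is consistent with the dynamic batching mechanism working as intended. The gains diminish at higher rates: throughput grows about 2.3x (2046.27 to 4631.52 tok/s) while the request rate grows 8x. As expected, higher throughput comes at the cost of latency, with average latency increasing from 3.06 s to 14.20 s over the same range.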
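To make the scaling concrete, the short calculation below computes the throughput gain for each doubling of the request rate, using only the figures reported in the table. The variable names are illustrative.

```python
# (request rate in req/s, throughput in tok/s), copied from the results table.
results = [(4, 2046.27), (8, 3636.29), (16, 4205.13), (32, 4631.52)]

gains = []
for (rate_lo, tput_lo), (rate_hi, tput_hi) in zip(results, results[1:]):
    gain = tput_hi / tput_lo
    gains.append(gain)
    print(f"{rate_lo} -> {rate_hi} req/s: throughput x{gain:.2f}")
```

The gains come out to roughly 1.78x, 1.16x, and 1.10x, which suggests the engine is approaching saturation somewhere above 8 req/s on this hardware.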

How to Use

# Run the benchmark with a specific request rate
python serving_bench.py --request-rate 16 --num-requests 256
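For reference, flags like these could be parsed with `argparse` as sketched below. Only the flag names `--request-rate` and `--num-requests` come from the command above; the defaults, types, help strings, and the `build_parser` name are assumptions and may not match the script.

```python
import argparse


def build_parser():
    # Illustrative parser; defaults are placeholders, not the script's values.
    parser = argparse.ArgumentParser(description="Online serving benchmark (sketch)")
    parser.add_argument("--request-rate", type=float, default=16.0,
                        help="average number of requests per second")
    parser.add_argument("--num-requests", type=int, default=256,
                        help="total number of requests to send")
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--request-rate", "16", "--num-requests", "256"])
print(args.request_rate, args.num_requests)
```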

@tiannuo-yang
Author

A screenshot of serving_bench.py running:

[Image: WechatIMG481]

@GeeeekExplorer GeeeekExplorer force-pushed the main branch 3 times, most recently from 71c378f to cde3fc2 Compare June 21, 2025 09:19
@vinsblack

As someone passionate about the LLM world, I found this project very stimulating. Your work inspired me to take your model and develop it further. I also imagined some immediate uses, such as LLMs supporting legal work, medical work, and so on. I created a small project based on yours: https://github.com/vinsblack/professional-nano-vllm-enterprise. Thank you for your incredible work 🙏
