Add Serving Benchmark Script #29

tiannuo-yang · 2025-06-21T06:08:55Z

This PR introduces a new benchmark script, serving_bench.py, to evaluate the engine's performance under a continuous load of incoming requests, simulating a real-world serving scenario.

Note: This PR is purely additive. No core files have been modified.

Key Features of `serving_bench.py`

Simulates Online Serving: Models a request stream using a Poisson distribution.
Comprehensive Metrics: Measures throughput, Time To First Token (TTFT), Time Per Output Token (TPOT), and end-to-end latency.
Live Progress: Uses tqdm to display real-time progress and average latency.
Configurable: Allows setting the request rate and total number of requests via command-line arguments.
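
The first two features can be illustrated with a small sketch. It is not taken from `serving_bench.py`. It assumes that arrivals follow a Poisson process, so gaps between requests are exponentially distributed with mean 1/rate, and it uses common definitions of TTFT, TPOT, and end-to-end latency. The names `poisson_arrival_times`, `RequestTiming`, and `summarize` are hypothetical and introduced only for illustration.

```python
import random
from dataclasses import dataclass


def poisson_arrival_times(request_rate, num_requests, seed=0):
    """Return cumulative arrival times (seconds) for a Poisson process.

    Assumption: inter-arrival gaps are drawn from an exponential
    distribution with mean 1 / request_rate.
    """
    rng = random.Random(seed)
    now = 0.0
    arrivals = []
    for _ in range(num_requests):
        now += rng.expovariate(request_rate)
        arrivals.append(now)
    return arrivals


@dataclass
class RequestTiming:  # hypothetical record, not taken from the PR
    sent_at: float          # time the request was submitted (s)
    first_token_at: float   # time the first output token arrived (s)
    finished_at: float      # time the last output token arrived (s)
    num_output_tokens: int


def summarize(timings):
    """Compute throughput, mean TTFT, mean TPOT, and mean latency."""
    ttft = [t.first_token_at - t.sent_at for t in timings]
    latency = [t.finished_at - t.sent_at for t in timings]
    # TPOT: decode time divided by the tokens produced after the first one.
    tpot = [
        (t.finished_at - t.first_token_at) / (t.num_output_tokens - 1)
        for t in timings
        if t.num_output_tokens > 1
    ]
    # Wall-clock span from the first submission to the last completion.
    wall = max(t.finished_at for t in timings) - min(t.sent_at for t in timings)
    total_tokens = sum(t.num_output_tokens for t in timings)
    return {
        "throughput_tok_s": total_tokens / wall,
        "avg_ttft_ms": 1000 * sum(ttft) / len(ttft),
        "avg_tpot_ms": 1000 * sum(tpot) / len(tpot) if tpot else 0.0,
        "avg_latency_s": sum(latency) / len(latency),
    }
```

In a real benchmark, the driver would sleep until each arrival time, submit the request, and record the three timestamps as tokens stream back; `summarize` would then run once over all completed requests.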

Benchmark Results

The following results show the system's performance at different request rates (one NVIDIA L20 48 GB GPU, Qwen3-0.6B).

| Request Rate (req/s) | Throughput (tok/s) | Avg TTFT (ms) | Avg TPOT (ms/tok) | Avg Latency (s) |
| --- | --- | --- | --- | --- |
| 4 | 2046.27 | 87.56 | 5.74 | 3.06 |
| 8 | 3636.29 | 102.46 | 10.99 | 5.85 |
| 16 | 4205.13 | 142.40 | 18.07 | 9.56 |
| 32 | 4631.52 | 353.51 | 27.45 | 14.20 |

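The results show that throughput rises with the request rate, from 2046 tok/s at 4 req/s to 4632 tok/s at 32 req/s (about 2.3x for an 8x increase in offered load), with most of the gain occurring between 4 and 8 req/s. This is consistent with the dynamic batching mechanism working as intended. As expected, the higher throughput comes at the cost of latency: average end-to-end latency grows from 3.06 s to 14.20 s over the same range.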
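
The scaling in the table can be quantified with a few lines of arithmetic. The snippet below is an illustrative sketch; it copies the rows from the table and computes step-by-step ratios. The helper name `scaling` is hypothetical.

```python
# Rows copied from the results table:
# (request rate, throughput tok/s, TTFT ms, TPOT ms/tok, latency s)
ROWS = [
    (4, 2046.27, 87.56, 5.74, 3.06),
    (8, 3636.29, 102.46, 10.99, 5.85),
    (16, 4205.13, 142.40, 18.07, 9.56),
    (32, 4631.52, 353.51, 27.45, 14.20),
]


def scaling(rows):
    """Return throughput and latency ratios between consecutive rows."""
    steps = []
    for prev, cur in zip(rows, rows[1:]):
        steps.append({
            "rate": f"{prev[0]}->{cur[0]}",
            "throughput_x": cur[1] / prev[1],
            "latency_x": cur[4] / prev[4],
        })
    return steps


for step in scaling(ROWS):
    print(f"{step['rate']:>7} req/s: "
          f"throughput x{step['throughput_x']:.2f}, "
          f"latency x{step['latency_x']:.2f}")
```

Each doubling of the request rate yields a smaller throughput gain than the one before, while latency keeps growing at every step, which suggests the GPU is approaching saturation at the higher rates.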

How to Use

```bash
# Run the benchmark with a specific request rate
python serving_bench.py --request-rate 16 --num-requests 256
```
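
For readers curious how two such flags might be wired up, here is a minimal `argparse` sketch. It is not the script's actual code: only the flag names `--request-rate` and `--num-requests` come from the command above, while the types, defaults, and help strings are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two benchmark flags; defaults here are assumed, not the script's."""
    parser = argparse.ArgumentParser(
        description="Serving benchmark (illustrative sketch)"
    )
    parser.add_argument(
        "--request-rate", type=float, default=16.0,
        help="mean request arrival rate in requests per second",
    )
    parser.add_argument(
        "--num-requests", type=int, default=256,
        help="total number of requests to send",
    )
    return parser.parse_args(argv)


args = parse_args(["--request-rate", "16", "--num-requests", "256"])
print(args.request_rate, args.num_requests)
```

Passing `argv` explicitly keeps the parser testable; with `argv=None`, `argparse` falls back to `sys.argv`.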

tiannuo-yang · 2025-06-21T06:11:48Z

A screenshot for running serving_bench.py:

vinsblack · 2025-06-29T21:43:38Z

I found this project very stimulating for someone like me who is passionate about the LLM world, and your work inspired me to take your model and develop it further. I have also imagined some immediate uses, such as LLMs supporting legal, medical, and similar work. I created a small project based on yours: https://github.com/vinsblack/professional-nano-vllm-enterprise. Thank you for your incredible work 🙏

add a serving_bench script

e47378c

GeeeekExplorer force-pushed the main branch 3 times, most recently from 71c378f to cde3fc2 · June 21, 2025 09:19
