Tracer is a small utility built as part of the effort to evaluate Kavier (see the thesis) and to collect the LLM Trace Archive.
- Reads a CSV of prompts (`size, …, prompt`).
- Sends each prompt to the `/v1/completions` HTTP endpoint exposed by vLLM.
- In parallel, SSHes into the GPU host and samples `nvidia-smi --loop-ms=<SAMPLING_MS>` (timestamp, core %, mem %).
- Correlates every GPU sample with the current prompt ID and input-/output-token counts, and writes one line per sample to `data/sample_outputs/trace_<timestamp>.csv`.
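The sampling-and-correlation step above can be sketched roughly as follows. This is an illustrative sketch only: the exact `nvidia-smi` query flags, column layout, and helper names are assumptions, not Tracer's actual code.

```python
# One line as nvidia-smi might emit it with, e.g.:
#   nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory \
#              --format=csv,noheader --loop-ms=100
sample_line = "2024/01/01 12:00:00.123, 87 %, 42 %"

def parse_gpu_sample(line):
    """Split one CSV sample into (timestamp, core %, mem %)."""
    ts, core, mem = [field.strip() for field in line.split(",")]
    return ts, int(core.rstrip(" %")), int(mem.rstrip(" %"))

def correlate(sample, prompt_id, in_tokens, out_tokens):
    """Attach the currently running prompt's metadata to a GPU sample,
    yielding one output-trace row."""
    ts, core, mem = sample
    return [ts, core, mem, prompt_id, in_tokens, out_tokens]

row = correlate(parse_gpu_sample(sample_line),
                prompt_id=0, in_tokens=64, out_tokens=7)
print(row)  # one output-trace row per GPU sample
```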
Prerequisites:

- Python ≥ 3.10
- `requests`, `python-dotenv` (installed automatically below)
- A running vLLM server you can reach over HTTP and (optionally) SSH
```bash
git clone https://github.com/atlarge-research/Tracer.git
cd Tracer
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt   # just 2 tiny deps
pip install -e .                  # editable install, provides the `tracer` CLI
```
Configure your `.env` file as follows:

```bash
export SURF_URL=http://<gpu-host>:8000/v1/completions
export SURF_HOST=<gpu-host>
export SURF_USER=<ssh_user>
export SURF_KEY_PATH=keys/surf_key   # private key for SSH

# optional
export SAMPLING_MS=100               # ms between nvidia-smi samples
export PROMPT_DIR=$(pwd)/data/sample_inputs
export TRACE_DIR=$(pwd)/data/sample_outputs
```
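Tracer loads these variables through `python-dotenv`, which does understand the `export` prefix used above. The helper below is a minimal stdlib-only sketch of roughly what that amounts to, not the library's real code (which also handles quoting, interpolation, and more):

```python
import os

def load_env(path=".env"):
    """Toy stand-in for python-dotenv's load_dotenv(): strip an
    optional 'export ' prefix and export KEY=VALUE pairs."""
    parsed = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comment lines
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            parsed[key.strip()] = value.split("#", 1)[0].strip()  # drop inline comments
    os.environ.update(parsed)
    return parsed

# Demo against a throwaway file; Tracer's real .env lives at the repo root.
with open("demo.env", "w") as f:
    f.write("export SAMPLING_MS=100  # ms between nvidia-smi samples\n")
env = load_env("demo.env")
print(env["SAMPLING_MS"])  # -> 100
```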
Run Tracer on a prompt CSV:

```bash
tracer --csv for_tracing_prefill.csv
```

or

```bash
tracer --csv sample_inputs/for_tracing_decode.csv
```
The CLI prints progress, e.g.:

```
id=0 size=64 lat=0.753s len=7
id=1 size=128 lat=0.760s len=12
...
```
CSV traces will appear in `$TRACE_DIR/trace_<timestamp>.csv`.
Examples of what input and output traces look like are provided in `data/`.
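Once a trace file exists, it can be inspected with nothing but the standard library. The column names in this sketch are hypothetical — check the examples in `data/` for the actual schema:

```python
import csv
import io
import statistics

# Hypothetical trace excerpt; adjust the column names to match the
# real output files under data/sample_outputs/.
trace = io.StringIO(
    "timestamp,gpu_util,mem_util,prompt_id,input_tokens,output_tokens\n"
    "12:00:00.100,87,42,0,64,7\n"
    "12:00:00.200,91,44,0,64,7\n"
)

rows = list(csv.DictReader(trace))
mean_util = statistics.mean(int(r["gpu_util"]) for r in rows)
print(f"{len(rows)} samples, mean GPU util {mean_util:.1f}%")  # -> 2 samples, mean GPU util 89.0%
```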
Tracer is distributed under the MIT license. See LICENSE.txt.