add regression testing docs #755
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kaushikmitr

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Hi @kaushikmitr. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration.
/ok-to-test
mkdocs.yml (Outdated)
@@ -20,6 +20,9 @@ theme:
  primary: custom
  custom_dir: site-src/overrides
edit_uri: edit/main/site-src/
extra:
OOC, do we need this for this PR?
not needed, removed it
### Example 1: Single Workload

- **Dataset:** [ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json).
if this is LPG config, can we make these into separate manifests in the benchmark folder?
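As a rough illustration of what such a standalone manifest could carry, here is a hedged sketch of env overrides on the LPG benchmark-tool container for this ShareGPT case. `FILE_PREFIX` and `PROMPT_DATASET_FILE` mirror the existing `benchmark.yaml`; `REQUEST_RATES` is an assumed, illustrative name for the QPS sweep setting.

```yaml
# Hedged sketch only: env overrides a dedicated single-workload regression
# manifest might set on the LPG benchmark-tool container.
env:
  - name: FILE_PREFIX                 # mirrors the existing benchmark.yaml
    value: single-workload-regression
  - name: PROMPT_DATASET_FILE         # dataset file bundled with the LPG image
    value: ShareGPT_V3_unfiltered_cleaned_split.json
  - name: REQUEST_RATES               # assumption: comma-separated QPS values to sweep
    value: "10,20,30"
```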
Perform the benchmarks in two phases:

- **Before applying changes:**
I assume this means we need to run the benchmark twice, once before any algo changes and once after. Can we be more explicit about the wording there?
Also, out of scope of this PR, but we may want to have some baseline data so the user doesn't need to run the benchmark twice, hopefully saving them some time.
I agree it would be great to surface some baseline data so folks don’t have to run both phases every time, but we need to account for variability between users’ setups when evaluating regressions. That’s why doing both runs in the same environment feels safest for now; let’s revisit adding shared baselines down the line.
++ absolutely a future problem
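To make the two-phase wording concrete, a hedged sketch of one before/after flow follows; it assumes the kubectl-plus-download-script workflow described in the benchmark guide, and the benchmark IDs are illustrative.

```bash
# Hedged sketch of the two-phase regression flow; IDs are illustrative.

# Phase 1: baseline, before applying any algorithm changes.
kubectl apply -f ./config/manifests/regression-testing/single-workload-regression.yaml
# ...wait for the benchmark-tool run to complete, then save results under a distinct ID:
benchmark_id='regression-baseline' ./tools/benchmark/download-benchmark-results.bash

# Phase 2: apply your changes and redeploy, then rerun the same benchmark under a new ID.
kubectl apply -f ./config/manifests/regression-testing/single-workload-regression.yaml
benchmark_id='regression-after' ./tools/benchmark/download-benchmark-results.bash
```

Comparing the two result sets (for example, in the analysis notebook) then surfaces any regression.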
Is this different from https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/benchmark/benchmark.ipynb? Can we consolidate them?
yes, PTAL
This guide demonstrates how to run regression testing against the Gateway API inference extension. Benchmarks are conducted using the [Latency Profile Generator](https://github.com/AI-Hypercomputer/inference-benchmark) (LPG) to simulate traffic and collect detailed metrics.

## Prerequisites
There is a lot of duplication with https://gateway-api-inference-extension.sigs.k8s.io/performance/benchmark/. Can we restructure this doc to focus more on defining the regression test cases, and then refer to the benchmark doc for how to perform the benchmarks?
agreed. PTAL at the updated doc.
- **Dataset:** `Infinity-Instruct_conversations.json` generated from the provided Python script `./tools/benchmark/import-datasets`.
    * `Infinity-Instruct_conversations.json` is the [huggingface dataset](https://huggingface.co/datasets/BAAI/Infinity-Instruct) converted to a prompt → response style conversation JSON that can be consumed by the benchmarking script.
- **Model:** [Llama 3 (8B)](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **LoRA:** 15 adapters using `nvidia/llama-3.1-nemoguard-8b-topic-control` (rank 8, <1% of base model size) (all adapters are *critical*)
It's unclear how to configure 15 adapters with the single suggested adapter. Can you be explicit? (I think you can do this by giving aliases in the vLLM startup config; please add an example here.)
see config/manifests/regression-testing/vllm/multi-lora-deployment.yaml
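For readers who don't open the manifest, the aliasing idea is roughly this: vLLM's `--lora-modules` flag accepts multiple `name=path` pairs, so 15 adapter names can all point at the same underlying adapter weights. The snippet below is a minimal sketch, not the manifest's exact args; the adapter names are placeholders.

```yaml
# Minimal sketch (not the actual multi-lora-deployment.yaml): serve one LoRA
# artifact under many adapter names via vLLM's --lora-modules name=path pairs.
args:
  - "--model=meta-llama/Llama-3.1-8B-Instruct"
  - "--enable-lora"
  - "--max-loras=15"
  - "--lora-modules"
  - "adapter-0=nvidia/llama-3.1-nemoguard-8b-topic-control"
  - "adapter-1=nvidia/llama-3.1-nemoguard-8b-topic-control"
  # ...continue through adapter-14, all pointing at the same adapter weights
```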
Use the following configurations to update `./config/manifests/benchmark/benchmark.yaml` for regression testing. These configurations are tailored to evaluate performance shifts before and after code changes. They assume NVIDIA H100 GPUs (80 GB)—adjust them as needed for different hardware, backend counts, or datasets.

### Example 1: Single Workload
Consider adding example LPG yamls for these two cases, similar to https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/benchmark/benchmark.yaml
added vLLM and InferenceModel deployments for the multi-LoRA test case, and also added benchmark YAMLs for the two test cases
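As a rough sketch of what one adapter's entry in `inferencemodel.yaml` can look like (the apiVersion and field names follow the project's InferenceModel API as I understand it; double-check against the manifest in this PR):

```yaml
# Hedged sketch of a single critical adapter registered against the pool.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-0
spec:
  modelName: adapter-0
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
```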
- name: FILE_PREFIX
  value: benchmark
- name: PROMPT_DATASET_FILE
  value: Infinity-Instruct_conversations.json
This is the file that's generated by `import_datasets.py`? We will need to build this file into the LPG image in order to use it, right? Can we provide an LPG image with the datasets built in?
I thought about it, but both of these datasets require users to sign in to HF to accept an agreement, so I did not put them in the public image. We can have it internally though.
OK, got it, then please replace the image with a placeholder <lpg_image>, and explain how to build a new image with this dataset.
added, it seems easier to update the LPG script and point to that here. Pushed my changes to import the datasets and create a Docker image in the LPG repo. Please try it out.
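For anyone who prefers to bake the dataset into their own image rather than using the updated LPG script, the idea is just a thin layer on top of an existing LPG image. This is a hedged sketch: the base image, destination path, and registry are placeholders, and the directory the benchmarking script actually reads datasets from should be checked in the LPG repo.

```dockerfile
# Hedged sketch: extend an LPG image so the converted dataset ships inside it.
# <lpg_image> and the destination directory are placeholders.
FROM <lpg_image>
COPY Infinity-Instruct_conversations.json /workspace/Infinity-Instruct_conversations.json
```

Build and push it (e.g. `docker build -t <your-registry>/inference-benchmark:regression .` followed by `docker push`), then point the benchmark manifest's image field at that tag.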
Commits:
- update traffic split setup
- update traffic split setup
- update requirement
- update regressing testig doc
- consolidate performance docs
- add newline
- add example yamls for multi lora deployment and regression lgp testing
- fix qps range
- fix typo (×7)
- fix broken link
- add instructions to build lpg image
- update benchmark.yaml
- update lpg yamls
- update readme
- update regfression testing markdown to refine docker image creating for LPG
- update regression yamls
- refine regression doc
app: benchmark-tool
spec:
  containers:
  # Build image from this source https://github.com/AI-Hypercomputer/inference-benchmark/blob/1c92df607751a7ddb04e2152ed7f6aaf85bd9ca7
why not from main?
I have found two issues:
- Not sure why, but this commit doesn't exist in any branch. This should be fixed.
- The Dockerfile needs to be updated to copy the new dataset file into the image.

> why not from main?

I don't recommend main because we should pin to a version we currently support, and main can change.
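One way to express that pinning in the manifest is to tag the image with the exact commit it was built from (registry and tag naming below are placeholders; the SHA is the one referenced in the comment above):

```yaml
# Hedged sketch: pin benchmark-tool to an image built from a known commit
# rather than a moving main/latest build.
containers:
  - name: benchmark-tool
    image: <your-registry>/inference-benchmark:1c92df607751a7ddb04e2152ed7f6aaf85bd9ca7
```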
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This pull request introduces multiple changes aimed at setting up regression testing for inference models and updating related documentation. Key changes include the addition of Kubernetes manifests for deploying and testing inference models, as well as updates to the documentation to reflect these new capabilities.

Regression Testing Setup:

- Added `InferenceModel` configurations for 15 adapters and one base model (`meta-llama/Llama-3.1-8B-Instruct`) in `config/manifests/regression-testing/inferencemodel.yaml`. These models are critical and reference the `vllm-llama3-8b-instruct` pool.
- Added `config/manifests/regression-testing/multi-lora-regression.yaml`. This benchmarks multiple LoRA adapters with specified traffic splits and request rates.
- Added `config/manifests/regression-testing/single-workload-regression.yaml`. This focuses on benchmarking the base model with higher request rates.
- Added a deployment for the `vllm-llama3-8b-instruct` model server in `config/manifests/regression-testing/vllm/multi-lora-deployment.yaml`. This includes configurations for LoRA adapters, readiness and liveness probes, and resource limits for GPUs.

Documentation Updates:

- Updated `mkdocs.yml`.
- Updated `site-src/performance/benchmark/index.md` to include instructions for updating benchmark IDs with `inference-extension` and `k8s-svc` in the last notebook cell.