
add regression testing docs #755


Open
wants to merge 1 commit into main

Conversation

kaushikmitr
Contributor

@kaushikmitr kaushikmitr commented Apr 28, 2025

This pull request sets up regression testing for inference models and updates the related documentation. Key changes include Kubernetes manifests for deploying and testing inference models, plus documentation updates to reflect these new capabilities.

Regression Testing Setup:

  • Added InferenceModel configurations for 15 adapters and one base model (meta-llama/Llama-3.1-8B-Instruct) in config/manifests/regression-testing/inferencemodel.yaml. These models are critical and reference the vllm-llama3-8b-instruct pool.
  • Added a deployment manifest for a benchmarking tool in config/manifests/regression-testing/multi-lora-regression.yaml. This tool benchmarks multiple LoRA adapters with specified traffic splits and request rates.
  • Added a deployment manifest for single workload regression testing in config/manifests/regression-testing/single-workload-regression.yaml. This focuses on benchmarking the base model with higher request rates.
  • Added a deployment manifest for the vllm-llama3-8b-instruct model server in config/manifests/regression-testing/vllm/multi-lora-deployment.yaml. This includes configurations for LoRA adapters, readiness and liveness probes, and resource limits for GPUs.
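For illustration, one entry in `inferencemodel.yaml` might look roughly like this (a sketch based on the InferenceModel API; the apiVersion and metadata name are assumptions, and only the base model and pool names come from this PR):

```yaml
# Sketch only: field values other than the model and pool names are assumed.
apiVersion: inference.networking.x-k8s.io/v1alpha2   # assumed API version
kind: InferenceModel
metadata:
  name: base-model                  # hypothetical name; the PR adds 15 adapter entries plus the base model
spec:
  modelName: meta-llama/Llama-3.1-8B-Instruct
  criticality: Critical             # the PR marks these models as critical
  poolRef:
    name: vllm-llama3-8b-instruct
```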

Documentation Updates:

  • Added a new "Regression Testing" section to the navigation in mkdocs.yml.
  • Updated the benchmarking guide in site-src/performance/benchmark/index.md to include instructions for updating benchmark IDs with inference-extension and k8s-svc in the last notebook cell.
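For reference, the mkdocs.yml navigation change described above would look something like the following (paths and titles here are assumptions; the actual nav entry in this PR may differ):

```yaml
nav:
  - Performance:
      - Benchmark: performance/benchmark/index.md
      - Regression Testing: performance/regression-testing/index.md   # hypothetical path
```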

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 28, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kaushikmitr
Once this PR has been reviewed and has the lgtm label, please assign jeffwan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 28, 2025
@k8s-ci-robot
Contributor

Hi @kaushikmitr. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 28, 2025

netlify bot commented Apr 28, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 9a5db2c
🔍 Latest deploy log: https://app.netlify.com/sites/gateway-api-inference-extension/deploys/6812bbb4e092830008e60473
😎 Deploy Preview: https://deploy-preview-755--gateway-api-inference-extension.netlify.app

@kfswain
Collaborator

kfswain commented Apr 29, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 29, 2025
mkdocs.yml Outdated
@@ -20,6 +20,9 @@ theme:
primary: custom
custom_dir: site-src/overrides
edit_uri: edit/main/site-src/
extra:
Collaborator

OOC, do we need this for this PR?

Contributor Author

not needed, removed it


### Example 1: Single Workload

- **Dataset:** [ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json).
Collaborator

if this is LPG config, can we make these into separate manifests in the benchmark folder?


Perform the benchmarks in two phases:

- **Before applying changes:**
Collaborator
@kfswain kfswain Apr 29, 2025

I assume this means that we need to run the benchmark twice, once for before any algo changes, and once after. Can we be more explicit about the wording there?

Also, out of scope of this PR, but we may want to have some baseline data so the user doesn't need to run the benchmark twice, hopefully saving them some time

Contributor Author

I agree it would be great to surface some baseline data so folks don’t have to run both phases every time, but we need to account for variability between users’ setups when evaluating regressions. That’s why doing both runs in the same environment feels safest for now; let’s revisit adding shared baselines down the line.

Collaborator

++ absolutely a future problem

Contributor

Contributor Author

yes, PTAL


This guide demonstrates how to run regression testing against the Gateway API inference extension. Benchmarks are conducted using the [Latency Profile Generator](https://github.com/AI-Hypercomputer/inference-benchmark) (LPG) to simulate traffic and collect detailed metrics.

## Prerequisites
Contributor

There is a lot of duplication with https://gateway-api-inference-extension.sigs.k8s.io/performance/benchmark/.

Can we restructure this doc to focus more on defining the regression test cases, and then refer to the benchmark doc for how to perform the benchmarks?

Contributor Author

agreed. PTAL at the updated doc.

- **Dataset:** `Infinity-Instruct_conversations.json` generated from the provided Python script `./tools/benchmark/import-datasets`.
    - `Infinity-Instruct_conversations.json` is the [Hugging Face dataset](https://huggingface.co/datasets/BAAI/Infinity-Instruct) converted to prompt → response style conversation JSON that can be consumed by the benchmarking script.
- **Model:** [Llama 3 (8B)](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **LoRA:** 15 adapters using `nvidia/llama-3.1-nemoguard-8b-topic-control` (rank 8, <1% of base model size) (all adapters are *critical*)
Contributor

It's unclear how to configure 15 adapters from the single suggested adapter. Can you be explicit? (I think you can do this by giving each adapter an alias in the vLLM startup config; please add an example here.)

Contributor Author

see config/manifests/regression-testing/vllm/multi-lora-deployment.yaml
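For readers following the thread, the aliasing approach mentioned above looks roughly like this in a vLLM container spec (a sketch only; the image, adapter names, and flag values are assumptions, and the actual manifest is config/manifests/regression-testing/vllm/multi-lora-deployment.yaml):

```yaml
containers:
  - name: vllm
    image: vllm/vllm-openai:latest        # assumed image; the PR may pin a specific build
    args:
      - --model=meta-llama/Llama-3.1-8B-Instruct
      - --enable-lora
      - --max-loras=15                    # assumed value, so all aliases can be loaded
      # Registering the same adapter artifact under multiple names yields 15 logical adapters.
      - --lora-modules
      - adapter-1=nvidia/llama-3.1-nemoguard-8b-topic-control
      - adapter-2=nvidia/llama-3.1-nemoguard-8b-topic-control
      # ...adapter-3 through adapter-15 follow the same pattern
```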


Use the following configurations to update `./config/manifests/benchmark/benchmark.yaml` for regression testing. These configurations are tailored to evaluate performance shifts before and after code changes. They assume NVIDIA H100 GPUs (80 GB)—adjust them as needed for different hardware, backend counts, or datasets.

### Example 1: Single Workload
Contributor

Contributor Author

added vllm and inferencemodel deployments for the multi-lora test case, and also added benchmark yamls for the two test cases

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 29, 2025
- name: FILE_PREFIX
value: benchmark
- name: PROMPT_DATASET_FILE
value: Infinity-Instruct_conversations.json
Contributor

This is the file that's generated by import_datasets.py? We will need to build this file into the lpg image in order to use it, right? Can we provide an lpg image with the datasets built in?

Contributor Author
@kaushikmitr kaushikmitr Apr 30, 2025

I thought about it, but both of these datasets require users to sign in to HF and accept an agreement, so I did not put them in the public image. We can have it internally though.

Contributor

OK, got it, then please replace the image with a placeholder <lpg_image>, and explain how to build a new image with this dataset.
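Concretely, that suggestion would amount to something like the following container stanza in the benchmark Deployment (a sketch; only the FILE_PREFIX and PROMPT_DATASET_FILE values come from the manifest quoted above, the rest is assumed):

```yaml
containers:
  - name: benchmark-tool
    # <lpg_image> is a placeholder: build the image yourself from the LPG repo
    # so the gated dataset file is baked in, then reference it here.
    image: <lpg_image>
    env:
      - name: FILE_PREFIX
        value: benchmark
      - name: PROMPT_DATASET_FILE
        value: Infinity-Instruct_conversations.json
```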

Contributor Author

added. It seemed easier to update the LPG script and point to that here; I pushed my changes to import the datasets and create a Docker image in the LPG repo. Please try it out.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 30, 2025
update traffic split setup

update traffic split setup

update requirement

update regression testing doc

consolidate performance docs

add newline

add example yamls for multi lora deployment and regression lpg testing

fix qps range

fix typo

fix typo

fix typo

fix typo

fix typo

fix typo

fix typo

fix broken link

add instructions to build lpg image

update benchmark.yaml

update lpg yamls

update readme

update regression testing markdown to refine docker image creation for LPG

update regression yamls

refine regression doc
app: benchmark-tool
spec:
containers:
# Build image from this source https://github.com/AI-Hypercomputer/inference-benchmark/blob/1c92df607751a7ddb04e2152ed7f6aaf85bd9ca7
Contributor

why not from main?

Contributor

I have found two issues:

  1. Not sure why but this commit doesn't exist in any branch. This should be fixed.
  2. The Dockerfile needs to be updated to copy the new dataset file to the image.

> why not from main?

I don't recommend main because we should pin to a version we currently support and main can change.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)
5 participants