add regression testing docs #755
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kaushikmitr

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Hi @kaushikmitr. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration.
/ok-to-test
mkdocs.yml (Outdated)
@@ -20,6 +20,9 @@ theme:
  primary: custom
  custom_dir: site-src/overrides
edit_uri: edit/main/site-src/
extra:
OOC, do we need this for this PR?
not needed, removed it
### Example 1: Single Workload

- **Dataset:** [ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json).
if this is LPG config, can we make these into separate manifests in the benchmark folder?
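As a rough illustration of what such a standalone manifest could carry, here is a hedged sketch of env overrides on the LPG benchmark-tool container for this ShareGPT case. `FILE_PREFIX` and `PROMPT_DATASET_FILE` mirror the existing `benchmark.yaml`; `REQUEST_RATES` is an assumed, illustrative name for the QPS sweep setting.

```yaml
# Hedged sketch only: env overrides a dedicated single-workload regression
# manifest might set on the LPG benchmark-tool container.
env:
  - name: FILE_PREFIX                 # mirrors the existing benchmark.yaml
    value: single-workload-regression
  - name: PROMPT_DATASET_FILE         # dataset file bundled with the LPG image
    value: ShareGPT_V3_unfiltered_cleaned_split.json
  - name: REQUEST_RATES               # assumption: comma-separated QPS values to sweep
    value: "10,20,30"
```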
Perform the benchmarks in two phases:

- **Before applying changes:**
I assume this means we need to run the benchmark twice, once before any algo changes and once after. Can we be more explicit about the wording there?
Also, out of scope of this PR, but we may want to have some baseline data so the user doesn't need to run the benchmark twice, hopefully saving them some time.
I agree it would be great to surface some baseline data so folks don’t have to run both phases every time, but we need to account for variability between users’ setups when evaluating regressions. That’s why doing both runs in the same environment feels safest for now; let’s revisit adding shared baselines down the line.
++ absolutely a future problem
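To make the two-phase wording concrete, a hedged sketch of one before/after flow follows; it assumes the kubectl-plus-download-script workflow described in the benchmark guide, and the benchmark IDs are illustrative.

```bash
# Hedged sketch of the two-phase regression flow; IDs are illustrative.

# Phase 1: baseline, before applying any algorithm changes.
kubectl apply -f ./config/manifests/regression-testing/single-workload-regression.yaml
# ...wait for the benchmark-tool run to complete, then save results under a distinct ID:
benchmark_id='regression-baseline' ./tools/benchmark/download-benchmark-results.bash

# Phase 2: apply your changes and redeploy, then rerun the same benchmark under a new ID.
kubectl apply -f ./config/manifests/regression-testing/single-workload-regression.yaml
benchmark_id='regression-after' ./tools/benchmark/download-benchmark-results.bash
```

Comparing the two result sets (for example, in the analysis notebook) then surfaces any regression.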
Is this different from https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/benchmark/benchmark.ipynb? Can we consolidate them?
yes, PTAL
This guide demonstrates how to run regression testing against the Gateway API inference extension. Benchmarks are conducted using the [Latency Profile Generator](https://github.com/AI-Hypercomputer/inference-benchmark) (LPG) to simulate traffic and collect detailed metrics.

## Prerequisites
There is a lot of duplication with https://gateway-api-inference-extension.sigs.k8s.io/performance/benchmark/. Can we restructure this doc to focus more on defining the regression test cases, and then refer to the benchmark doc for how to perform the benchmarks?
agreed. PTAL at the updated doc.
- **Dataset:** `Infinity-Instruct_conversations.json` generated from the provided Python script `./tools/benchmark/import-datasets`.
    * `Infinity-Instruct_conversations.json` is the [huggingface dataset](https://huggingface.co/datasets/BAAI/Infinity-Instruct) converted to a prompt → response style conversation JSON that can be consumed by the benchmarking script.
- **Model:** [Llama 3 (8B)](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **LoRA:** 15 adapters using `nvidia/llama-3.1-nemoguard-8b-topic-control` (rank 8, <1% of base model size) (all adapters are *critical*)
It's unclear how to configure 15 adapters with the single suggested adapter. Can you be explicit? (I think you can do this by giving aliases in the vLLM startup config; please add an example here.)
see config/manifests/regression-testing/vllm/multi-lora-deployment.yaml
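For readers who don't open the manifest, the aliasing idea is roughly this: vLLM's `--lora-modules` flag accepts multiple `name=path` pairs, so 15 adapter names can all point at the same underlying adapter weights. The snippet below is a minimal sketch, not the manifest's exact args; the adapter names are placeholders.

```yaml
# Minimal sketch (not the actual multi-lora-deployment.yaml): serve one LoRA
# artifact under many adapter names via vLLM's --lora-modules name=path pairs.
args:
  - "--model=meta-llama/Llama-3.1-8B-Instruct"
  - "--enable-lora"
  - "--max-loras=15"
  - "--lora-modules"
  - "adapter-0=nvidia/llama-3.1-nemoguard-8b-topic-control"
  - "adapter-1=nvidia/llama-3.1-nemoguard-8b-topic-control"
  # ...continue through adapter-14, all pointing at the same adapter weights
```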
Use the following configurations to update `./config/manifests/benchmark/benchmark.yaml` for regression testing. These configurations are tailored to evaluate performance shifts before and after code changes. They assume NVIDIA H100 GPUs (80 GB)—adjust them as needed for different hardware, backend counts, or datasets.

### Example 1: Single Workload
Consider adding example LPG yamls for these two cases, similar to https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/benchmark/benchmark.yaml
added vLLM and InferenceModel deployments for the multi-LoRA test case, and also added benchmark YAMLs for the two test cases
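As a rough sketch of what one adapter's entry in `inferencemodel.yaml` can look like (the apiVersion and field names follow the project's InferenceModel API as I understand it; double-check against the manifest in this PR):

```yaml
# Hedged sketch of a single critical adapter registered against the pool.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-0
spec:
  modelName: adapter-0
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
```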
- name: FILE_PREFIX
  value: benchmark
- name: PROMPT_DATASET_FILE
  value: Infinity-Instruct_conversations.json
This is the file that's generated by `import_datasets.py`? We will need to build this file into the LPG image in order to use it, right? Can we provide an LPG image with the datasets built in?
I thought about it, but both of these datasets require users to sign in to HF to accept an agreement, so I did not put them in the public image. We can have it internally though.
OK, got it, then please replace the image with a placeholder <lpg_image>, and explain how to build a new image with this dataset.
added, it seems easier to update the LPG script and point to that here. Pushed my changes to import the datasets and create a Docker image in the LPG repo. Please try it out.
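For anyone who prefers to bake the dataset into their own image rather than using the updated LPG script, the idea is just a thin layer on top of an existing LPG image. This is a hedged sketch: the base image, destination path, and registry are placeholders, and the directory the benchmarking script actually reads datasets from should be checked in the LPG repo.

```dockerfile
# Hedged sketch: extend an LPG image so the converted dataset ships inside it.
# <lpg_image> and the destination directory are placeholders.
FROM <lpg_image>
COPY Infinity-Instruct_conversations.json /workspace/Infinity-Instruct_conversations.json
```

Build and push it (e.g. `docker build -t <your-registry>/inference-benchmark:regression .` followed by `docker push`), then point the benchmark manifest's image field at that tag.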
Commits:
- update traffic split setup
- update traffic split setup
- update requirement
- update regressing testig doc
- consolidate performance docs
- add newline
- add example yamls for multi lora deployment and regression lgp testing
- fix qps range
- fix typo (×7)
- fix broken link
- add instructions to build lpg image
- update benchmark.yaml
- update lpg yamls
- update readme
- update regfression testing markdown to refine docker image creating for LPG
- update regression yamls
- refine regression doc
app: benchmark-tool
spec:
  containers:
  # Build image from this source https://github.com/AI-Hypercomputer/inference-benchmark/blob/1c92df607751a7ddb04e2152ed7f6aaf85bd9ca7
why not from main?
I have found two issues:
- Not sure why, but this commit doesn't exist in any branch. This should be fixed.
- The Dockerfile needs to be updated to copy the new dataset file into the image.

> why not from main?

I don't recommend main because we should pin to a version we currently support, and main can change.
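One way to express that pinning in the manifest is to tag the image with the exact commit it was built from (registry and tag naming below are placeholders; the SHA is the one referenced in the comment above):

```yaml
# Hedged sketch: pin benchmark-tool to an image built from a known commit
# rather than a moving main/latest build.
containers:
  - name: benchmark-tool
    image: <your-registry>/inference-benchmark:1c92df607751a7ddb04e2152ed7f6aaf85bd9ca7
```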
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This pull request introduces multiple changes aimed at setting up regression testing for inference models and updating related documentation. Key changes include the addition of Kubernetes manifests for deploying and testing inference models, as well as updates to the documentation to reflect these new capabilities.

Regression Testing Setup:

- Added `InferenceModel` configurations for 15 adapters and one base model (`meta-llama/Llama-3.1-8B-Instruct`) in `config/manifests/regression-testing/inferencemodel.yaml`. These models are critical and reference the `vllm-llama3-8b-instruct` pool.
- Added `config/manifests/regression-testing/multi-lora-regression.yaml`. This benchmarks multiple LoRA adapters with specified traffic splits and request rates.
- Added `config/manifests/regression-testing/single-workload-regression.yaml`. This focuses on benchmarking the base model with higher request rates.
- Added a deployment for the `vllm-llama3-8b-instruct` model server in `config/manifests/regression-testing/vllm/multi-lora-deployment.yaml`. This includes configurations for LoRA adapters, readiness and liveness probes, and resource limits for GPUs.

Documentation Updates:

- Updated `mkdocs.yml`.
- Updated `site-src/performance/benchmark/index.md` to include instructions for updating benchmark IDs with `inference-extension` and `k8s-svc` in the last notebook cell.