Docs: Updates Benchmark Guide #789
Conversation
Signed-off-by: Daneyon Hansen <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: danehans. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
✅ Deploy Preview for gateway-api-inference-extension ready!
/lgtm
/hold if you want to address the nits.
Thank you for the update, it looks much cleaner!
```bash
git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension
cd gateway-api-inference-extension
```
```diff
-1. Get the target IP. Examples below show how to get the IP of a gateway or a LoadBalancer k8s service.
+1. Get the target IP. The examples below shows how to get the IP of a gateway or a k8s service.
```
```suggestion
1. Get the target IP. The example below shows how to get the IP of a gateway or a k8s service.
```
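Once the target IP is in hand, the benchmark target is just the IP plus the port (port `80` per this PR). A minimal sketch is below; the IP value is a placeholder, and in practice it would come from something like `kubectl get gateway <name> -o jsonpath='{.status.addresses[0].value}'` for a gateway, or `kubectl get service <name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'` for a LoadBalancer service:

```shell
# Sketch: assemble the benchmark target URL from the IP and port.
# The IP below is a placeholder, not a value from this PR.
IP="10.0.0.1"   # placeholder; fetch the real IP via kubectl as noted above
PORT="80"       # this PR standardizes on port 80
TARGET="http://${IP}:${PORT}"
echo "$TARGET"  # prints http://10.0.0.1:80
```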
```diff
-kubectl scale --replicas=8 -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
+kubectl scale deployment vllm-llama3-8b-instruct --replicas=8
```
nit: I suggest changing replicas to 6, as the example at the end uses 6 replicas and the new regression test PR also uses 6.
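A sketch of the reviewer's suggestion, using the deployment name from the diff in this thread; the command is printed rather than executed so it can be reviewed before running it against a cluster:

```shell
# Scale to 6 replicas so the count matches the example at the end of the
# guide. Echoed instead of run; copy the printed command to apply it.
CMD="kubectl scale deployment vllm-llama3-8b-instruct --replicas=6"
echo "$CMD"
```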
```diff
@@ -37,7 +37,7 @@ spec:
         - name: BACKEND
           value: vllm
         - name: PORT
-          value: "8081"
+          value: "80"
```
Great! 8081 was the port back then when we had envoy patches.
/lgtm
Signed-off-by: Daneyon Hansen <[email protected]>
- `config/manifests/benchmark/benchmark.yaml`: Updates the target port to `80`, since the guide only sets the `target-ip` and most gateways use port 80 by default.
- `site-src/performance/benchmark/index.md`: Provides additional explanation in the steps to help guide users. Adds a note to use the GPU-based vLLM deployment for benchmarking. Updates the `benchmark_id` value to match the labels in `tools/benchmark/benchmark.ipynb`.
- `tools/benchmark/benchmark.ipynb`: Sets the default run id and removes the undefined `INTERACTIVE_PLOT` variable.