Docs: Updates Benchmark Guide #789

Merged (1 commit) on May 7, 2025

danehans (Contributor) commented May 6, 2025

  • config/manifests/benchmark/benchmark.yaml: Updates the target port to 80 since the guide only sets the target-ip and most gateways use port 80 by default.
  • site-src/performance/benchmark/index.md: Provides additional explanation in steps to help guide users. Adds a note to use the GPU-based vLLM deployment for benchmarking. Updates the benchmark_id value to match the labels in tools/benchmark/benchmark.ipynb.
  • tools/benchmark/benchmark.ipynb: Sets the default run id and removes undefined INTERACTIVE_PLOT variable.

Signed-off-by: Daneyon Hansen <[email protected]>
danehans requested a review from liu-cong on May 6, 2025 23:51
k8s-ci-robot added the `cncf-cla: yes` label on May 6, 2025
k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans

k8s-ci-robot added the `approved` label on May 6, 2025

netlify bot commented May 6, 2025

Deploy Preview for gateway-api-inference-extension ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 188d82e |
| 🔍 Latest deploy log | https://app.netlify.com/sites/gateway-api-inference-extension/deploys/681aa07004341800084dcc4f |
| 😎 Deploy Preview | https://deploy-preview-789--gateway-api-inference-extension.netlify.app |

k8s-ci-robot added the `size/M` label on May 6, 2025

ahg-g (Contributor) commented May 7, 2025

/lgtm

k8s-ci-robot added the `lgtm` label on May 7, 2025
k8s-ci-robot merged commit e7944d1 into kubernetes-sigs:main on May 7, 2025
8 checks passed

liu-cong (Contributor) left a comment

/hold if you want to address the nits.

Thank you for the update, it looks much cleaner!

```bash
git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension
cd gateway-api-inference-extension
```

- 1. Get the target IP. Examples below show how to get the IP of a gateway or a LoadBalancer k8s service.
+ 1. Get the target IP. The examples below shows how to get the IP of a gateway or a k8s service.

Suggested change:
- 1. Get the target IP. The examples below shows how to get the IP of a gateway or a k8s service.
+ 1. Get the target IP. The example below shows how to get the IP of a gateway or a k8s service.
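
For context, the step under review (getting the target IP of a gateway or a k8s service) could be sketched roughly as below. This is a minimal sketch, not the guide's actual text: the helper names `get_gateway_ip` and `get_service_ip` are hypothetical, and the resource names passed to them are placeholders to be replaced with whatever the cluster uses. It assumes the Gateway reports an address in its status and the Service is of type LoadBalancer with an assigned ingress IP.

```bash
# Sketch only: helper names and resource names are illustrative assumptions,
# not values taken from this PR or the benchmark guide.

# Print the first address reported in a Gateway's status.
get_gateway_ip() {
  kubectl get "gateway/$1" -o jsonpath='{.status.addresses[0].value}'
}

# Print the external IP assigned to a LoadBalancer Service.
get_service_ip() {
  kubectl get "service/$1" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

A caller would then capture the result, for example `TARGET_IP=$(get_gateway_ip <gateway-name>)`, and pass it to the benchmark manifest as the target IP.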


```diff
- kubectl scale --replicas=8 -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
+ kubectl scale deployment vllm-llama3-8b-instruct --replicas=8
```

nit: I suggest changing replicas to 6 as the example in the end uses 6 replicas, and the new regression test PR also uses 6

#755

@@ -37,7 +37,7 @@ spec:
         - name: BACKEND
           value: vllm
         - name: PORT
-          value: "8081"
+          value: "80"

Great! 8081 was the port back then when we had envoy patches.

k8s-ci-robot added the `do-not-merge/hold` label on May 7, 2025

liu-cong (Contributor) commented May 7, 2025

/lgtm

rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request May 13, 2025
nayihz pushed a commit to nayihz/gateway-api-inference-extension that referenced this pull request May 14, 2025
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
4 participants