173 changes: 91 additions & 82 deletions site-src/guides/index.md
@@ -83,6 +83,84 @@ Tooling:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
```

### Install the Gateway

1. Requirements
- Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed (a minimal install sketch follows below).

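If the CRDs are not already present on the cluster, they can be applied from an upstream Gateway API release. This is a minimal sketch; the release version pinned below is an assumption, so use whichever version your Gateway provider documents as supported.

```bash
# Install the Gateway API CRDs (standard channel).
# The pinned release version is an assumption; check your provider's docs.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
```
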
Choose one of the following options to install the Gateway.

=== "GKE"

1. Enable the Google Kubernetes Engine API, the Compute Engine API, and the Network Services API, and configure proxy-only subnets when necessary; a possible `gcloud` invocation for the API enablement is sketched below.

See [Deploy Inference Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploy-gke-inference-gateway) for detailed instructions.
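
A minimal sketch of enabling these APIs with `gcloud`, assuming a project is already configured; proxy-only subnet setup is covered in the linked guide.

```bash
# Enable the required Google Cloud APIs for the active project.
# Assumes gcloud is authenticated and a default project is set.
gcloud services enable \
  container.googleapis.com \
  compute.googleapis.com \
  networkservices.googleapis.com
```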

=== "Istio"

Please note that this feature is currently in an experimental phase and is not intended for production use.
The implementation and user experience are subject to change as we continue to iterate on this project.

1. Install Istio

```bash
TAG=$(curl https://storage.googleapis.com/istio-build/dev/1.28-dev)
# on Linux
wget https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-linux-amd64.tar.gz
tar -xvf istioctl-$TAG-linux-amd64.tar.gz
# on macOS
wget https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-osx.tar.gz
tar -xvf istioctl-$TAG-osx.tar.gz
# on Windows
wget https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-win.zip
unzip istioctl-$TAG-win.zip

./istioctl install --set tag=$TAG --set hub=gcr.io/istio-testing --set values.pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true
```
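
As an optional sanity check (not part of the upstream instructions), confirm that the control plane is up before continuing:

```bash
# Verify the Istio control plane pods are running and report the installed version.
kubectl get pods -n istio-system
./istioctl version
```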

=== "Kgateway"

[Kgateway](https://kgateway.dev/) added Inference Gateway support as a **technical preview** in the
[v2.0.0 release](https://github.com/kgateway-dev/kgateway/releases/tag/v2.0.0). InferencePool v1.0.0 is currently supported in the latest [rolling release](https://github.com/kgateway-dev/kgateway/releases/tag/v2.1.0-main), which includes the latest changes but may be unstable until the [v2.1.0 release](https://github.com/kgateway-dev/kgateway/milestone/58) is published.

1. Requirements

- [Helm](https://helm.sh/docs/intro/install/) installed.

2. Set the Kgateway version and install the Kgateway CRDs.

```bash
KGTW_VERSION=v2.1.0-main
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
```

3. Install Kgateway

```bash
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
```
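
An optional sanity check, with the GatewayClass name taken from the Gateway output shown later in this guide:

```bash
# Confirm the kgateway control plane is running and its GatewayClass is registered.
kubectl get pods -n kgateway-system
kubectl get gatewayclass kgateway
```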

=== "Agentgateway"

[Agentgateway](https://agentgateway.dev/) is a purpose-built proxy designed for AI workloads and comes with native support for Inference Gateway. Agentgateway integrates with [Kgateway](https://kgateway.dev/) as its control plane. InferencePool v1.0.0 is currently supported in the latest [rolling release](https://github.com/kgateway-dev/kgateway/releases/tag/v2.1.0-main), which includes the latest changes but may be unstable until the [v2.1.0 release](https://github.com/kgateway-dev/kgateway/milestone/58) is published.

1. Requirements

- [Helm](https://helm.sh/docs/intro/install/) installed.

2. Set the Kgateway version and install the Kgateway CRDs.

```bash
KGTW_VERSION=v2.1.0-main
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
```

3. Install Kgateway

```bash
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true --set agentGateway.enabled=true
```

### Deploy the InferencePool and Endpoint Picker Extension

Install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000. The Helm install command automatically installs the endpoint picker and InferencePool, along with provider-specific resources.
@@ -135,17 +213,13 @@ Tooling:
oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
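
Before overriding any chart settings, the available values can be inspected with `helm show values`. This is an optional step; the pinned chart version is an assumption based on the InferencePool v1.0.0 support mentioned above.

```bash
# List the chart's configurable values.
# The --version pin is an assumption; omit it to use the latest published chart.
helm show values oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool --version v1.0.0
```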

### Deploy an Inference Gateway
### Deploy the Inference Gateway

Choose one of the following options to deploy the Inference Gateway.

=== "GKE"

1. Enable the Google Kubernetes Engine API, the Compute Engine API, and the Network Services API, and configure proxy-only subnets when necessary.
See [Deploy Inference Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploy-gke-inference-gateway)
for detailed instructions.

2. Deploy Inference Gateway:
1. Deploy the Inference Gateway:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml
@@ -158,45 +232,21 @@ Tooling:
NAME CLASS ADDRESS PROGRAMMED AGE
inference-gateway inference-gateway <MY_ADDRESS> True 22s
```
3. Deploy the HTTPRoute
2. Deploy the HTTPRoute

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
```

4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
3. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True` (a scripted version of both checks is sketched right after these steps):

```bash
kubectl get httproute llm-route -o yaml
```
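
For convenience, the readiness checks above can be scripted rather than read from the full YAML. This is an optional sketch, not part of the upstream guide, and the same commands apply to the other provider tabs below; the Gateway and HTTPRoute names are the ones used throughout this guide.

```bash
# Block until the Gateway reports Programmed=True (adjust the timeout as needed).
kubectl wait gateway/inference-gateway --for=condition=Programmed --timeout=2m

# Print just the route's parent status conditions instead of the full YAML;
# Accepted and ResolvedRefs should both report status "True".
kubectl get httproute llm-route -o jsonpath='{.status.parents[*].conditions}'
```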

=== "Istio"

Please note that this feature is currently in an experimental phase and is not intended for production use.
The implementation and user experience are subject to change as we continue to iterate on this project.

1. Requirements

- Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

2. Install Istio

```bash
TAG=$(curl https://storage.googleapis.com/istio-build/dev/1.28-dev)
# on Linux
wget https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-linux-amd64.tar.gz
tar -xvf istioctl-$TAG-linux-amd64.tar.gz
# on macOS
wget https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-osx.tar.gz
tar -xvf istioctl-$TAG-osx.tar.gz
# on Windows
wget https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-win.zip
unzip istioctl-$TAG-win.zip

./istioctl install --set tag=$TAG --set hub=gcr.io/istio-testing --set values.pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true
```

3. Deploy Gateway
1. Deploy the Inference Gateway

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/gateway.yaml
@@ -209,42 +259,21 @@ Tooling:
inference-gateway inference-gateway <MY_ADDRESS> True 22s
```

4. Deploy the HTTPRoute
2. Deploy the HTTPRoute

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/httproute.yaml
```

5. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
3. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

```bash
kubectl get httproute llm-route -o yaml
```

=== "Kgateway"

[Kgateway](https://kgateway.dev/) added Inference Gateway support as a **technical preview** in the
[v2.0.0 release](https://github.com/kgateway-dev/kgateway/releases/tag/v2.0.0). InferencePool v1.0.0 is currently supported in the latest [rolling release](https://github.com/kgateway-dev/kgateway/releases/tag/v2.1.0-main), which includes the latest changes but may be unstable until the [v2.1.0 release](https://github.com/kgateway-dev/kgateway/milestone/58) is published.

1. Requirements

- [Helm](https://helm.sh/docs/intro/install/) installed.
- Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

2. Set the Kgateway version and install the Kgateway CRDs.

```bash
KGTW_VERSION=v2.1.0-main
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
```

3. Install Kgateway

```bash
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
```

4. Deploy the Gateway
1. Deploy the Inference Gateway

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml
@@ -257,41 +286,21 @@ Tooling:
inference-gateway kgateway <MY_ADDRESS> True 22s
```

5. Deploy the HTTPRoute
2. Deploy the HTTPRoute

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/httproute.yaml
```

6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
3. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

```bash
kubectl get httproute llm-route -o yaml
```

=== "Agentgateway"

[Agentgateway](https://agentgateway.dev/) is a purpose-built proxy designed for AI workloads and comes with native support for Inference Gateway. Agentgateway integrates with [Kgateway](https://kgateway.dev/) as its control plane. InferencePool v1.0.0 is currently supported in the latest [rolling release](https://github.com/kgateway-dev/kgateway/releases/tag/v2.1.0-main), which includes the latest changes but may be unstable until the [v2.1.0 release](https://github.com/kgateway-dev/kgateway/milestone/58) is published.

1. Requirements

- [Helm](https://helm.sh/docs/intro/install/) installed.
- Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

2. Set the Kgateway version and install the Kgateway CRDs.

```bash
KGTW_VERSION=v2.1.0-main
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
```

3. Install Kgateway

```bash
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true --set agentGateway.enabled=true
```

4. Deploy the Gateway
1. Deploy the Inference Gateway

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/gateway.yaml
@@ -304,13 +313,13 @@ Tooling:
inference-gateway agentgateway <MY_ADDRESS> True 22s
```

5. Deploy the HTTPRoute
2. Deploy the HTTPRoute

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/httproute.yaml
```

6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
3. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

```bash
kubectl get httproute llm-route -o yaml