From 2d52334a6c2bb805bf6e9c3883b9a6c0b520a534 Mon Sep 17 00:00:00 2001
From: Vaishnavi Hire
Date: Thu, 30 Oct 2025 13:14:02 -0400
Subject: [PATCH] docs: Add Llama Stack Operator docs

Add documentation for llama-stack-k8s-operator under the Kubernetes
deployment guide.

Signed-off-by: Vaishnavi Hire
---
 docs/docs/deploying/kubernetes_deployment.mdx | 211 +++++++++++-------
 1 file changed, 136 insertions(+), 75 deletions(-)

diff --git a/docs/docs/deploying/kubernetes_deployment.mdx b/docs/docs/deploying/kubernetes_deployment.mdx
index 8ed1e2756a..48d08f0db0 100644
--- a/docs/docs/deploying/kubernetes_deployment.mdx
+++ b/docs/docs/deploying/kubernetes_deployment.mdx
@@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';
 
 # Kubernetes Deployment Guide
 
-Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers both local development with Kind and production deployment on AWS EKS.
+Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers deploying the Llama Stack server on a Kind cluster with the Kubernetes operator; the vLLM inference server is deployed manually.
 
 ## Prerequisites
 
@@ -110,115 +110,176 @@ spec:
 EOF
 ```
 
-### Step 3: Configure Llama Stack
+### Step 3: Install the Kubernetes Operator
 
-Update your run configuration:
+Install the Llama Stack Kubernetes operator to manage Llama Stack deployments:
 
-```yaml
-providers:
-  inference:
-    - provider_id: vllm
-      provider_type: remote::vllm
-      config:
-        url: http://vllm-server.default.svc.cluster.local:8000/v1
-        max_tokens: 4096
-        api_token: fake
+```bash
+# Install from the latest main branch
+kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/main/release/operator.yaml
+
+# Or install a specific version (e.g., v0.4.0)
+# kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/v0.4.0/release/operator.yaml
 ```
 
-Build container image:
+Verify the operator is running:
 
 ```bash
-tmp_dir=$(mktemp -d) && cat >$tmp_dir/Containerfile.llama-stack-run-k8s <-service`):
+
+```bash
+# List services to find the service name
+kubectl get services | grep llamastack
+
+# Port forward the service (replace llamastack-vllm-service with the name found above if it differs)
+kubectl port-forward service/llamastack-vllm-service 8321:8321
+```
+
+In another terminal, test the deployment:
+
+```bash
+llama-stack-client --endpoint http://localhost:8321 inference chat-completion --message "hello, what model are you?"
 ```
 
 ## Troubleshooting
 
-**Check pod status:**
+### vLLM Server Issues
+
+**Check vLLM pod status:**
 ```bash
 kubectl get pods -l app.kubernetes.io/name=vllm
 kubectl logs -l app.kubernetes.io/name=vllm
 ```
 
-**Test service connectivity:**
+**Test vLLM service connectivity:**
 ```bash
 kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://vllm-server:8000/v1/models
 ```
 
+### Llama Stack Server Issues
+
+**Check LlamaStackDistribution status:**
+```bash
+# Get detailed status
+kubectl describe llamastackdistribution llamastack-vllm
+
+# Check for events
+kubectl get events --sort-by='.lastTimestamp' | grep llamastack-vllm
+```
+
+**Check operator-managed pods:**
+```bash
+# List all pods managed by the operator
+kubectl get pods -l app.kubernetes.io/name=llama-stack
+
+# Check logs for the operator-managed pods
+kubectl logs -l app.kubernetes.io/name=llama-stack
+```
+
+**Check operator status:**
+```bash
+# Verify the operator is running
+kubectl get pods -n llama-stack-operator-system
+
+# Check operator logs if issues persist
+kubectl logs -n llama-stack-operator-system -l control-plane=controller-manager
+```
+
+**Verify service connectivity:**
+```bash
+# Get the service endpoint
+kubectl get svc llamastack-vllm-service
+
+# Test connectivity from within the cluster
+kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://llamastack-vllm-service:8321/health
+```
+
 ## Related Resources
 
 - **[Deployment Overview](/docs/deploying/)** - Overview of deployment options
 - **[Distributions](/docs/distributions)** - Understanding Llama Stack distributions
 - **[Configuration](/docs/distributions/configuration)** - Detailed configuration options
+- **[LlamaStack Operator](https://github.com/llamastack/llama-stack-k8s-operator)** - Overview of the llama-stack Kubernetes operator
+- **[LlamaStackDistribution](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md)** - API spec of the LlamaStackDistribution custom resource
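Before creating any `LlamaStackDistribution`, it is worth confirming that the operator install from Step 3 actually completed. A minimal sketch, assuming the operator runs in the `llama-stack-operator-system` namespace and uses the `control-plane=controller-manager` label referenced in the troubleshooting section above; both may differ between operator releases:

```bash
# Wait for the operator controller deployment to report Available
# (namespace and label are assumptions taken from the troubleshooting section)
kubectl wait --for=condition=Available deployment \
  -l control-plane=controller-manager \
  -n llama-stack-operator-system --timeout=120s

# Confirm the cluster is serving the LlamaStackDistribution resource type
kubectl api-resources | grep -i llamastackdistribution

# Inspect the registered schema directly from the cluster
kubectl explain llamastackdistribution.spec
```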
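The Step 5 check can also be run as a single smoke test. A rough sketch, assuming the service is named `llamastack-vllm-service` as in the guide, the `llama-stack-client` CLI is installed locally, and its global `--endpoint` flag applies to `models list` the same way it does to `inference chat-completion`:

```bash
# Forward the Llama Stack port in the background
kubectl port-forward service/llamastack-vllm-service 8321:8321 &
PF_PID=$!
sleep 3

# List the models the stack exposes, then run the same chat test as in the guide
llama-stack-client --endpoint http://localhost:8321 models list
llama-stack-client --endpoint http://localhost:8321 inference chat-completion \
  --message "hello, what model are you?"

# Clean up the background port-forward
kill $PF_PID
```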