
Conversation

@brokedba (Contributor) commented Nov 1, 2025

📋 Summary

This PR adds a complete Nebius MK8s deployment tutorial for the vLLM Production Stack, extending support to another modern Kubernetes cloud provider with GPU acceleration.

Contributed on behalf of CloudThrill, cloud infrastructure specialists focused on production-grade AI/ML deployments.


🎯 What This Adds

✅ New tutorial path structure

tutorials/
├── gcp/           # ✅ Existing
├── azure/         # ✅ Existing  
├── eks/           # ✅ Existing
└── nebius/        # 🆕 This PR - Complete Nebius tutorial

🚀 Core Features

  • Production-ready vLLM serving on Nebius MK8s
  • Tested with L40s GPU nodes
  • Terraform-driven cluster provisioning + Helm deployment (workflow sketch after this list)
  • Nginx Ingress with TLS (cert-manager + Let's Encrypt)
  • Prometheus + Grafana observability
  • vLLM dashboard and service monitoring auto-configured in Grafana
  • Hugging Face token handling
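A minimal sketch of the provision-and-deploy workflow referenced above, assuming the tutorial lives under tutorials/nebius/ and ships an example variables file (both names are illustrative, not verbatim from this PR):

```bash
cd tutorials/nebius

# Set your own region, node sizes, and Hugging Face token first;
# the example filename here is an assumption.
cp terraform.tfvars.example terraform.tfvars

terraform init    # fetch the Nebius, Kubernetes, and Helm providers
terraform plan    # review the cluster, node groups, and add-ons to be created
terraform apply   # provision MK8s and deploy the vLLM stack via Helm
```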

🏗️ Technical Highlights

| Component | Description |
| --- | --- |
| Nebius MK8s | Managed Kubernetes control plane |
| GPU Node Group | L40s GPU compute for inference |
| Networking | VPC and CIDRs with sane defaults |
| Ingress | HTTPS routing and certificate automation |
| Observability | Prometheus + Grafana dashboards (access sketch below) |
| vLLM Runtime | Helm-based deployment from the production-stack chart |
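For the Observability row above, a hedged sketch of reaching Grafana; the namespace, service, and secret names assume kube-prometheus-stack chart defaults and may differ from what this tutorial actually deploys:

```bash
# Check that the monitoring stack is up (namespace name is an assumption)
kubectl get pods -n monitoring

# Forward Grafana locally, then browse to http://localhost:3000
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80

# Read the generated admin password from the chart's default secret
kubectl get secret kube-prometheus-stack-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d
```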

Additional Notes:

  • GPU autoscaling settings included
  • Nebius CLI installation instructions added
  • CLI commands added for retrieving ingress URLs (sketch below)
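A sketch of the kind of ingress-lookup commands mentioned above; the ingress-nginx namespace and controller service name follow the community chart defaults, which may not match the tutorial exactly:

```bash
# List ingress resources and their hostnames across all namespaces
kubectl get ingress -A

# External address of the Nginx ingress controller's load balancer
kubectl get svc ingress-nginx-controller -n ingress-nginx \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```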

✅ Why This Matters

  • Expands vLLM stack coverage to Nebius, a fast-growing Kubernetes cloud
  • Offers a cost-efficient and GPU-capable option for running vLLM in production
  • Provides a turnkey template for startup-focused private inference deployments

📚 Included Documentation

  • Prerequisites & Nebius CLI setup
  • terraform.tfvars example
  • Deployment + cleanup commands
  • HTTPS access + Grafana access instructions
  • Troubleshooting (fix for the Let's Encrypt certificate rate limit)
  • Testing inference with curl (example below)
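To illustrate the curl test mentioned above: vLLM serves an OpenAI-compatible API, so a minimal completion request looks like the sketch below, with the hostname and model name as placeholders for your own deployment's values:

```bash
curl -sk https://vllm.example.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Kubernetes is",
        "max_tokens": 32
      }'
```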

@gemini-code-assist (Contributor)

Summary of Changes

Hello @brokedba, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the vLLM Production Stack's cloud provider coverage by introducing a production-ready Terraform stack for Nebius MK8s. It enables users to deploy GPU-accelerated vLLM inference environments on Nebius, complete with automated infrastructure provisioning, secure ingress, and comprehensive monitoring, offering a turnkey solution for cost-efficient AI/ML deployments.

Highlights

  • Nebius MK8s Integration: Introduces a complete Terraform-driven stack for deploying the vLLM Production Stack on Nebius Managed Kubernetes (MK8s), expanding cloud provider support.
  • GPU-Accelerated Inference: Supports GPU acceleration, tested with L40s GPU nodes, and includes auto-scaling capabilities for efficient resource utilization in AI/ML inference.
  • Comprehensive Infrastructure as Code: Provides a full Terraform setup for provisioning Nebius VPC, subnets, MK8s clusters, and managed node groups (both CPU and GPU).
  • Production-Ready Add-ons: Integrates essential services like Nginx Ingress with TLS (Let's Encrypt via cert-manager) and a robust observability stack (Prometheus and Grafana with pre-configured vLLM dashboards).
  • Detailed Deployment Tutorial: Includes a new README.md with step-by-step instructions, prerequisites, configuration options, quick start, testing, and troubleshooting guides for easy adoption.

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds a comprehensive Terraform stack for deploying vLLM on Nebius MK8s, which is a great addition. The code is well-structured and covers infrastructure provisioning, Kubernetes add-ons, and the vLLM application stack. My review focuses on several critical issues that could prevent the stack from deploying correctly, such as incorrect provider configurations and hardcoded values that should be variables. I've also pointed out several areas where the documentation and comments are misleading due to copy-pasting from other cloud provider examples (AWS, EKS, AKS), which could cause significant confusion for users. Finally, there are some suggestions for code cleanup and modernization, like removing commented-out code and replacing deprecated data sources. Addressing these points will significantly improve the robustness, maintainability, and user-friendliness of this new Nebius tutorial.

@brokedba changed the title from "[Feat] Add production-ready vLLM Nebius MK8s terraform stack" to "[Feat] Add production-ready vLLM Nebius MK8s terraform tutorial" on Nov 1, 2025
@brokedba force-pushed the nebiusk8s-terraform-stack-tuto branch from dc22dde to 36e8f53 on November 1, 2025 at 09:22
@zerofishnoodles (Collaborator) left a comment

LGTM, would you be able to show a demo for our next community meeting?

@brokedba (Contributor, Author)

@zerofishnoodles Absolutely, looking forward to it.

@zerofishnoodles (Collaborator) left a comment

LGTM

@zerofishnoodles (Collaborator)

Hi, can you update the branch?

Includes:
- GPU autoscaling support
- Secure ingress + TLS
- Prometheus + Grafana monitoring
- Built-in vLLM Grafana dashboards
- Terraform + Helm integration

Signed-off-by: Kosseila (CloudThrill) <[email protected]>
@brokedba force-pushed the nebiusk8s-terraform-stack-tuto branch from 36e8f53 to f4cbeb3 on November 19, 2025 at 22:24
@brokedba (Contributor, Author) commented Nov 19, 2025

Just did. It should be good, no conflicts with the base branch.
