
Conversation

@brokedba (Contributor) commented Nov 1, 2025

📋 Summary

This PR adds a complete Nebius MK8s deployment tutorial for the vLLM Production Stack, extending support to another modern Kubernetes cloud provider with GPU acceleration.

Contributed on behalf of CloudThrill, cloud infrastructure specialists focused on production-grade AI/ML deployments.


🎯 What This Adds

✅ New tutorial path structure

tutorials/
├── gcp/           # ✅ Existing
├── azure/         # ✅ Existing  
├── eks/           # ✅ Existing
└── nebius/        # 🆕 This PR - Complete Nebius tutorial

🚀 Core Features

  • Production-ready vLLM serving on Nebius MK8s
  • Tested with L40s GPU nodes
  • Terraform-driven cluster provisioning + Helm deployment (workflow sketch after this list)
  • Nginx Ingress with TLS (cert-manager + Let's Encrypt)
  • Prometheus + Grafana observability
  • vLLM dashboard and service monitoring auto-configured in Grafana
  • Hugging Face token handling
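A minimal sketch of the provision-and-deploy workflow referenced above, assuming the tutorial lives under tutorials/nebius/ and ships an example variables file (both names are illustrative, not verbatim from this PR):

```bash
cd tutorials/nebius

# Set your own region, node sizes, and Hugging Face token first;
# the example filename here is an assumption.
cp terraform.tfvars.example terraform.tfvars

terraform init    # fetch the Nebius, Kubernetes, and Helm providers
terraform plan    # review the cluster, node groups, and add-ons to be created
terraform apply   # provision MK8s and deploy the vLLM stack via Helm
```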

🏗️ Technical Highlights

| Component | Description |
| --- | --- |
| Nebius MK8s | Managed Kubernetes control plane |
| GPU Node Group | L40s GPU compute for inference |
| Networking | VPC and CIDRs with sane defaults |
| Ingress | HTTPS routing and certificate automation |
| Observability | Prometheus + Grafana dashboards (access sketch below) |
| vLLM Runtime | Helm-based deployment from the production-stack chart |
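For the Observability row above, a hedged sketch of reaching Grafana; the namespace, service, and secret names assume kube-prometheus-stack chart defaults and may differ from what this tutorial actually deploys:

```bash
# Check that the monitoring stack is up (namespace name is an assumption)
kubectl get pods -n monitoring

# Forward Grafana locally, then browse to http://localhost:3000
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80

# Read the generated admin password from the chart's default secret
kubectl get secret kube-prometheus-stack-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d
```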

Additional Notes:

  • GPU autoscaling settings included
  • Nebius CLI installation instructions added
  • CLI commands added for retrieving ingress URLs (sketch below)
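A sketch of the kind of ingress-lookup commands mentioned above; the ingress-nginx namespace and controller service name follow the community chart defaults, which may not match the tutorial exactly:

```bash
# List ingress resources and their hostnames across all namespaces
kubectl get ingress -A

# External address of the Nginx ingress controller's load balancer
kubectl get svc ingress-nginx-controller -n ingress-nginx \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```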

✅ Why This Matters

  • Expands vLLM stack coverage to Nebius, a fast-growing Kubernetes cloud
  • Offers a cost-efficient and GPU-capable option for running vLLM in production
  • Provides a turnkey template for startup-focused private inference deployments

📚 Included Documentation

  • Prerequisites & Nebius CLI setup
  • terraform.tfvars example
  • Deployment + cleanup commands
  • HTTPS access + Grafana access instructions
  • Troubleshooting (fix for the Let's Encrypt certificate rate limit)
  • Testing inference with curl (example below)
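To illustrate the curl test mentioned above: vLLM serves an OpenAI-compatible API, so a minimal completion request looks like the sketch below, with the hostname and model name as placeholders for your own deployment's values:

```bash
curl -sk https://vllm.example.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Kubernetes is",
        "max_tokens": 32
      }'
```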

@gemini-code-assist (Contributor)

Summary of Changes

Hello @brokedba, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the vLLM Production Stack's cloud provider coverage by introducing a production-ready Terraform stack for Nebius MK8s. It enables users to deploy GPU-accelerated vLLM inference environments on Nebius, complete with automated infrastructure provisioning, secure ingress, and comprehensive monitoring, offering a turnkey solution for cost-efficient AI/ML deployments.

Highlights

  • Nebius MK8s Integration: Introduces a complete Terraform-driven stack for deploying the vLLM Production Stack on Nebius Managed Kubernetes (MK8s), expanding cloud provider support.
  • GPU-Accelerated Inference: Supports GPU acceleration, tested with L40s GPU nodes, and includes auto-scaling capabilities for efficient resource utilization in AI/ML inference.
  • Comprehensive Infrastructure as Code: Provides a full Terraform setup for provisioning Nebius VPC, subnets, MK8s clusters, and managed node groups (both CPU and GPU).
  • Production-Ready Add-ons: Integrates essential services like Nginx Ingress with TLS (Let's Encrypt via cert-manager) and a robust observability stack (Prometheus and Grafana with pre-configured vLLM dashboards).
  • Detailed Deployment Tutorial: Includes a new README.md with step-by-step instructions, prerequisites, configuration options, quick start, testing, and troubleshooting guides for easy adoption.

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds a comprehensive Terraform stack for deploying vLLM on Nebius MK8s, which is a great addition. The code is well-structured and covers infrastructure provisioning, Kubernetes add-ons, and the vLLM application stack. My review focuses on several critical issues that could prevent the stack from deploying correctly, such as incorrect provider configurations and hardcoded values that should be variables. I've also pointed out several areas where the documentation and comments are misleading due to copy-pasting from other cloud provider examples (AWS, EKS, AKS), which could cause significant confusion for users. Finally, there are some suggestions for code cleanup and modernization, like removing commented-out code and replacing deprecated data sources. Addressing these points will significantly improve the robustness, maintainability, and user-friendliness of this new Nebius tutorial.

@brokedba changed the title from "[Feat] Add production-ready vLLM Nebius MK8s terraform stack" to "[Feat] Add production-ready vLLM Nebius MK8s terraform tutorial" on Nov 1, 2025
@brokedba force-pushed the nebiusk8s-terraform-stack-tuto branch from dc22dde to 36e8f53 on November 1, 2025 at 09:22
@zerofishnoodles (Collaborator) left a comment

LGTM, would you be able to show a demo for our next community meeting?

@brokedba (Contributor, Author)

@zerofishnoodles Absolutely, looking forward to it.

@zerofishnoodles (Collaborator) left a comment

LGTM

@zerofishnoodles (Collaborator)

Hi, can you update the branch?

Includes:
- GPU autoscaling support
- Secure ingress + TLS
- Prometheus + Grafana monitoring
- Built-in vLLM Grafana dashboards
- Terraform + Helm integration

Signed-off-by: Kosseila (CloudThrill) <[email protected]>
@brokedba force-pushed the nebiusk8s-terraform-stack-tuto branch from 36e8f53 to f4cbeb3 on November 19, 2025 at 22:24
@brokedba (Contributor, Author) commented Nov 19, 2025

Just did. It should be good, no conflicts with the base branch.
