GitOps repository for the HP-Fury machine.
To get network access to the machine:
- Log in to tailscale.com using your Xebia e-mail address and install the Tailscale VPN client. Ping Julian to have your Tailscale account approved.
- Once your account is approved, connect to the Tailscale VPN and check that you can reach the machine:

```bash
ping hp-fury.tail6720f8.ts.net
```
To set up access for Kubernetes:
- Install `kubectl` on your machine: `brew install kubernetes-cli`.
- Optional, but highly recommended:
  - Install K9s for an interactive CLI interface for K8s: `brew install k9s`.
  - Install kubectx to easily switch between K8s contexts and namespaces: `brew install kubectx`.
  - Install kustomize to manually deploy applications to the cluster using Helm through kustomize.
  - Install helm so that kustomize can build Helm charts.
- Look for the `HP-Fury - kube-config` secret in the Xebia Data 1Password and copy the `kube-config.yaml` file to `~/.kube/config` on your machine (or merge it with your existing config if you already have one).
- Verify that you can connect to the cluster by running `kubectl cluster-info`. This should return details from the cluster at `hp-fury.tail6720f8.ts.net`. If you see another cluster, check that you have the right context selected for `kubectl`.
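If the wrong context is active (for example because you work with multiple clusters), you can check and switch contexts as sketched below; `<hp-fury-context>` is a placeholder for whatever the context is called in your kube-config:

```bash
# List all contexts known to kubectl; the active one is marked with '*'.
kubectl config get-contexts

# Switch to the HP-Fury context using plain kubectl...
kubectl config use-context <hp-fury-context>

# ...or, if you installed kubectx:
kubectx <hp-fury-context>
```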
If you need SSH access, please ping Julian to set up an account for you on the machine.
When developing on this machine, please adhere to the following guidelines:
- Don't use the default namespace.
- Create separate namespace(s) for your application(s). This can either be:
  - A personal namespace in your name (e.g. `jrderuiter`).
  - A namespace named after the application (e.g. `argocd`).
  - A namespace grouping multiple applications (e.g. `monitoring`).
- Don't put any super sensitive data on the machine.
- Don't store any secrets in Git. (Use sealed-secrets or the secret generator instead, as outlined below).
- Ping our Slack channel `#hp-zp-g5` whenever you plan to use significant resources (such as the GPUs). This is both to check whether they're available and to notify others.
- In general: be mindful of others!
To add an application or other Kubernetes resources:
- Create a branch on this repo (e.g. `feature/add-langfuse`).
- Create a folder under `applications` (e.g. `applications/<name>`) for your resources.
If you're using kustomize (recommended), please use the following folder structure:
- patches/ (directory for kustomize patches, if needed)
- resources/ (directory for your manifest files)
  - namespace.yml
  - ...
- kustomization.yml
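As a minimal sketch (the `my-app` names and the `deployment.yml` file are hypothetical), `resources/namespace.yml` and `kustomization.yml` could look like this:

```yaml
# resources/namespace.yml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
---
# kustomization.yml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: my-app
resources:
  - resources/namespace.yml
  - resources/deployment.yml  # Hypothetical application manifest.
```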
For Helm-based applications, you can use Helm within Kustomize (see `applications/sealed-secrets` for an example).
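A rough sketch of such a `kustomization.yml` with a `helmCharts` entry is shown below; the chart name, repository and values are hypothetical, so refer to `applications/sealed-secrets` for a real, working example:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: my-app
resources:
  - resources/namespace.yml
helmCharts:
  - name: my-chart                     # Hypothetical chart name.
    repo: https://charts.example.com   # Hypothetical chart repository.
    version: 1.2.3
    releaseName: my-app
    namespace: my-app
    valuesInline:
      someSetting: some-value          # Hypothetical chart values.
```

Building such an overlay requires kustomize's `--enable-helm` flag, which the deployment command below already includes.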
During development, you can deploy your Kubernetes resources manually using kubectl, e.g.:
```bash
# For plain Kustomize
kubectl apply -k applications/<name>
```

or

```bash
# For Kustomize with Helm
kustomize build --enable-helm applications/<name> | kubectl apply -f -
```
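To remove manually deployed resources again, the same overlays can be passed to `kubectl delete`:

```bash
# For plain Kustomize
kubectl delete -k applications/<name>

# For Kustomize with Helm
kustomize build --enable-helm applications/<name> | kubectl delete -f -
```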
For longer-running applications, you can deploy your resources via ArgoCD. The main advantage of this approach is that ArgoCD will monitor your application and ensure that it remains in sync with the manifests defined in this repository. Additionally, it will monitor the application's health and warn you if there are any problems.
To deploy your application via ArgoCD:
- On your feature branch, define your application resources under `applications/<name>` as outlined above.
- On the main branch, create an `Application` manifest for your application under `cluster/applications`, replacing `<name>`, `<namespace>` and `<branch>` with the correct values:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  # Name of your application.
  name: <name>
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/godatadriven/hp-fury.git
    # Branch under which to find your application manifests.
    targetRevision: <branch>
    # Path under which to find your application manifests.
    path: applications/<name>
  destination:
    server: https://kubernetes.default.svc
    # Namespace where your resources should be deployed.
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
- Push your changes.
- Open the ArgoCD UI at https://argocd.hp-fury.internal and log in (username and password are in the Xebia 1PW).
- On the ArgoCD applications page, refresh the `applications` application. ArgoCD should find and start deploying your application.
- If you run into any issues, fix the issues on your feature branch and push the changes.
- When you're done, merge your changes and change the application manifest (`cluster/applications/<name>.yml`) to deploy the application from the main branch instead of your feature branch.
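For example, after merging, the `source` section of the manifest above would end up pointing at the main branch:

```yaml
source:
  repoURL: https://github.com/godatadriven/hp-fury.git
  targetRevision: main
  path: applications/<name>
```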
To safely save secrets alongside your application manifests in Git, we currently provide two different approaches depending on your needs.
If you don't care about the exact value of a secret (for example, if the secret is used for two applications to talk to each other), you can ask kubernetes-secret-generator to generate a random secret for you. See their documentation for more details on how to do so.
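As a rough sketch, assuming the cluster runs the mittwald kubernetes-secret-generator with its default annotation (their documentation is the authority on the exact annotation and options), requesting an auto-generated secret looks roughly like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-credentials   # Hypothetical secret name.
  namespace: my-app          # Hypothetical namespace.
  annotations:
    # Ask the generator to fill the 'password' field with a random value.
    secret-generator.v1.mittwald.de/autogenerate: password
type: Opaque
data: {}
```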
If you do care about the value of a secret (for example, if you want to use the secret to login to a service), you can use sealed-secrets to encrypt your secret(s) and commit the encrypted values in Git. The sealed-secrets service in the cluster will then automatically decrypt the secrets when your application is deployed.
To encrypt a secret using sealed secrets:
- Install `kubeseal` and `yq` using `brew install kubeseal yq`.
- Encrypt your secret using `kubeseal`, e.g.:

```bash
kubectl create secret generic <secret-name> \
    --namespace <namespace> \
    --dry-run=client -o json \
    --from-literal=<name>=<value> --from-literal=... \
  | kubeseal --controller-namespace sealed-secrets --controller-name sealed-secrets \
  | yq -p json
```
- Add the resulting YAML output to your application resources.
Note that you can't change the namespace of the generated sealed secret. If you want to change the namespace, you'll have to re-encrypt the secrets. (This is intentional to stop other users from using your encrypted secret in their own namespace.)
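The command above outputs a `SealedSecret` manifest that should look roughly like this (the encrypted values are much longer in practice):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: <secret-name>
  namespace: <namespace>
spec:
  encryptedData:
    <name>: AgBy...   # Encrypted value produced by kubeseal.
  template:
    metadata:
      name: <secret-name>
      namespace: <namespace>
```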
To request a GPU for a container, configure the following resource limit/request on the container:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```
This will instruct the gpu-operator to inject the GPU and required libraries into your container.
If you want to expose a service via HTTP(S) (e.g. for a web UI), you need to configure an ingress for the service.
- Pick an address for your application, e.g. `my-app.hp-fury.internal`.
- Add this DNS entry in Pihole by adding it in the `customDnsEntries` section in `applications/pihole/kustomization.yml`. It's probably easiest to make these changes directly on the main branch, in which case ArgoCD should pick up the changes if you refresh Pihole from the UI.
- Add an ingress resource with this address in your application resources:
```yaml
# See applications/argocd/resources/ingress.yml for the full example.
spec:
  ingressClassName: traefik
  rules:
    - host: my-app.hp-fury.internal # DNS address of your application, as added in Pihole.
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app # Name of your service; should point to your service.
                port:
                  name: http # Name of the service port; should match your service.
```
This will allow you to access your service at https://my-app.hp-fury.internal in your browser.
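For reference, a minimal sketch of a Service matching the ingress above might look like this (the `app: my-app` label and port 8000 are hypothetical; adjust them to your deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app         # Matches backend.service.name in the ingress.
  namespace: my-app    # Hypothetical namespace.
spec:
  selector:
    app: my-app        # Hypothetical pod label from your deployment.
  ports:
    - name: http       # Matches backend.service.port.name in the ingress.
      port: 80
      targetPort: 8000 # Hypothetical container port.
```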
The below steps were used to bootstrap the machine and are kept for reference.
- Install Tailscale following the official guide.
- Install openssh-server using apt-get and harden the configuration (e.g. public-key authentication only).
- Enable UFW, allowing access for SSH.
- Allow the K3s network ranges in UFW (see the sketch after this list).
- Install K3S following the official guide.
- (Optional) Add your Tailscale DNS name to `/etc/rancher/k3s/config.yaml` and run `k3s certificate rotate` to generate certificates including your Tailscale DNS name:

```yaml
tls-san:
  - "hp-fury.tail6720f8.ts.net"
```
- Create the initial ArgoCD installation using `kubectl apply -k applications/argocd`.
- Use K9s to forward port 8080 from the argocd-server service and open the ArgoCD UI at http://localhost:8080.
- Retrieve the initial login password from the K8s secret `argocd-initial-admin-secret` in the cluster and log in using `admin`/`<initial-password>`.
- In the UI, go to User Info > Update Password and change the admin user password to something secret.
- Configure any required repositories under Settings > Repositories. For GitHub, use HTTPS with a fine-grained access token as password. The token only needs read permissions on the repo content.
- Deploy the applications using `kubectl apply -k cluster`.
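As referenced above, allowing SSH and the K3s network ranges through UFW looked roughly like the sketch below; the CIDRs are the K3s defaults, so adjust them if the cluster uses different ranges:

```bash
sudo ufw allow ssh
sudo ufw allow 6443/tcp                  # K3s API server
sudo ufw allow from 10.42.0.0/16 to any  # K3s pods (default CIDR)
sudo ufw allow from 10.43.0.0/16 to any  # K3s services (default CIDR)
sudo ufw enable
```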
- Run the following commands to install the drivers, as detailed here.
```bash
distro=ubuntu2204
arch=x86_64
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-drivers cuda-toolkit nvidia-gds nvidia-container-toolkit
```
- Check whether the drivers were installed correctly using `nvidia-smi`.
- Enable the nvidia container runtime as the default for k3s by editing `/etc/rancher/k3s/config.yaml` and adding the setting `default-runtime: "nvidia"`.
- Install the gpu-operator from `applications/gpu-operator` (e.g. using ArgoCD).
- Test your installation by running an example application, e.g.:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # You either need to specify this env var or the resource limit
      # request below to ensure access to the GPU.
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
      image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v25.3.0
      command: [/bin/sh, -c]
      args: [vectorAdd]
      resources:
        limits:
          nvidia.com/gpu: 1
  # Needed if the default runtime for k3s isn't set to nvidia.
  # runtimeClassName: nvidia
```
- Before we can install Pihole, we need to disable systemd-resolved to free up port 53 (see the sketch after this list): https://gist.github.com/zoilomora/f7d264cefbb589f3f1b1fc2cea2c844c.
- Next, install Pihole on the cluster (`applications/pihole`), adding any local addresses under `customDnsEntries`.
- Once Pihole is running, check if you can query Pihole using dig, e.g. using `dig google.com @<ip-of-machine-running-pihole>`.
- In the Tailscale admin console, open the DNS tab. Under nameservers, add the IP of the machine running Pihole and enable `Override DNS servers`.
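Freeing up port 53 followed the gist linked above; roughly, and depending on the Ubuntu version, it boils down to something like this sketch:

```bash
# Stop and disable systemd-resolved so its stub listener releases port 53.
sudo systemctl disable --now systemd-resolved

# Point /etc/resolv.conf at a regular resolver instead of the local stub
# (127.0.0.53); the nameserver here is just an example.
echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf
```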