HP-Fury

GitOps repository for the HP-Fury machine.

Getting started

To get network access to the machine:

  • Log in to tailscale.com using your Xebia e-mail address and install the Tailscale VPN client. Ping Julian to have your Tailscale account approved.
  • Once your account is approved, connect to the Tailscale VPN and check that you can reach the machine: ping hp-fury.tail6720f8.ts.net.

To set up access for Kubernetes:

  • Install kubectl on your machine: brew install kubernetes-cli.
  • Optional, but highly recommended:
    • Install K9s for an interactive CLI interface for K8s: brew install k9s.
    • Install kubectx to easily switch between K8s contexts and namespaces: brew install kubectx.
    • Install kustomize to manually deploy applications that use Helm through kustomize to the cluster.
      • Install helm so that kustomize can build Helm charts.
  • Look for the HP-Fury - kube-config secret in the Xebia Data 1Password and copy the kube-config.yaml file to ~/.kube/config on your machine (or merge it with your existing config if you already have one; a merge sketch follows this list).
  • Verify that you can connect to the cluster by running kubectl cluster-info. This should return details from the cluster at hp-fury.tail6720f8.ts.net. If you see another cluster, check you have the right context selected for kubectl.
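
If you already have a kubeconfig to merge, a minimal sketch (assuming the downloaded file is saved as kube-config.yaml in your current directory; adjust paths to taste):

# Back up your existing config, then merge the downloaded kube-config.yaml into it.
cp ~/.kube/config ~/.kube/config.bak
KUBECONFIG=~/.kube/config:./kube-config.yaml kubectl config view --flatten > /tmp/merged-config
mv /tmp/merged-config ~/.kube/config
# List the available contexts and switch to the HP-Fury one (its exact name depends on the kube-config).
kubectl config get-contexts
kubectl config use-context <hp-fury-context-name>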

If you need SSH access, please ping Julian to set up an account for you on the machine.

Guidelines

When developing on this machine, please adhere to the following guidelines:

  1. Don't use the default namespace.
  2. Create separate namespace(s) for your application(s). This can either be:
  • A personal namespace in your name (e.g. jrderuiter).
  • A namespace named after the application (e.g. argocd).
  • A namespace grouping multiple applications (e.g. monitoring).
  3. Don't put any super sensitive data on the machine.
  4. Don't store any secrets in Git. (Use sealed-secrets or the secret generator instead, as outlined below.)
  5. Ping our Slack channel #hp-zp-g5 whenever you plan to use significant resources (such as the GPUs). This is both to check whether they're available and to notify others.
  6. In general: be mindful of others!

Usage

Adding Kubernetes resources

To add an application or other Kubernetes resources:

  1. Create a branch on this repo (e.g. feature/add-langfuse).
  2. Create a folder under applications (e.g. applications/<name>) for your resources.

If you're using kustomize (recommended), please use the following folder structure:

  • patches (directory for kustomize patches, if needed)
  • resources (directory for your manifest files)
    • namespace.yml
    • ...
  • kustomization.yml

For Helm-based applications, you can use Helm within Kustomize (see applications/sealed-secrets for an example).
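
As an illustration of the structure above, a minimal kustomization.yml for the plain (non-Helm) case might look roughly like this (file names below are placeholders):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Deploy everything into your application's namespace (placeholder).
namespace: <namespace>

resources:
  - resources/namespace.yml
  - resources/deployment.yml    # replace with your own manifests

# Only needed if you use patches.
patches:
  - path: patches/example-patch.yml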

Deploying Kubernetes resources manually

During development, you can deploy your Kubernetes resources manually using kubectl, e.g.:

# For plain Kustomize
kubectl apply -k applications/<name>

or

# For Kustomize with Helm
kustomize build --enable-helm applications/<name> | kubectl apply -f -

Deploying Kubernetes resources with ArgoCD

For longer running applications, you can deploy your resources via ArgoCD. The main advantage of this approach is that ArgoCD will monitor your application and ensure that it remains in sync with the manifests defined in this repository. It will also monitor the application's health and warn you if there are any problems.

To deploy your application via ArgoCD:

  1. On your feature branch, define your application resources under applications/<name> as outlined above.
  2. On the main branch, create an application manifest for your application under cluster/applications, replacing <name>, <namespace> and <branch> with the correct values:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  # Name of your application.
  name: <name>
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/godatadriven/hp-fury.git
    # Branch under which to find your application manifests.
    targetRevision: <branch>
    # Path under which to find your application manifests.
    path: applications/<name>
  destination:
    server: https://kubernetes.default.svc
    # Namespace where your resources should be deployed.
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  3. Push your changes.
  4. Open the ArgoCD UI at https://argocd.hp-fury.internal and log in (username and password are in the Xebia Data 1Password).
  5. On the ArgoCD applications page, refresh the applications application. ArgoCD should find and start deploying your application.
  6. If you run into any issues, fix them on your feature branch and push the changes.
  7. When you're done, merge your changes and change the application manifest (cluster/applications/<name>.yml) to deploy the application from the main branch instead of your feature branch.

Managing secrets

To safely save secrets alongside your application manifests in Git, we currently provide two different approaches depending on your needs.

Generate the secret using kubernetes-secret-generator

If you don't care about the exact value of a secret (for example, if the secret is used for two applications to talk to each other), you can ask kubernetes-secret-generator to generate a random secret for you. See their documentation for more details on how to do so.
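
As a rough sketch (based on the upstream mittwald kubernetes-secret-generator docs; verify the annotation key against the version running in the cluster), an annotated Secret could look like:

apiVersion: v1
kind: Secret
metadata:
  name: my-app-credentials            # placeholder name
  namespace: <namespace>
  annotations:
    # Asks the secret generator to fill in a random value for the "password" field.
    secret-generator.v1.mittwald.de/autogenerate: password
type: Opaque
data: {}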

Encrypt the secret using sealed-secrets

If you do care about the value of a secret (for example, if you want to use the secret to login to a service), you can use sealed-secrets to encrypt your secret(s) and commit the encrypted values in Git. The sealed-secrets service in the cluster will then automatically decrypt the secrets when your application is deployed.

To encrypt a secret using sealed secrets:

  1. Install kubeseal and yq using brew install kubeseal yq.
  2. Encrypt your secret using kubeseal, e.g.:
kubectl create secret generic <secret-name> \
    --namespace <namespace> \
    --dry-run=client -o json \
    --from-literal=<name>=<value> --from-literal=... \
  | kubeseal --controller-namespace sealed-secrets --controller-name sealed-secrets \
  | yq -p json
  3. Add the resulting YAML output to your application resources.

Note that you can't change the namespace of the generated sealed secret. If you want to change the namespace, you'll have to re-encrypt the secrets. (This is intentional to stop other users from using your encrypted secret in their own namespace.)
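
For reference, the committed output looks roughly like the following (the encryptedData value below is a placeholder for the ciphertext kubeseal generates):

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: <secret-name>
  namespace: <namespace>
spec:
  encryptedData:
    <name>: AgB3...                   # placeholder ciphertext from kubeseal
  template:
    metadata:
      name: <secret-name>
      namespace: <namespace>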

Using the GPUs

To request a GPU for a container, configure the following resource limit/request on the container:

      resources:
        limits:
          nvidia.com/gpu: 1

This will instruct the gpu-operator to inject the GPU and required libraries into your container.

Configuring ingress for service(s)

If you want to expose a service via HTTP(S) (e.g. for a web UI), you need to configure an ingress for the service.

  1. Pick an address for your application, e.g. my-app.hp-fury.internal.
  2. Add this DNS entry in Pihole by adding it in the customDnsEntries section in applications/pihole/kustomization.yml (a sketch of such an entry follows the ingress example below). It's probably easiest to make these changes directly on the main branch, in which case ArgoCD should pick up the changes if you refresh Pihole from the UI.
  3. Add an ingress resource with this address in your application resources:
# See applications/argocd/resources/ingress.yml for the full example.
spec:
  ingressClassName: traefik
  rules:
    - host: my-app.hp-fury.internal  # DNS address of your application, as added in PiHole.
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app         # Name of your service, should point to your service.
                port:
                  name: http         # Name of the service port, should match your service.

This will allow you to access your service under https://my-app.hp-fury.internal in your browser.
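
For the DNS entry from step 2, Pihole's custom entries use dnsmasq's address syntax. A sketch (the surrounding structure depends on how applications/pihole/kustomization.yml wires the chart values, and the IP is a placeholder):

customDnsEntries:
  # Maps the chosen hostname to the IP of the machine running Pihole.
  - address=/my-app.hp-fury.internal/<ip-of-machine>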

Initial installation of the machine (for reference)

The steps below were used to bootstrap the machine and are kept for reference.

1. Networking

  1. Install Tailscale following the official guide.
  2. Install openssh-server using apt-get and harden the configuration (e.g. public-key-only authentication).
  3. Enable UFW, allowing access for SSH.

2. Installing K3s

  1. Allow the K3s network ranges in UFW.
  2. Install K3s following the official guide.
  3. (Optional) Add your Tailscale DNS name to /etc/rancher/k3s/config.yaml and run k3s certificate rotate to generate certificates including your Tailscale DNS name:
tls-san:
  - "hp-fury.tail6720f8.ts.net"

3. Configuring ArgoCD

  1. Create the initial ArgoCD installation using kubectl apply -k applications/argocd.
  2. Use K9s to forward port 8080 from the argo-server service and open the ArgoCD UI at http://localhost:8080 (a kubectl alternative is sketched after this list).
  3. Retrieve the initial login password from the K8s secret argocd-initial-admin-secret in the cluster and log in using admin/<initial-password>.
  4. In the UI, go to User Info > Update Password and change the admin user password to something secret.
  5. Configure any required repositories under Settings > Repositories. For GitHub, use HTTPS with a fine-grained access token as password. The token only needs to have read permissions on the repo content.
  6. Deploy the applications using kubectl apply -k cluster.
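
If you'd rather use kubectl than K9s for the port forward in step 2, a minimal sketch (assuming the standard ArgoCD service name argocd-server in the argocd namespace):

# Forward local port 8080 to the ArgoCD server service.
kubectl -n argocd port-forward svc/argocd-server 8080:443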

4. Configuring the GPU drivers

  1. Run the following commands to install the drivers, as detailed here.
distro=ubuntu2204
arch=x86_64
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-drivers cuda-toolkit nvidia-gds nvidia-container-toolkit
  2. Check whether the drivers were installed correctly using nvidia-smi.
  3. Enable the nvidia container runtime as the default for k3s by editing /etc/rancher/k3s/config.yaml and adding the setting default-runtime: "nvidia".
  4. Install the gpu-operator from applications/gpu-operator (e.g. using ArgoCD).
  5. Test your installation by running an example application, e.g.:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # You either need to specify this env var or the resource limit
      # request below to ensure access to the GPU.
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value:  all
      image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v25.3.0
      command: [/bin/sh, -c]
      args: [vectorAdd]
      resources:
        limits:
          nvidia.com/gpu: 1
  # Needed if the default runtime for k3s isn't set to nvidia.
  # runtimeClassName: nvidia
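
To run the test, apply the manifest and check the pod logs; the vectorAdd sample typically ends with a "Test PASSED" message (the file name below is a placeholder):

kubectl apply -f gpu-operator-test.yml
kubectl logs pod/gpu-operator-test
kubectl delete pod gpu-operator-test    # clean up afterwards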

5. Setting up DNS via Pihole

  1. Before we can install Pihole, we need to disable systemd-resolved's DNS stub listener to free up port 53: https://gist.github.com/zoilomora/f7d264cefbb589f3f1b1fc2cea2c844c.
  2. Next, install Pihole on the cluster (applications/pihole) adding any local addresses under customDnsEntries.
  3. Once Pihole is running, check if you can query Pihole using dig, e.g. using dig google.com @<ip-of-machine-running-pihole>.
  4. In the Tailscale admin console, open the DNS tab. Under nameservers, add the IP of the machine running Pihole and enable Override DNS servers.
