This repository contains the infrastructure configuration for my Kubernetes cluster, including all the manifests and documentation.
Even with all this documentation, your only remaining option to fix a deployment issue may be to ask Sylvain for help. Your best hope is probably to find him at AGEPOLY late enough in the evening that he's not 100% busy, but just before he starts a game of table football (babyfoot). This requires a good sense of timing.
Because nothing is ever perfect, here is a list of things that still need to be done, sorted by priority.

- set up pgBackRest for backups of the critical databases.
- understand how to apply values from a `common.yaml` file to several `kustomization.yaml` files.
- use more ConfigMaps instead of config PVCs.
- Pro: This namespace is for all the services hosted for me as a freelancer, such as Umami, HasteServer, Blog, DDPE, Diswho, Instaddict.
- ManageInvite: This namespace is for all the services hosted for my ManageInvite project (API, Dashboard, Discord bot).
- Home: This namespace is for all the services hosted for my personal use such as Vaultwarden, TimeTagger, Immich, Monica, Mealie, FileBrowser.
- Managed: This namespace is for all the services hosted for my clients.
- Sushiflix: This namespace is for all media services such as Plex, Radarr, Sonarr, Bazarr, Jackett, Qbittorrent, Sabnzbd, Tautulli, Overseerr.
- DB: This namespace is for all the databases such as PostgreSQL, PgAdmin4.
- Workflows: This namespace is for all the workflows such as Harbor.
- Monitoring: This namespace is for all the monitoring services such as Grafana, Prometheus, Alertgram, Promtail and Loki.
- Backups: One snapshot of each volume is taken every 30 minutes, and a backup is sent to an S3 bucket every night. The retention policy is 7 days for snapshots and 30 days for off-site backups. Movies and TV shows are not backed up, as they are considered non-critical data (a sketch of matching recurring jobs follows this list).
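As a rough illustration of that policy, Longhorn recurring jobs could be declared as in the sketch below. The job names, cron schedules and retention counts are assumptions, and the real setup may be done through the Longhorn UI instead of manifests.

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-30min          # assumed name
  namespace: longhorn-system
spec:
  task: snapshot
  cron: "*/30 * * * *"          # every 30 minutes
  retain: 336                   # assumed: ~7 days of 30-minute snapshots (subject to Longhorn's per-volume limits)
  concurrency: 2
  groups:
    - default                   # applies to volumes in the default group
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-nightly          # assumed name
  namespace: longhorn-system
spec:
  task: backup
  cron: "0 3 * * *"             # assumed nightly schedule
  retain: 30                    # 30 days of off-site backups
  concurrency: 2
  groups:
    - default
```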
Create a new secret (the simplest way is to use stringData).
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secret
  namespace: {NAMESPACE_NAME} # IMPORTANT
type: Opaque
stringData:
  username: martin007
  password: mOnSuper1M0t2pass
  code-secret: "1234 5678 9101"
```

Encrypt the secret.
```bash
kubeseal --scope namespace-wide --cert https://raw.githubusercontent.com/Androz2091/k8s-infrastructure/main/sealed-secrets.crt -o yaml < secrets.yaml > sealed-secrets.yaml
```

Forward a service port to the host:

```bash
kubectl -n somenamespace port-forward svc/someservice host-port:cluster-port
```

Open a shell inside a pod:

```bash
kubectl -n somenamespace exec --stdin --tty somepod -- /bin/bash
```

Render the manifests produced by a Helm chart:

```bash
helm template my-app repo-url/app -f values.yaml
```

The same applies to `kustomization.yaml` files:
```bash
kubectl kustomize --enable-helm .
```

By default, Longhorn backs up all volumes. Sometimes, for movies or other non-critical data, we don't want to back up the volume. In that case, add these labels to the volume:
```yaml
labels:
  recurring-job-group.longhorn.io/nobackup: enabled
  recurring-job.longhorn.io/source: enabled
```

Use port forwarding to access the Longhorn UI. If resizing a volume through its PVC is rejected (`Forbidden: field can not be less than previous value`), delete all deployments using the volume, then expand it via the Longhorn UI.
From a `secrets.yaml` file:
```bash
kubeseal --scope namespace-wide --cert ../../../sealed-secrets.crt -o yaml < secrets.yaml > sealed-secrets.yaml
```

Raw from a file:
```bash
kubeseal --scope namespace-wide --cert ../../../sealed-secrets.crt --raw --from-file=config.json
```

Note that `--scope namespace-wide` is not supported yet here; make sure to use `cluster-wide` instead when using `sealedFileSecrets`.
```bash
kubeseal --recovery-unseal --recovery-private-key ~/private.key -o yaml < sealed-secrets.yaml
```

The Plex server has to be accessed locally to be claimed. Use port forwarding to access it first. Then specify the custom domain name, plex.androz2091.fr, in the server network settings (advanced). Otherwise it will try to load data from server-ip:32400 or even cluster-ip:32400, which is not securely accessible.
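For example, a port-forward along these lines gives local access for the claim step. The namespace and service name are assumptions based on the namespace list above; adjust them to the actual Plex deployment.

```bash
# Hypothetical names: adjust to the real Plex service in the cluster
kubectl -n sushiflix port-forward svc/plex 32400:32400
# then open http://localhost:32400/web to claim the server
```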
Logs are collected by Promtail/Loki and can be accessed via the dashboard available at grafana/loki-dashboard.json.
Apps deployed with onechart share the same chart name, so we have to differentiate them using the instance label. For instance, select App > pro/onechart and Instance > manage-invite-bot to view ManageInvite's logs.
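As a sketch, the equivalent raw LogQL query would look something like the line below; the exact label names (`app`, `instance`) depend on how Promtail relabels the Kubernetes metadata, so treat them as assumptions.

```
{app="pro/onechart", instance="manage-invite-bot"}
```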
```bash
apt update && sudo apt upgrade -y
apt-get install -y software-properties-common curl jq
```

Turn off swap.
```bash
swapoff -a
systemctl mask dev-sdb?.swap && systemctl stop dev-sdb?.swap # Debian special, check in htop
```

Install CRI-O and Kubernetes. See the cri-o/packaging instructions.
```bash
KUBERNETES_VERSION=v1.31
CRIO_VERSION=v1.30
```

Add the Kubernetes repository.
```bash
curl -fsSL https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/Release.key |
    gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/ /" |
    tee /etc/apt/sources.list.d/kubernetes.list
```

Add the CRI-O repository.
```bash
curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/stable:/$CRIO_VERSION/deb/Release.key |
    gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://pkgs.k8s.io/addons:/cri-o:/stable:/$CRIO_VERSION/deb/ /" |
    tee /etc/apt/sources.list.d/cri-o.list
```

Install the packages.
```bash
apt-get update
apt-get install -y cri-o kubelet kubeadm kubectl
apt-mark hold cri-o kubelet kubeadm kubectl
```

Start the cluster.
```bash
systemctl start crio.service
```

Forwarding IPv4 and letting iptables see bridged traffic.

```bash
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

# Apply sysctl params without reboot
sysctl --system

# Checks
lsmod | grep br_netfilter
lsmod | grep overlay

systemctl enable --now crio
```

Create the cluster.
```bash
kubeadm init --pod-network-cidr=10.244.0.0/16
```

Configure the kubectl CLI to connect to the cluster.
```bash
export KUBECONFIG=/etc/kubernetes/admin.conf
```
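Alternatively, to make kubectl work for a regular user across sessions, the standard kubeadm approach can be used (the paths below are the kubeadm defaults):

```bash
# Copy the admin kubeconfig to the current user's home directory
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```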
Allow the current (single) node to be a worker node.

```bash
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```

Install the Flannel CNI.

```bash
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```

Install Caddy.

```bash
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy
```
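The Caddyfile itself is not shown here. Assuming Caddy acts as a reverse proxy in front of the cluster, a minimal sketch could look like the block below; the domain is one already used in this document, and the upstream port is a placeholder.

```
# Hypothetical Caddyfile sketch: one site block per exposed domain
tautulli.androz2091.fr {
    reverse_proxy localhost:30080   # placeholder upstream (e.g. an ingress NodePort)
}
```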
Start Caddy (execute this command in the directory where the Caddyfile is located).

```bash
sudo caddy start
```

Install Helm.

```bash
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
```

Install the sealed-secrets controller.

```bash
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm repo update
helm install sealed-secrets sealed-secrets/sealed-secrets --namespace kube-system --create-namespace --version 2.16.1
```

Install the CLI.
```bash
# Fetch the latest sealed-secrets version using GitHub API
KUBESEAL_VERSION=$(curl -s https://api.github.com/repos/bitnami-labs/sealed-secrets/tags | jq -r '.[0].name' | cut -c 2-)

# Check if the version was fetched successfully
if [ -z "$KUBESEAL_VERSION" ]; then
    echo "Failed to fetch the latest KUBESEAL_VERSION"
    exit 1
fi

curl -OL "https://github.com/bitnami-labs/sealed-secrets/releases/download/v${KUBESEAL_VERSION}/kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz"
tar -xvzf kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal
```

Create a public and private key.
```bash
export PRIVATEKEY="mytls.key"
export PUBLICKEY="mytls.crt"
export NAMESPACE="kube-system"
export SECRETNAME="sealed-secrets-customkeys"

openssl req -x509 -days 365 -nodes -newkey rsa:4096 -keyout "$PRIVATEKEY" -out "$PUBLICKEY" -subj "/CN=sealed-secret/O=sealed-secret"
```

Create the secret.
kubectl -n "$NAMESPACE" create secret tls "$SECRETNAME" --cert="$PUBLICKEY" --key="$PRIVATEKEY"
kubectl -n "$NAMESPACE" label secret "$SECRETNAME" sealedsecrets.bitnami.com/sealed-secrets-key=activeDelete the sealed-secrets controller pod to refresh the keys.
kubectl -n "$NAMESPACE" delete pod -l name=sealed-secrets-controllerSee bitnami-labs/sealed-secrets.
```bash
kubectl create namespace argocd
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
helm install argocd argo/argo-cd --namespace argocd --create-namespace --values https://raw.githubusercontent.com/Androz2091/k8s-infrastructure/main/argocd-values.yaml --version 7.0.0
```

Install the Argo CD CLI.
```bash
sudo curl -sSL -o /usr/local/bin/argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo chmod +x /usr/local/bin/argocd
```

Get the ArgoCD password.
```bash
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
```

Install Longhorn (the open-iscsi package is required on the node).

```bash
apt-get install open-iscsi -y
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.7.0
```

(optional) Forward the Longhorn UI to the host.
```bash
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
```

TODO: /var/lib/longhorn
Deploy the bootstrap application.

```bash
kubectl apply -f https://raw.githubusercontent.com/Androz2091/k8s-infrastructure/main/bootstrap-app.yaml
```
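bootstrap-app.yaml presumably defines an Argo CD Application pointing back at this repository (app-of-apps style). The sketch below is an assumption of its shape, not a copy of the real file; the name, path and sync policy in particular are guesses.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bootstrap               # assumed name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/Androz2091/k8s-infrastructure
    targetRevision: main
    path: .                     # assumed path to the app manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true               # assumed sync policy
      selfHeal: true
```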
Disable systemd-resolved if it's running.

```bash
sudo systemctl disable systemd-resolved.service
sudo systemctl stop systemd-resolved.service
mv /etc/resolv.conf /etc/resolv.conf.bak
```

- update `/etc/resolv.conf` as follows:

```
nameserver 8.8.8.8
nameserver 10.96.0.10
```

Now we also need to update the cluster DNS so it does not loop back to the host.
- dump the current CoreDNS config:
```bash
kubectl -n kube-system get configmap coredns -o yaml > coredns_patched_dns.yaml
```

- edit the `coredns_patched_dns.yaml` file and add the following lines to the `Corefile`:
```
forward . 1.1.1.1 8.8.8.8 {
    max_concurrent 1000
}
```
- then apply the changes by running:
```bash
kubectl apply -f coredns_patched_dns.yaml
```

Do not forget the `\` before the `$` in the username.
```bash
kubectl -n managed create secret docker-registry regcred --docker-server=harbor.androz2091.fr --docker-username="robot\$name" --docker-password="secret_token"
```
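Pods in the managed namespace can then reference this pull secret. A hypothetical deployment (the app name and image are placeholders) might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # placeholder
  namespace: managed
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      imagePullSecrets:
        - name: regcred         # the secret created above
      containers:
        - name: my-app
          image: harbor.androz2091.fr/my-project/my-app:latest   # placeholder image
```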
From the Longhorn UI, go to Settings > Backup Target and add a new target with the following settings:

```
s3://<bucket_name>@s3.us-west-004.backblazeb2.com/<path>
```

Create a new secret with the credentials.
```bash
kubectl create secret generic s3-secret --from-literal=AWS_ACCESS_KEY_ID=<access_key> --from-literal=AWS_SECRET_ACCESS_KEY=<secret_key> --from-literal=AWS_ENDPOINTS=s3.us-west-004.backblazeb2.com -n longhorn-system
```

A snapshot is the state of a Kubernetes volume at a given point in time; it is stored in the cluster. A backup is a snapshot that is stored outside of the cluster, in the backup target (here Backblaze).
This usually happens when the internal Kubernetes API server's certificate is about to expire. You can check the expiration date with the following command: `sudo kubeadm certs check-expiration`.
If it's about to expire, run `sudo kubeadm certs renew all`. Then use `sudo systemctl restart kubelet` to restart the pods!
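For reference, the full sequence on the control-plane node would be:

```bash
# Check certificate expiration dates
sudo kubeadm certs check-expiration
# Renew all certificates if needed
sudo kubeadm certs renew all
# Restart the kubelet so the renewed certificates are picked up
sudo systemctl restart kubelet
```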
Understand why requests are sometimes terminated by an SSL error (about 1 in 5 requests for some services):
```
poca@localhost:~ (1) $ curl https://tautulli.androz2091.fr
curl: (35) OpenSSL/1.1.1l-fips: error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error
```

Double-check the Caddyfile and make sure that all the DNS records point to the correct IP (sometimes, when an SSL certificate fails to be created or renewed, such errors can occur for all domains).
Also... double check that Caddy is not started twice (see https://serverfault.com/questions/1167816/openssl-routinesssl3-read-bytestlsv1-alert-internal-error-with-kubernetes-and/1168625#1168625).
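One quick way to check (assuming the apt package also registered a systemd unit, which would conflict with a manually started instance):

```bash
# List running Caddy processes; more than one usually means a conflict
ps aux | grep [c]addy
# Check whether the packaged systemd service is also active
systemctl status caddy
```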
When running `kubectl logs some-pod` I was getting `failed to create fsnotify watcher: too many open files`. The issue was solved by increasing the number of inotify max user instances (see https://serverfault.com/questions/984066/too-many-open-files-centos7-already-tried-setting-higher-limits).
```
debian@ns561436:~$ cat /proc/sys/fs/inotify/max_user_watches
524288
debian@ns561436:~$ cat /proc/sys/fs/inotify/max_user_instances
128
```

```bash
sudo bash -c 'cat <<EOF> /etc/sysctl.d/fs_inotify.conf
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 1048576
EOF'
```
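Assuming the same sysctl mechanism used during node setup, the new limits can then be applied without a reboot:

```bash
sudo sysctl --system
```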