This is an experiment to create an automated install of a 3-node Kubernetes cluster: 1 master and 2 worker nodes. The scripts are subject to further improvement, as there are hardcoded values where I should use variables, loops and flexible configuration. I will improve, I promise.
- Virtual machines using Incus with qemu/libvirt as the virtualization layer
- Terraform to automate provisioning of the nodes
- Ansible to configure nodes further
terraform - Terraform code for node provisioning
ansible - Ansible scripts, config and inventory
misc - helper script to set up NAT under a WSL environment
- Incus installed - with qemu/libvirt backing
- Terraform installed
- Ansible installed
You can set some parameters of the VM creation in the `terraform.tfvars` file.
We use an Incus image of Ubuntu 22.04, a non-cloud-init version. The installation process uses Debian-style commands and package names, so do not set the base image to a RedHat-style one: it won't work without modifying the VM provisioning commands and also the Kubernetes-related software installation part of the Ansible code.
You can set the desired target network in CIDR notation. We will create an Incus bridge using the first address of that network range. The nodes will get their IP addresses iterated from the second address onward.
You can set the desired number of master and worker nodes. They will appear as master1, master2, ... and worker1, worker2, ...
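A rough sketch of what `terraform.tfvars` could look like. The variable names here are hypothetical; check `variables.tf` in the terraform directory for the actual names the code expects.

```sh
# Hypothetical terraform.tfvars shape - shown for orientation only,
# the real variable names live in the repository's variables.tf.
cat <<'EOF'
base_image   = "images:ubuntu/22.04"   # non cloud-init, Debian style
network_cidr = "192.168.101.0/24"      # bridge gets .1, nodes start at .2
master_count = 1
worker_count = 2
EOF
```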
In config.yml you can select the desired software versions.
- kubernetes_version: used in the form v1.33 - sets up the Kubernetes package repository for that version and installs the latest kubelet, kubeadm and kubectl from it.
- containerd_version: full version string to be used. We will install the exact version selected. Please refer to the containerd install guide to select an appropriate version for the selected Kubernetes version.
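A sketch of `config.yml` with just the two keys described above; the containerd version shown is only an example, pick one that the containerd install guide recommends for your Kubernetes version, and the file location is assumed to be the ansible directory.

```sh
# Illustrative config.yml content - values are examples, not requirements.
cat <<'EOF'
kubernetes_version: "v1.33"
containerd_version: "1.7.27"
EOF
```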
- In the terraform directory run `terraform plan`, then `terraform apply` (see the example sequence after this list). On successful completion the 3 nodes are created as Ubuntu VMs, all with:
  - static IP configured
  - user `admin` with password `kubepass` created
  - sshd installed and started
  - sudo enabled for the `admin` user
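The typical command sequence, run from the repository root (`terraform init` is only needed on the first run):

```sh
cd terraform
terraform init    # download the required provider(s) on first use
terraform plan    # review what will be created
terraform apply   # create the bridge, profile and node VMs
```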
Terraform generates an Ansible inventory, which is configured as the default inventory in `ansible.cfg`. It lists all the created nodes with their IP addresses in the `ansible_ssh_host` variable, in two separate groups for masters and workers.
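The generated inventory has roughly this shape (illustrative only - host names and addresses depend on your tfvars settings, and the real file is written by terraform):

```sh
# Not a file from the repo, just the expected shape of the generated inventory.
cat <<'EOF'
[masters]
master1 ansible_ssh_host=192.168.101.2

[workers]
worker1 ansible_ssh_host=192.168.101.3
worker2 ansible_ssh_host=192.168.101.4
EOF
```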
- In the ansible directory run `ansible-playbook preparenodes.yml` (see the example after this list). This does the following:
  - enables pubkey auth from the host to the nodes as 'admin' and as 'root'
  - creates a keypair and distributes it to the nodes so they can communicate with each other as admin or root
  - populates the /etc/hosts files on the nodes
  - populates the known_hosts files for admin and root on all nodes
  - enables passwordless sudo for admin on all nodes
  - prepares the kernel config for Kubernetes - this is still broken
  - installs containerd
  - installs (and puts on hold) the Kubernetes packages: kubelet, kubeadm and kubectl
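A sketch of the run, assuming the Ansible virtualenv described further below is activated; the `ping` step is optional and simply confirms that the generated inventory and the stored password work:

```sh
cd ansible
ansible all -m ping                 # optional connectivity check against all nodes
ansible-playbook preparenodes.yml   # prepare master and worker nodes
```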
Warning: Ansible is configured in a highly insecure way: a plain-text password is saved in the project directory in the file `very_insecure_password_file`. Also, `ansible.cfg` is configured to read the connection password and the become password from this file for as long as passwordless pubkey auth has not yet been set up towards the nodes.
Warning: In this environment every node has only one IP address. If you create an environment where multiple addresses are configured, you have to make sure `kubelet` is using one that is able to communicate with the other nodes. For example, VirtualBox NAT adapters can be used to reach the internet, but not for communication between the nodes. In that case you must explicitly tell kubelet which address to use. You can set this in the `/etc/default/kubelet` file, as follows: `KUBELET_EXTRA_ARGS='--node-ip 111.222.33.44'`. Obviously, edit the content for your needs.
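For example, on an affected node (the address is the same placeholder as above - substitute the node's reachable IP):

```sh
# Pin kubelet to the inter-node address, then restart it to pick up the change.
echo "KUBELET_EXTRA_ARGS='--node-ip 111.222.33.44'" | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet
```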
- Log on to the `master` host via `incus shell master` or `ssh admin@<master-ip>` and set up the Kubernetes cluster:
  - Run `kubeadm init` with the applicable parameters to initialise the control plane (see the example command after the join commands below):
    - `--pod-network-cidr=10.244.0.0/16` - obligatory parameter, because without it the pod network cannot start, but `kubeadm` won't warn you about that.
    - `--control-plane-endpoint=xxx.xxx.xxx.xxx:6443` - when creating a multi-master k8s cluster you need a common endpoint. Here we create an external load balancer host, balancer1, which is configured to forward requests to port 6443 on all master nodes. Use the IP address of the balancer node. For a single master this can be omitted.
    - `--upload-certs` - to create a multi-master cluster it is advisable to store the generated certs in etcd. Otherwise you need to copy them manually to the additional master nodes when they join the cluster.
  - When successfully initialised, kubeadm prints the commands with the tokens needed to join further master and worker nodes. Save these commands, as the join tokens cannot be displayed again.
    - for adding master nodes:
      `kubeadm join 192.168.211.9:6443 --token ruz8ac.p9u0h82cu1xnn6ho --discovery-token-ca-cert-hash sha256:66e...b6e72d --control-plane --certificate-key 7db87a17ff2e45d6...f69a18171a`
    - for adding worker nodes:
      `kubeadm join 192.168.101.10:6443 --token clm3xc.exhryqyu8huronp6 --discovery-token-ca-cert-hash sha256:9b91013e81a...5a7e3ea06`
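Putting the flags above together, the init call looks something like this (a sketch - substitute your balancer address, or drop `--control-plane-endpoint` and `--upload-certs` for a single-master setup):

```sh
# Run as root on the first master node.
kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --control-plane-endpoint=xxx.xxx.xxx.xxx:6443 \
  --upload-certs
```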
- Set up the kubectl config. You can use the default admin config by exporting the `KUBECONFIG` environment variable: `export KUBECONFIG=/etc/kubernetes/admin.conf`, or copy this config into the `~/.kube/config` file in your own home directory.
- Choose a CNI plugin, like Flannel or Calico, and install it. For example, you can install the Flannel CNI plugin with:
  `kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml`
- Check if the control plane status is `Ready` - give it a minute or so:

      root@master:~# kubectl get node
      NAME     STATUS   ROLES           AGE   VERSION
      master   Ready    control-plane   74m   v1.34.1

- Add worker nodes: log in as root and run the `kubeadm join ...` command which you saved earlier. After a minute you should see all worker nodes in `Ready` status.
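If the saved join command got lost, a fresh worker join command can be printed on the master at any time (this creates a new token rather than redisplaying the old one):

```sh
# Run on the master node.
kubeadm token create --print-join-command
```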
We switched to using VMs in place of system containers, because running Kubernetes in LXC system containers proved to be next to impossible. Now, after running terraform and ansible, you have the master node ready for `kubeadm init` and the worker nodes ready to join. On WSL you might need an extra step to enable NAT on the bridge so the nodes can reach the internet: use the `misc/incus-nat-setup-for-WSL.sh` script. The script relies on the existence of the `kube_br0` interface, so you cannot run it before terraform creates that interface - but terraform will fail to configure the nodes without internet access, which is needed to install sshd. A second run of terraform might help.
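On WSL the sequence therefore looks roughly like this (a sketch of the workaround described above, run from the repository root; adjust paths as needed):

```sh
cd terraform
terraform apply                            # first run: creates kube_br0, node setup may fail without internet
bash ../misc/incus-nat-setup-for-WSL.sh    # enable NAT on kube_br0
terraform apply                            # second run: nodes can now reach the internet
```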
Check if libvirtd is available:
systemctl status libvirtd
If it is not present, install qemu/libvirt on the Ubuntu host:
sudo apt update
sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
sudo systemctl enable --now libvirtd
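Optionally, verify that hardware virtualization is available to back the VMs (both are standard checks; a count of 0 or a missing /dev/kvm means KVM acceleration is not available):

```sh
grep -Ec '(vmx|svm)' /proc/cpuinfo   # >0 means the CPU exposes virtualization extensions
ls -l /dev/kvm                       # the KVM device node should exist after the install above
```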
The commands below create a test host the way terraform should. This was necessary because of a struggle to set up hosts with static IP addresses without messing up name resolution.
Create an Incus network
incus network create kube_br0 ipv4.address=192.168.101.1/24 ipv4.nat=true ipv6.address=none
Create a storage pool to be used by the VMs
incus storage create kubepool dir
Set up the Incus profile "kubelab", adding a default eth0 network interface connected to the kube_br0 network and a root disk from the kubepool storage pool
incus profile create kubelab
incus profile device add kubelab eth0 nic network=kube_br0 name=eth0
incus profile device add kubelab root disk path=/ pool=kubepool
Launch a test image as a virtual machine to see if the network is working
incus launch images:ubuntu/22.04 testhost --profile kubelab --vm
This way the host gets an IP by DHCP and the network is fully functional.
Now tear down the instance:
incus delete testhost --force
Recreate the instance and configure it to use a static IP (192.168.101.15, selected from the range of the kube_br0 bridge)
incus create images:ubuntu/22.04 testhost --profile kubelab --vm
incus config device override testhost eth0 ipv4.address=192.168.101.15
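Then start the instance and check that the address was applied (standard incus commands; the IP appears once the guest has booted):

```sh
incus start testhost
incus list testhost   # the IPv4 column should show 192.168.101.15
```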
- WARNING: in WSL you might not have NAT passthrough enabled. To work around this, run `misc/incus-nat-setup-for-WSL.sh`, which will temporarily enable NAT on the `kube_br0` interface. Catch: this interface is only created during the terraform run.
- install qemu, libvirt and incus (incus is best installed from the Zabbly repo)
sudo apt update
sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils python3 incus gnupg software-properties-common curl -y
- install terraform from Hashicorp repo
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update
sudo apt install -y terraform
- install incus from the Zabbly repo
sudo mkdir -p /etc/apt/keyrings
sudo curl -fsSL https://pkgs.zabbly.com/key.asc -o /etc/apt/keyrings/zabbly.asc
echo "deb [signed-by=/etc/apt/keyrings/zabbly.asc] https://pkgs.zabbly.com/incus/stable $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/zabbly.list
sudo apt update
sudo apt install -y incus
- add the local user to the `incus` and `incus-admin` groups (my local username is 'trifo'): `sudo usermod -aG incus,incus-admin trifo`
- log off and log on again to apply the changed group membership
- Initialise the Incus environment: run `incus admin init` (see the note below). TODO: directions about what to set during init
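Until those directions are written: the interactive defaults are generally fine here, since the project later creates its own `kube_br0` network and `kubepool` storage pool. A quick non-interactive alternative is the minimal preseed:

```sh
# Accepts sensible defaults for local use without prompting.
incus admin init --minimal
```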
- install ansible into a Python virtualenv
python3 -m venv ~/ansible-venv # create python virtual env
. ~/ansible-venv/bin/activate # sourcing activate script
pip install ansible # do the install
ansible --version # check install
- Remove installed stuff and clean up
When testing is done, all data and installed software can be eliminated. First destroy the virtual infrastructure:
in the terraform directory run `terraform destroy -auto-approve`. This deletes all the relevant virtual machines and the config they relied on. If you want to remove the installed software as well, then:
sudo apt remove --purge qemu-system-x86 qemu-kvm libvirt-clients libvirt-daemon-system bridge-utils terraform incus -y
sudo apt autoremove -y
sudo rm /etc/apt/sources.list.d/zabbly.list
sudo rm /etc/apt/sources.list.d/hashicorp.list
sudo apt clean # Is this enough to get rid of all cached elements?
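The repository signing keys added earlier can be removed as well (same paths as used during installation above):

```sh
sudo rm -f /usr/share/keyrings/hashicorp-archive-keyring.gpg
sudo rm -f /etc/apt/keyrings/zabbly.asc
```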
- set FQDN hostnames for the nodes (`master` -> `master.kubernetes.local`)
- set the Incus image as a variable
- set the Incus VM resources (mem/cpu) as variables - they are hardcoded now
- create a TF output summary (created VM params, ansible command to run, ...)
- VM creation gets stuck when a non-/24 network is chosen
- implement config.yml in the Ansible part