Prerequisites
For an RKE2 cluster, we need three Linux systems for the control plane. In my case these are three Arch VMs (cp1.home, cp2.home, cp3.home), each with 2 cores and 6 GB RAM, but any distribution should work. It is important that the VMs can resolve each other's names. You can also start with a single-node installation, but then you lose redundancy and availability suffers during system updates. Additionally, having your own domain for accessing Kubernetes services is useful; Cloudflare works well as a DNS host, and that is where my test domain lives.
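If you do not run local DNS, static entries in /etc/hosts on every VM are enough for name resolution. A minimal sketch; the 192.168.1.x addresses are assumptions, use your own:

# Make the nodes resolvable without DNS (run on every VM)
# The IP addresses below are placeholders - adjust to your network.
cat >> /etc/hosts <<'EOF'
192.168.1.11 cp1.home cp1
192.168.1.12 cp2.home cp2
192.168.1.13 cp3.home cp3
EOF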
Installation and configuration of the first node
Installing the first node is the most involved step because the initial cluster configuration also happens here. First, the binaries need to be installed, either via RPM or via tarball. I use the latter because this method works on all Linux distributions.
# Run installation directly as root
curl -sfL https://get.rke2.io | sh -
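The install script also honors a couple of documented environment variables if you want to pin what gets installed instead of taking the latest release, for example:

# Pin the release channel or an exact version
curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=stable sh -
# or pin the exact version used later in this guide
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION='v1.32.8+rke2r1' sh -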
Before the first start, we need to configure RKE2. All configuration files are also available on GitHub: https://github.com/worli-info/rke2-setup
# First create the config directory
mkdir -p /etc/rancher/rke2
Then copy the config.yaml from the repository (config/config.yaml) to /etc/rancher/rke2/config.yaml:
write-kubeconfig-mode: '0644'
tls-san:
- 'cp1'
- 'cp2'
- 'cp3'
- 'cp1.home'
- 'cp2.home'
- 'cp3.home'
debug: true
cni:
- 'multus'
- 'cilium'
disable-kube-proxy: true
embedded-registry: true
kube-apiserver-arg:
- 'audit-log-path=/var/log/rke2/audit.log'
- 'audit-log-maxage=40'
- 'audit-log-maxbackup=15'
- 'audit-log-maxsize=150'
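A note on the audit flags: they are passed straight through to the kube-apiserver, so maxage is in days, maxsize is in megabytes per file, and maxbackup is the number of rotated files to keep. Since the log points at /var/log/rke2, I create the directory up front; whether this is strictly necessary depends on your setup, so treat it as a precaution:

# Precaution: make sure the audit log target directory exists
mkdir -p /var/log/rke2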
To have kubectl in the PATH and to set a few aliases, I have placed my .profile and .vimrc in the repository under /user.
alias vi=vim
alias kn='kubectl config set-context --current --namespace '
alias oc=kubectl
alias k=kubectl
alias ls='ls --color '
export EDITOR=vim
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/var/lib/rancher/rke2/server/cred/admin.kubeconfig
set mouse=
syntax on
set expandtab
set smarttab
set shiftwidth=2
set tabstop=2
On my Arch system, I also install bash-completion:
pacman -Sy bash-completion
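kubectl ships its own completion generator, so you can also hook completion up to the k alias. These two lines fit nicely into the .profile from above:

# Enable kubectl completion and extend it to the 'k' alias
source <(kubectl completion bash)
complete -o default -F __start_kubectl k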
Now we can start rke2 for the first time:
systemctl start rke2-server
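The first start takes a few minutes because images are pulled and etcd is bootstrapped. It helps to follow the service log in a second terminal:

# Follow the rke2-server unit log during bootstrap
journalctl -u rke2-server -f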
To avoid having to type the namespace all the time, we set the namespace of the current context using my alias ‘kn’ and then check which pods are running:
kn kube-system
kubectl get pods
NAME                                                      READY   STATUS             RESTARTS      AGE
cilium-jxgxm                                              0/1     Init:1/7           1 (42s ago)   2m4s
cilium-operator-65b696c86-7gwg6                           0/1     Running            1 (34s ago)   2m4s
cilium-operator-65b696c86-cdxqf                           0/1     Pending            0             2m4s
cloud-controller-manager-cp1                              1/1     Running            0             2m15s
etcd-cp1                                                  1/1     Running            0             2m15s
helm-install-rke2-cilium-852c7                            0/1     Completed          0             2m11s
helm-install-rke2-coredns-ktmqp                           0/1     Completed          0             2m11s
helm-install-rke2-ingress-nginx-2xcxw                     0/1     Pending            0             2m11s
helm-install-rke2-metrics-server-lwckm                    0/1     Pending            0             2m11s
helm-install-rke2-multus-dlc56                            0/1     Completed          0             2m11s
helm-install-rke2-runtimeclasses-sv7jx                    0/1     Pending            0             2m11s
helm-install-rke2-snapshot-controller-crd-sf7mz           0/1     Pending            0             2m11s
helm-install-rke2-snapshot-controller-cszx6               0/1     Pending            0             2m11s
kube-apiserver-cp1                                        1/1     Running            0             2m15s
kube-controller-manager-cp1                               1/1     Running            0             2m15s
kube-scheduler-cp1                                        1/1     Running            0             2m15s
rke2-coredns-rke2-coredns-86c455b944-7jxq7                0/1     Pending            0             2m4s
rke2-coredns-rke2-coredns-autoscaler-79677f89c4-5q4jb     0/1     Pending            0             2m4s
rke2-multus-2lgvf                                         0/1     CrashLoopBackOff   4 (30s ago)   2m4s
We can now see that the two CNIs are not really starting: the Multus pod is crashing, and neither the cilium-operator nor the node's cilium agent comes online. This is due to the missing Cilium configuration, which we can only provide after the first start. However, even once the Cilium config is in place, the operator pods cannot simply restart. Here we run into the chart defaults: the two operator replicas must run on two different Kubernetes nodes (pod anti-affinity), and during a rollout of a new version at most 50% of them may be unavailable.
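Both defaults can be verified directly on the Deployment with standard kubectl queries:

# Replica count and rollout strategy of the operator
kubectl -n kube-system get deployment cilium-operator -o jsonpath='{.spec.replicas} {.spec.strategy}{"\n"}'
# Anti-affinity rule that keeps the operator pods on separate nodes
kubectl -n kube-system get deployment cilium-operator -o jsonpath='{.spec.template.spec.affinity}{"\n"}'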
We will now place the rke2-cilium-config.yaml (found in the repository under manifests) into /var/lib/rancher/rke2/server/manifests and then check the status.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: true
    k8sServiceHost: 'localhost'
    k8sServicePort: '6443'
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
    ingressController:
      enabled: true
      loadbalancerMode: shared
    gatewayAPI:
      enabled: true
    updateStrategy:
      rollingUpdate:
        maxUnavailable: 3
    bgpControlPlane:
      enabled: true
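Placing the file is a plain copy; RKE2 watches this directory and applies everything in it automatically:

# Copy the HelmChartConfig into the manifests directory on cp1
cp rke2-cilium-config.yaml /var/lib/rancher/rke2/server/manifests/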
kubectl get pods
NAME                                                      READY   STATUS             RESTARTS        AGE
cilium-d92vh                                              0/1     Running            0               4m50s
cilium-operator-65b696c86-7gwg6                           0/1     CrashLoopBackOff   9 (65s ago)     23m
cilium-operator-6f7b9975f7-6w8x6                          0/1     Pending            0               4m50s
cilium-operator-6f7b9975f7-zwlqk                          0/1     Pending            0               4m50s
cloud-controller-manager-cp1                              1/1     Running            0               23m
etcd-cp1                                                  1/1     Running            0               23m
helm-install-rke2-cilium-cq8q9                            0/1     Completed          0               4m53s
helm-install-rke2-coredns-ktmqp                           0/1     Completed          0               23m
helm-install-rke2-ingress-nginx-2xcxw                     0/1     Pending            0               23m
helm-install-rke2-metrics-server-lwckm                    0/1     Pending            0               23m
helm-install-rke2-multus-dlc56                            0/1     Completed          0               23m
helm-install-rke2-runtimeclasses-sv7jx                    0/1     Pending            0               23m
helm-install-rke2-snapshot-controller-crd-sf7mz           0/1     Pending            0               23m
helm-install-rke2-snapshot-controller-cszx6               0/1     Pending            0               23m
hubble-relay-57f4c565fb-s94nk                             0/1     Pending            0               4m50s
hubble-ui-7677cbb7c-sjxjq                                 0/2     Pending            0               4m50s
kube-apiserver-cp1                                        1/1     Running            0               23m
kube-controller-manager-cp1                               1/1     Running            0               23m
kube-scheduler-cp1                                        1/1     Running            0               23m
rke2-coredns-rke2-coredns-86c455b944-7jxq7                0/1     Pending            0               23m
rke2-coredns-rke2-coredns-autoscaler-79677f89c4-5q4jb     0/1     Pending            0               23m
rke2-multus-2lgvf                                         0/1     CrashLoopBackOff   9 (2m45s ago)   23m
We see the expected result: the old operator ReplicaSet (65b696c86) was scaled down to one pod. However, the two new pods (6f7b9975f7) are Pending because they are not allowed to run on the same Kubernetes node as another operator pod.
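The two ReplicaSets behind the operator Deployment can be seen side by side:

# Old (65b696c86) and new (6f7b9975f7) operator ReplicaSets
kubectl -n kube-system get rs | grep cilium-operator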
There are now several ways to solve this:
- Quick and dirty: we delete the pod cilium-operator-65b696c86-7gwg6. A new pod then starts with the correct config, and Cilium should work. The downside: the problem recurs with every change to the Cilium config and with every RKE2 update, and the cosmetic issue of one operator pod stuck in Pending remains.
- We continue with the cluster installation; once one or two more nodes have joined, the problem no longer occurs. (This is the route we take.)
- With a single-node installation, we don't have that option. We are left with reducing the number of operator replicas, raising the maxUnavailable value (to 100%) for the rollout, or relaxing the pod anti-affinity rules so that two operator pods may run on one node. I am not pursuing this here, but a minimal sketch of the first variant follows below.
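For the single-node case, a minimal sketch using the Cilium chart's operator.replicas value (untested in this setup; merge it into the valuesContent shown above rather than replacing it):

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    # Assumption: a single operator replica is acceptable on one node;
    # this sidesteps both the anti-affinity and the rollout constraint.
    operator:
      replicas: 1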
Add the second and third nodes to the cluster
We need to install the rke2 binaries again here and create the config directory:
curl -sfL https://get.rke2.io | sh -
mkdir -p /etc/rancher/rke2
For the server join, the config file is simpler, but we need the token generated on the first server (cp1). It is located under /var/lib/rancher/rke2/server/node-token. The config then looks like this; of course, the token and the server name must be adjusted to match your first server. An example config file can be found in the repo under config/config2.yaml.
server: https://cp1.home:9345
token: K10f07d244ed11093e8651214836ee213efef6090d40c64bb849cc823fd17513fcf::server:b656f35f967ec682ff4b6f683d843cb7
write-kubeconfig-mode: '0644'
tls-san:
- 'cp1'
- 'cp2'
- 'cp3'
- 'cp1.home'
- 'cp2.home'
- 'cp3.home'
cni:
- 'multus'
- 'cilium'
disable-kube-proxy: true
embedded-registry: true
kube-apiserver-arg:
- 'audit-log-path=/var/log/rke2/audit.log'
- 'audit-log-maxage=40'
- 'audit-log-maxbackup=15'
- 'audit-log-maxsize=150'
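The token used in the config above is generated during the first start and can simply be read on cp1:

# On cp1: print the join token for the new servers
cat /var/lib/rancher/rke2/server/node-token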
Now we can start the service:
systemctl start rke2-server
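While the node joins, you can watch it appear and become Ready from cp1:

# Watch the node list update live (Ctrl+C to stop)
kubectl get nodes -w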
Repeat the same for the third server. Afterwards, the cluster and the pods should look as follows:
kubectl get pods && kubectl get nodes
NAME                                                      READY   STATUS      RESTARTS        AGE
cilium-b9dtk                                              1/1     Running     0               10m
cilium-mddsn                                              1/1     Running     0               12m
cilium-operator-6f7b9975f7-dkljb                          1/1     Running     0               12m
cilium-operator-6f7b9975f7-tkmnq                          1/1     Running     0               12m
cilium-qktg4                                              1/1     Running     0               5m45s
cloud-controller-manager-cp1                              1/1     Running     1 (15m ago)     15m
cloud-controller-manager-cp2                              1/1     Running     0               9m59s
cloud-controller-manager-cp3                              1/1     Running     0               5m31s
etcd-cp1                                                  1/1     Running     0               15m
etcd-cp2                                                  1/1     Running     0               9m59s
etcd-cp3                                                  1/1     Running     0               5m31s
helm-install-rke2-cilium-zh752                            0/1     Completed   0               12m
helm-install-rke2-coredns-fl66w                           0/1     Completed   0               15m
helm-install-rke2-ingress-nginx-bmxnr                     0/1     Completed   0               15m
helm-install-rke2-metrics-server-q964b                    0/1     Completed   0               15m
helm-install-rke2-multus-f64vt                            0/1     Completed   0               15m
helm-install-rke2-runtimeclasses-mc4g2                    0/1     Completed   0               15m
helm-install-rke2-snapshot-controller-99cvx               0/1     Completed   1               15m
helm-install-rke2-snapshot-controller-crd-d8x9b           0/1     Completed   0               15m
hubble-relay-57f4c565fb-cx5nd                             1/1     Running     0               12m
hubble-ui-7677cbb7c-8k8cp                                 2/2     Running     0               12m
kube-apiserver-cp1                                        1/1     Running     0               15m
kube-apiserver-cp2                                        1/1     Running     0               9m59s
kube-apiserver-cp3                                        1/1     Running     0               5m31s
kube-controller-manager-cp1                               1/1     Running     0               15m
kube-controller-manager-cp2                               1/1     Running     0               9m59s
kube-controller-manager-cp3                               1/1     Running     0               5m31s
kube-scheduler-cp1                                        1/1     Running     0               15m
kube-scheduler-cp2                                        1/1     Running     0               9m59s
kube-scheduler-cp3                                        1/1     Running     0               5m31s
rke2-coredns-rke2-coredns-86c455b944-95pks                1/1     Running     0               9m28s
rke2-coredns-rke2-coredns-86c455b944-fqdph                1/1     Running     0               15m
rke2-coredns-rke2-coredns-autoscaler-79677f89c4-77n29     1/1     Running     0               15m
rke2-ingress-nginx-controller-4fdkz                       1/1     Running     0               9m33s
rke2-ingress-nginx-controller-bcd9v                       1/1     Running     0               9m33s
rke2-ingress-nginx-controller-jr4gq                       1/1     Running     0               5m5s
rke2-metrics-server-69bdccfdd9-5wfrb                      1/1     Running     0               9m42s
rke2-multus-8nkdc                                         1/1     Running     6 (12m ago)     15m
rke2-multus-c66lp                                         1/1     Running     2 (9m45s ago)   10m
rke2-multus-rg79w                                         1/1     Running     2 (5m17s ago)   5m45s
rke2-snapshot-controller-696989ffdd-rtdcc                 1/1     Running     0               9m39s
NAME   STATUS   ROLES                       AGE     VERSION
cp1    Ready    control-plane,etcd,master   15m     v1.32.8+rke2r1
cp2    Ready    control-plane,etcd,master   10m     v1.32.8+rke2r1
cp3    Ready    control-plane,etcd,master   5m45s   v1.32.8+rke2r1
The cluster is now ready for the first applications. The next part of my guide will install Argo CD and cert-manager. Once it is online, it will be linked here.