Production-ready Kubernetes cluster with K3s


Due to code formatting issues on this blog, you can get this entire test case from the accompanying repo on my GitHub profile.



Related articles

The links below are to be used in conjunction with this guide:



Preface

The following guide is a simple yet effective approach to building an on-premises Kubernetes cluster using SUSE Rancher's K3s Kubernetes engine. It works well for many scenarios, including on-premises production.

This guide takes into consideration the following scenario:

  1. Only one master node.
    Most on-premises datacenters use a virtualization solution such as VMware vSphere or KVM with Proxmox, which already provides VM-level backups, so the master VM will be backed up anyway.

  2. At least three worker nodes.
    Three is the recommended minimum for the Longhorn Kubernetes storage module.

Finally, don't forget to visit the Official Documentation for more configuration and architectural design options.

Minimum Recommended Requirements

The requirements greatly depend on the project in question and the application performance expectations. This cluster has even been tested and works in a home lab on a beefier desktop computer with VirtualBox.

However, for small to medium teams working on a small start-up project, I would assume the following to be the bare minimum:

VM Requirements

Operating System: This has been tested on Ubuntu 22.04

For a more serious setup, each VM in use should meet the bare minimums below:

  1. CPU - 4x Virtual
  2. RAM - 8GB per VM (probably 16GB for the master node)

Storage

  1. Disk 40GB for the OS volume
  2. Disk 100GB or more, provisioned as an LVM volume and mounted at /longhorn

Network

  1. Network speed - Project specific
  2. A reserved pool of routable IP addresses for MetalLB, from which the DHCP server will not assign any addresses. In this example the cluster VMs are in the 172.16.0.0/24 network. I divided it into four /26 subnets and will be using the last one, 172.16.0.192/26, for MetalLB. This gives MetalLB 62 usable IP addresses for any future deployments, as sanity-checked below.
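
As a quick sanity check of that split (assuming the ipcalc utility is installed, e.g. via sudo apt install ipcalc):

# The /24 divided into four /26 subnets:
#   172.16.0.0/26, 172.16.0.64/26, 172.16.0.128/26, 172.16.0.192/26
$ ipcalc 172.16.0.192/26
## Expect HostMin 172.16.0.193, HostMax 172.16.0.254 and Hosts/Net 62 -
##   the 62 usable addresses reserved for MetalLB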

Underlying hardware/fabric requirements

As this is planned for on-premises scenarios, it is highly recommended that the underlying hypervisor have snapshot capabilities and an external backup for each VM.

VERY IMPORTANT

Lose these snapshots/backups and you lose the cluster. You've been warned!


Node details

All nodes should have their IP and DNS hostnames configured in the local DNS for proper name resolution:

k3s-master
k3s-master.tomspirit.me          IN      A           172.16.0.50
k3s.tomspirit.me                 IN      CNAME       k3s-master.tomspirit.me  ## This is the k3s cluster URL
k3s-prometheus.tomspirit.me      IN      CNAME       k3s-master.tomspirit.me
k3s-alertmanager.tomspirit.me    IN      CNAME       k3s-master.tomspirit.me
k3s-grafana.tomspirit.me         IN      CNAME       k3s-master.tomspirit.me

k3s-worker01
k3s-worker01.tomspirit.me        IN      A           172.16.0.51

k3s-worker02
k3s-worker02.tomspirit.me        IN      A           172.16.0.52

k3s-worker03
k3s-worker03.tomspirit.me        IN      A           172.16.0.53


General node preparation

This should be done on all nodes:

$ sudo ufw disable

# Used by the local-path-provisioner that comes with the cluster by default
$ sudo mkdir /local-path-provisioner

# Set the hostnames on all servers respectively
$ sudo hostnamectl set-hostname SERVER-FQDN

$ sudo apt update && sudo apt upgrade -y
$ sudo apt install -y open-iscsi nfs-common jq vim htop # Longhorn requirements and misc things

Provision the k3s master

By default, k3s deploys Traefik and metrics-server as part of the installation. These will be disabled because ingress-nginx and MetalLB will be used instead.

$ curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.24.12+k3s1 sh -s - server \
--cluster-init \
--default-local-storage-path /local-path-provisioner \
--node-taint CriticalAddonsOnly=true:NoExecute \
--node-taint CriticalAddonsOnly=true:NoSchedule \
--tls-san 172.16.0.50 \
--tls-san k3s.tomspirit.me \
--tls-san k3s-master.tomspirit.me \
--disable traefik,metrics-server

Get the node-token, which you will need for adding the worker nodes in the Add worker nodes section below:

# Get the node-token and set the kubeconfig to be readable for everyone
$ sudo cat /var/lib/rancher/k3s/server/node-token  ## Take note of the node token and use it as the K3S_TOKEN variable below

$ sudo chmod 644 /etc/rancher/k3s/k3s.yaml
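
If you prefer to run kubectl from a workstation rather than on the master node, one option (a sketch, assuming kubectl is installed locally and that the k3s.tomspirit.me record resolves as configured above; the SSH user is a placeholder) is to copy the kubeconfig over and point it at the cluster URL:

# On the workstation
$ mkdir -p ~/.kube
$ scp YOUR-USER@k3s-master.tomspirit.me:/etc/rancher/k3s/k3s.yaml ~/.kube/config

# The kubeconfig points at https://127.0.0.1:6443 by default - replace it with the cluster URL
#   (the URL is covered by the --tls-san entries used during the install)
$ sed -i 's/127.0.0.1/k3s.tomspirit.me/' ~/.kube/config

$ kubectl get nodes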

Provision the k3s worker nodes

The worker nodes need some additional preparation, mostly for the storage part.

K3s storage preparation

The cluster by default uses the so-called local-path-provisioner with its location at /local-path-provisioner. This is good enough storage for testing. For more serious use cases, the Longhorn Kubernetes storage module will be installed below.

For this reason an additional storage partition should be prepared as an LVM ext4 volume and mounted at /longhorn. The VMs used here had an additional virtual disk presented as /dev/sdb. In short, the additional storage has been provisioned using the following commands:

## Create the PV (Physical Volume)
$ pvcreate /dev/sdb && pvdisplay /dev/sdb

## Create the VG (Volume Group)
$ vgcreate longhorn-vg /dev/sdb && vgdisplay longhorn-vg

## Create the LV (Logical Volume)
##   The `--extents 38399` value has been obtained from the Free PE value reported by the `vgdisplay` command
$ lvcreate --extents 38399 -n longhorn-lv longhorn-vg && lvdisplay /dev/longhorn-vg/longhorn-lv

## Format and mount the lvm partition as ext4 file system
$ mkfs.ext4 /dev/longhorn-vg/longhorn-lv
$ mkdir /longhorn && mount /dev/longhorn-vg/longhorn-lv /longhorn && df -Th

## Update the /etc/fstab file
$ cat /etc/fstab >fstab.20230809.bak && echo '/dev/mapper/longhorn--vg-longhorn--lv /longhorn ext4 defaults 0 0' | tee -a /etc/fstab && echo -e "\n" && cat /etc/fstab

## REBOOT the system at the end to make sure everything is working normally and
##  confirm the partition is mounted using the `df -h` or `lsblk` commands
$ systemctl reboot
$ lsblk
#...
#... output omitted ...
#...
sdb                           8:16   0   150G  0 disk
└─longhorn--vg-longhorn--lv 253:0    0   150G  0 lvm  /longhorn

Add worker nodes to the cluster

Execute the command below on each of the worker nodes, making sure to update the K3S_TOKEN variable with the node-token obtained from the master node in the previous step:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.24.12+k3s1 \
K3S_URL=https://k3s.tomspirit.me:6443 \
K3S_TOKEN=VERY_LARGE_NODE_TOKEN_HERE \
sh -

Verify the cluster

The commands below should show a healthy cluster with 4 nodes in total:

$ kubectl get nodes
$ kubectl get pods --all-namespaces
$ kubectl cluster-info
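
For orientation, a healthy cluster looks roughly like the illustrative output below (names, ages and versions will differ; the master reports the etcd role because it was installed with --cluster-init):

$ kubectl get nodes
NAME                        STATUS   ROLES                       AGE   VERSION
k3s-master.tomspirit.me     Ready    control-plane,etcd,master   25m   v1.24.12+k3s1
k3s-worker01.tomspirit.me   Ready    <none>                      10m   v1.24.12+k3s1
k3s-worker02.tomspirit.me   Ready    <none>                      9m    v1.24.12+k3s1
k3s-worker03.tomspirit.me   Ready    <none>                      8m    v1.24.12+k3s1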


Install additional packages

Additional packages are important for the functionality of the whole cluster. This cluster will have the following modules added:

  1. cert-manager - provides SSL/TLS certificate functionality for other services;
  2. ingress-nginx - provides L7 ingress load balancing;
  3. metallb - provides L4 load balancing;
  4. prometheus-monitoring - adds the Prometheus, Grafana and Alertmanager monitoring components;
  5. Longhorn - adds Kubernetes storage capabilities.

cert-manager

For more info visit:

$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.crds.yaml

$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update

$ helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.12.0

Once cert-manager is installed, create a default self-signed ClusterIssuer for the whole cluster. This provides the ability to issue self-signed certificates, giving the cluster basic TLS capability:

$ cat >cert-manager-ClusterIssuer.yml <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-cluster-issuer
spec:
  selfSigned: {}
EOF

$ kubectl apply -f cert-manager-ClusterIssuer.yml
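
A quick check that the issuer has been registered and is ready:

$ kubectl get clusterissuer selfsigned-cluster-issuer
## The READY column should show True before moving on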

ingress-nginx

For more info visit:

# nginx ingress 
# https://kubernetes.github.io/ingress-nginx/deploy/
$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx 
$ helm repo update
$ helm search repo ingress-nginx --versions

$ cat >values-ingress-nginx.yml <<EOF
controller:
  ingressClass: nginx
  ingressClassResource:
    default: 'true'
  
  service:
    type: LoadBalancer

  admissionWebhooks:
    certManager:
      enabled: true

  metrics:
    enabled: true
    prometheusRule:
      enabled: true

  config:
    allow-snippet-annotations: 'true'
    ssl-redirect: 'false'
    hsts: 'false'
EOF

$ helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --version 4.7.1 \
  --values values-ingress-nginx.yml
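
To verify the deployment, check that the controller pod is running and inspect its Service:

$ kubectl --namespace ingress-nginx get pods,svc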

NOTE:
The cert-manager and ingress-nginx test in the additional section below is effectively performed during the prometheus-monitoring section, which creates a self-signed certificate for the Prometheus stack and wires it into ingress-nginx. It can still be used, however, to test an internal PKI implementation in a separate namespace.


Additional section:

Test ingress-nginx and cert-manager self-signed certificates


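As a minimal sketch of such a test (the cert-test namespace, echo deployment and echo-test.tomspirit.me hostname are hypothetical examples; the hostname would also need a DNS record pointing at the ingress-nginx address), an Ingress annotated with the ClusterIssuer is enough for cert-manager to issue and serve a self-signed certificate:

$ kubectl create namespace cert-test
$ kubectl --namespace cert-test create deployment echo --image=nginx --port=80
$ kubectl --namespace cert-test expose deployment echo --port=80

$ cat >cert-test-ingress.yml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo
  namespace: cert-test
  annotations:
    cert-manager.io/cluster-issuer: selfsigned-cluster-issuer
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - echo-test.tomspirit.me
      secretName: echo-test-tls
  rules:
    - host: echo-test.tomspirit.me
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80
EOF

$ kubectl apply -f cert-test-ingress.yml

## The echo-test-tls Certificate created by cert-manager should become READY=True
$ kubectl --namespace cert-test get certificate
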

Prometheus monitoring

For more info visit:

In my experience this is the most tedious and complex part of the setup, mainly because the Prometheus stack is actually 3 or more products (depending on your settings) bundled into one Helm chart.

First create the monitoring namespace and the TLS Certificate which will be used by the prometheus stack:

$ cat >monitoring-namespace.yml <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
EOF

$ kubectl apply -f monitoring-namespace.yml
$ cat >prometheus-stack-cert.yml <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prometheus-stack-certificate
  namespace: monitoring
spec:
  secretName: prometheus-stack-certificate-rsa-secret
  
  duration: 87600h #10y
  renewBefore: 3600h

  subject:
    organizations:
      - tomspirit.me
  
  isCA: False

  privateKey:
    #algorithm: ECDSA
    #size: 256

    algorithm: RSA
    encoding: PKCS1
    size: 2048

  usages:
    - server auth
    - client auth

  dnsNames:
    - k3s-prometheus.tomspirit.me
    - k3s-alertmanager.tomspirit.me
    - k3s-grafana.tomspirit.me

  issuerRef:
    name: selfsigned-cluster-issuer
    kind: ClusterIssuer
    group: cert-manager.io
EOF

$ kubectl apply -f prometheus-stack-cert.yml
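
Confirm that the certificate has been issued and its secret created before referencing it from the monitoring stack values:

$ kubectl --namespace monitoring get certificate prometheus-stack-certificate
## READY should show True
$ kubectl --namespace monitoring get secret prometheus-stack-certificate-rsa-secret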

To install the prometheus-stack you will need to properly configure the values file. If you are too lazy to do that yourself, I completely understand; this whole project is also on my GitHub profile, so you can take the values-kube-prometheus-stack.yml from there.
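
For orientation, the parts of that values file this guide relies on look roughly like the excerpt below (a sketch of the relevant keys only, with a placeholder password; the full file on GitHub remains the reference):

grafana:
  adminPassword: CHANGE-ME
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - k3s-grafana.tomspirit.me
    tls:
      - secretName: prometheus-stack-certificate-rsa-secret
        hosts:
          - k3s-grafana.tomspirit.me

## prometheus.ingress and alertmanager.ingress follow the same pattern with
##   k3s-prometheus.tomspirit.me and k3s-alertmanager.tomspirit.me respectively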

Install the prometheus-stack using helm:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 
$ helm repo update 

$ helm install kube-prometheus-stack \
  -f values-kube-prometheus-stack.yml \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --version 46.8.0 

To test the implementation, once all of the pods have been successfully deployed, browse to https://k3s-grafana.tomspirit.me and log in with the admin user and the password set in the adminPassword value in the values-kube-prometheus-stack.yml file.

Longhorn storage install

For more info visit:

$ curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.4.1/scripts/environment_check.sh | bash 

$ helm repo add longhorn https://charts.longhorn.io 
$ helm repo update
$ helm install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace \
  --version 1.4.2 

Longhorn Configuration

The configuration includes adjusting some settings from the UI as well as configuring additional storage classes to accommodate particular usage scenarios.

The default class longhorn can be used too, but it shouldn't be altered, as per their documentation.

Accessing the Longhorn UI

To access the Longhorn UI, it is easiest to use Lens and create a port forward to the longhorn-frontend service.
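
Without Lens, a plain kubectl port-forward against the frontend Service works as well (the local port 8080 is an arbitrary choice):

$ kubectl --namespace longhorn-system port-forward service/longhorn-frontend 8080:80
## Then browse to http://localhost:8080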

Initial configuration

From the Longhorn UI the system has been configured with /longhorn as its main storage path, which is mounted on the separate LVM volume prepared earlier.

The default storage path /var/lib/longhorn has been disabled, as it resides on the / (root) filesystem.

For each of the nodes, in the Node section of the UI, the following disk configuration has been set:

Best practices

The following is just an excerpt from the Longhorn Best Practices section of their official documentation.

Creating additional storage classes

More information on the settings applied can be found at:

$ cat >longhorn-storage-classes.yml <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-best-effort-reclaim-delete
  annotations:
    storageclass.kubernetes.io/is-default-class: 'false'

provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate

parameters:
  dataLocality: best-effort
  fromBackup: ''
  fsType: ext4
  numberOfReplicas: '3'
  staleReplicaTimeout: '2880'

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-best-effort-reclaim-retain
  annotations:
    storageclass.kubernetes.io/is-default-class: 'false'

provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate

parameters:
  dataLocality: best-effort
  fromBackup: ''
  fsType: ext4
  numberOfReplicas: '3'
  staleReplicaTimeout: '2880'
EOF

$ kubectl apply -f longhorn-storage-classes.yml
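
Verify that the new classes are registered alongside the default ones:

$ kubectl get storageclass
## Expect local-path, longhorn (default) and the two longhorn-best-effort-* classes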

Additional section:

Test Longhorn volume provisioning



MetalLB

For more info visit:

Label the node and generate the metallb values file:

$ kubectl label node k3s-master.tomspirit.me metallb-controller=true

$ cat >values-metallb.yml <<EOF
loadBalancerClass: "metallb"

controller:
  nodeSelector:
    metallb-controller: "true"

  tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
      effect: NoExecute
    - key: CriticalAddonsOnly
      operator: Exists
      effect: NoSchedule

speaker:
  frr:
    enabled: false
EOF

Deploy metallb using their helm chart:

$ helm repo add metallb https://metallb.github.io/metallb
$ helm repo update
$ helm install metallb metallb/metallb --namespace metallb-system --create-namespace -f values-metallb.yml --version 0.13.10

Configure the IP address pool and the L2 advertisement | Reference

$ cat >IPAddressPool.yml <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: dev-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.16.0.192/26
  avoidBuggyIPs: false
  autoAssign: true

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2-advertisement
  namespace: metallb-system
EOF

$ kubectl apply -f IPAddressPool.yml
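
A quick way to confirm MetalLB hands out addresses from the pool is a throwaway Deployment with a LoadBalancer Service (the lb-test names are hypothetical; loadBalancerClass matches the value set in values-metallb.yml):

$ kubectl create deployment lb-test --image=nginx --port=80

$ cat >lb-test-service.yml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: lb-test
spec:
  type: LoadBalancer
  loadBalancerClass: metallb
  selector:
    app: lb-test
  ports:
    - port: 80
      targetPort: 80
EOF

$ kubectl apply -f lb-test-service.yml

## EXTERNAL-IP should come from the 172.16.0.192/26 pool
$ kubectl get service lb-test

## Cleanup
$ kubectl delete service lb-test && kubectl delete deployment lb-test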

Additional section:

Test MetalLB deployment