This article documents an installation of a self-managed Kubernetes cluster on Hetzner Cloud. The goal was to avoid both managed Kubernetes services and lightweight Kubernetes distributions such as k3s. Instead, the cluster was installed with kubeadm so that each component is visible and understandable.
The final cluster uses one control-plane node and one worker node:
k8s-master 10.0.0.2 control-plane
k8s-worker-1 10.0.0.3 worker
The public IP addresses used during the original installation are intentionally obfuscated in this article.
<MASTER_PUBLIC_IP> public IPv4 of the control-plane node
<WORKER_PUBLIC_IP> public IPv4 of the worker node
<CLIENT_PUBLIC_IP> public IPv4 of the client/laptop used for testing
<YOUR_DOMAIN> example: dev.example.com
<YOUR_EMAIL> email address used for Let's Encrypt
<HCLOUD_TOKEN> Hetzner Cloud API token
<HCLOUD_NETWORK_ID> Hetzner Cloud private network ID
The final setup includes:
- CRI-O: container runtime
- kubeadm: cluster bootstrap tool
- kubelet: Kubernetes node agent
- kubectl: Kubernetes CLI
- Flannel: CNI / pod networking
- Hetzner Cloud Controller Manager: node addresses and cloud routes
- Traefik: ingress controller
- cert-manager: TLS certificate management
- Let's Encrypt staging: first HTTPS validation
- whoami: test application
This is a learning and development-oriented cluster. It is not highly available, because the control plane has only one node.
Important correction: both nodes need outbound IPv4 access
During this installation, an important issue appeared when the worker node was created without a public IPv4 address.
The worker could join the cluster through the private Hetzner network, but it could not pull required images from registries that only resolved to IPv4 in this environment.
The exact failing image was:
ghcr.io/flannel-io/flannel-cni-plugin:v1.9.1-flannel1
The Flannel pod on the worker failed with an error similar to:
Failed to pull image "ghcr.io/flannel-io/flannel-cni-plugin:v1.9.1-flannel1":
pinging container registry ghcr.io:
Get "https://ghcr.io/v2/":
dial tcp 140.82.121.34:443: connect: network is unreachable
The worker had IPv6, and IPv6 connectivity itself worked, but ghcr.io did not return an IPv6 address in this test:
nslookup -type=AAAA ghcr.io
Result:
*** Can't find ghcr.io: No answer
So for this manual setup, both Hetzner nodes needed outbound IPv4 access:
- the master needed outbound IPv4 to install packages and pull control-plane images
- the worker needed outbound IPv4 to install packages and pull node-level images such as Flannel
Strictly speaking, the requirement is not “public IPv4 on every node”. The real requirement is working outbound IPv4 connectivity. This can be provided by:
- a public IPv4 on each node
- a NAT gateway
- a properly configured NAT instance
- another controlled egress solution
For this learning setup, the practical fix was to assign a public IPv4 to the worker node as well.
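Before joining a node, this requirement can be checked directly. The following is a minimal sketch and was not part of the original installation: `probe` and `egress_verdict` are hypothetical helper names, and it assumes `curl` is installed on the node. It tests whether a registry endpoint is reachable over each IP family separately.

```shell
# Probe a URL over one IP family. $1 is 4 or 6, $2 is the URL.
# Prints "ok" if the connection succeeds, "fail" otherwise.
# An HTTP 401 from a registry still counts as "ok": only reachability matters here.
probe() {
  if curl "-$1" -s -o /dev/null --connect-timeout 5 "$2"; then
    echo ok
  else
    echo fail
  fi
}

# Turn the two probe results into a readable verdict.
# $1 = IPv4 result, $2 = IPv6 result.
egress_verdict() {
  case "$1/$2" in
    ok/*)    echo "reachable over IPv4" ;;
    fail/ok) echo "reachable over IPv6 only" ;;
    *)       echo "unreachable over both IPv4 and IPv6" ;;
  esac
}

# Example (run on the node):
#   egress_verdict "$(probe 4 https://ghcr.io/v2/)" "$(probe 6 https://ghcr.io/v2/)"
```

On the worker from this article, before it received a public IPv4 address, both probes against ghcr.io would be expected to fail: IPv4 had no route, and the registry returned no AAAA record.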
What this article covers
This article covers the full path from empty Hetzner servers to a working HTTPS route:
- preparing Hetzner Cloud servers and private networking
- installing CRI-O and Kubernetes packages
- initializing the control plane with kubeadm
- installing Flannel CNI
- installing Hetzner Cloud Controller Manager
- joining a worker node
- fixing worker image pull issues
- restoring the control-plane taint
- installing Traefik without a Hetzner Load Balancer
- installing cert-manager
- creating a Let’s Encrypt staging certificate
- exposing a test application through HTTPS
- understanding internal pod IPs such as 10.244.1.x
- preparing for Terraform automation
Why use kubeadm on Hetzner?
Hetzner Cloud does not provide a managed Kubernetes service in the same way that GKE, EKS, AKS, or DOKS do. This makes it a good environment for learning how Kubernetes is assembled from its core components.
Managed Kubernetes hides many details. That is useful in production, but less useful when the goal is to understand what actually runs on every node.
With kubeadm, you are responsible for:
- the operating system preparation
- the container runtime
- the Kubernetes packages
- the control-plane bootstrap
- the CNI plugin
- worker node joins
- cloud provider integration
- ingress
- certificate management
- storage integration
- firewall rules
That makes the installation more manual, but also much more educational.
Final architecture
The cluster has two Hetzner Cloud servers connected to the same private network.
Hetzner private network: 10.0.0.0/16
k8s-master
private IP: 10.0.0.2
public IP: <MASTER_PUBLIC_IP>
role: control-plane
k8s-worker-1
private IP: 10.0.0.3
public IP: <WORKER_PUBLIC_IP>
role: worker
The test HTTPS request path looks like this:
Browser / curl
↓
DNS: whoami.<YOUR_DOMAIN>
↓
<WORKER_PUBLIC_IP>:443
↓
worker hostPort 443
↓
Traefik pod on k8s-worker-1
↓
Kubernetes Service
↓
whoami pod
This setup intentionally does not use a Hetzner Load Balancer. It exposes Traefik through hostPort on the worker node.
A more cloud-native production setup would normally look like this:
Internet
↓
Hetzner Load Balancer
↓
Kubernetes Service type LoadBalancer
↓
Traefik
↓
Application Service
↓
Application Pods
For learning and cost control, this article uses the direct worker hostPort approach.
Versions used
The installation used these versions during the real setup:
| Component | Version or value |
|---|---|
| Kubernetes | v1.36.1 |
| kubeadm config API | kubeadm.k8s.io/v1beta4 |
| CRI-O | 1.33.0 |
| CNI | Flannel |
| Pod CIDR | 10.244.0.0/16 |
| Service CIDR | 10.96.0.0/12 |
| Traefik | v3.7.1 |
| cert-manager | Helm chart installation |
| Test app | traefik/whoami:v1.11 |
For a new installation, always verify the current package versions and chart values before copying commands directly.
Hetzner Cloud preparation
Create a private network in Hetzner Cloud:
Name: k8s-network
IP range: 10.0.0.0/16
Attach the master node and worker node to this network.
Example private IP assignment:
k8s-master 10.0.0.2
k8s-worker-1 10.0.0.3
Create a Hetzner API token with Read & Write permissions. This is required later by the Hetzner Cloud Controller Manager.
The token will be stored in Kubernetes as a Secret:
Namespace: kube-system
Secret: hcloud
Keys: token, network
Server naming
The master was renamed to make Kubernetes node names cleaner:
hostnamectl set-hostname k8s-master
hostname
Expected:
k8s-master
The worker node was named:
k8s-worker-1
This naming makes node roles obvious and scales naturally if more workers are added later:
k8s-master
k8s-worker-1
k8s-worker-2
k8s-worker-3
Install Kubernetes packages on the master
All commands in this section are executed as root on k8s-master.
Set the Kubernetes package version:
export KUBERNETES_VERSION=v1.36
Add the Kubernetes package repository:
curl -fsSL https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/ /" \
> /etc/apt/sources.list.d/kubernetes.list
Add the CRI-O package repository:
curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] \
https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/ /" \
> /etc/apt/sources.list.d/cri-o.list
Install packages:
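apt update
apt install -y cri-o kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
# Make sure CRI-O is running before kubeadm init (a no-op if the package already started it)
systemctl enable --now crio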
Verify:
crio --version
kubelet --version
kubeadm version
kubectl version --client
The installation returned:
crio version 1.33.0
Kubernetes v1.36.1
kubeadm version: v1.36.1
Client Version: v1.36.1
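The repository is selected by minor series (v1.36), while the tools report a full patch version (v1.36.1). As a small sketch, with hypothetical helper names `minor_of` and `same_series`, the two can be compared like this:

```shell
# Reduce a full version such as v1.36.1 to its minor series, v1.36.
minor_of() {
  echo "$1" | sed -E 's/^(v[0-9]+\.[0-9]+).*/\1/'
}

# Succeed if an installed version belongs to the expected series.
# $1 = installed version (for example v1.36.1), $2 = series (for example v1.36).
same_series() {
  [ "$(minor_of "$1")" = "$2" ]
}

# Example (on a node, after installation):
#   same_series "$(kubelet --version | awk '{print $2}')" "$KUBERNETES_VERSION" \
#     && echo "kubelet matches $KUBERNETES_VERSION"
```

The example relies on `kubelet --version` printing `Kubernetes v1.36.1`, as shown in the output above.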
Configure networking on the master
Enable the br_netfilter module:
echo "br_netfilter" >> /etc/modules-load.d/modules.conf
modprobe br_netfilter
Enable IPv4 forwarding:
echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/k8s.conf
sysctl --system
Verify:
sysctl net.ipv4.ip_forward
Expected:
net.ipv4.ip_forward = 1
IPv4 forwarding is required for Kubernetes networking and pod traffic forwarding.
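Both settings can be verified by reading the kernel's view under /proc/sys. This is a sketch with a hypothetical helper name, `check_kernel_setting`; it assumes that /proc/sys/net/bridge/bridge-nf-call-iptables only exists once br_netfilter is loaded, so a missing file points to a missing module.

```shell
# Compare the content of a /proc/sys file with an expected value.
# $1 = file path, $2 = expected value. Returns non-zero on mismatch.
# The body runs in a subshell so the helper variable does not leak.
check_kernel_setting() (
  actual="$(cat "$1" 2>/dev/null || echo missing)"
  if [ "$actual" = "$2" ]; then
    echo "OK $1 = $actual"
  else
    echo "FAIL $1 = $actual (expected $2)"
    return 1
  fi
)

# Example (on a node):
#   check_kernel_setting /proc/sys/net/ipv4/ip_forward 1
#   check_kernel_setting /proc/sys/net/bridge/bridge-nf-call-iptables 1
```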
Initialize the control plane
Generate the default kubeadm init configuration:
mkdir -p ~/k8s
cd ~/k8s
kubeadm config print init-defaults > InitConfiguration.yaml
Edit InitConfiguration.yaml.
The important values are:
localAPIEndpoint:
  advertiseAddress: 10.0.0.2
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
  name: k8s-master
  taints: null
  kubeletExtraArgs:
    - name: cloud-provider
      value: external
    - name: node-ip
      value: 10.0.0.2
For the API server certificate, include the private IP, public IP placeholder, and hostname:
apiServer:
  certSANs:
    - <MASTER_PUBLIC_IP>
    - 10.0.0.2
    - k8s-master
Set the pod CIDR for Flannel:
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
Important v1beta4 note: kubeletExtraArgs is a list of name and value objects. A map-style format will fail with an unmarshalling error.
Correct:
kubeletExtraArgs:
  - name: cloud-provider
    value: external
  - name: node-ip
    value: 10.0.0.2
Wrong:
kubeletExtraArgs:
  cloud-provider: external
  node-ip: 10.0.0.2
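The difference between the two formats can be detected before running kubeadm. The following is a rough sketch, not a YAML parser: `kubelet_args_style` is a hypothetical helper that looks only at the first non-empty line after each kubeletExtraArgs key.

```shell
# Print "list" or "map" for each kubeletExtraArgs block in a kubeadm config file.
# A list item starts with "-", a map entry starts with a key.
kubelet_args_style() {
  awk '
    /^[[:space:]]*kubeletExtraArgs:/ { pending = 1; next }
    pending && NF { print ($1 == "-" ? "list" : "map"); pending = 0 }
  ' "$1"
}

# Example:
#   kubelet_args_style InitConfiguration.yaml
```

For a full check, `kubeadm config validate --config InitConfiguration.yaml` should report the same problem, assuming the installed kubeadm version provides that subcommand.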
Initialize Kubernetes:
kubeadm init --config InitConfiguration.yaml
Configure kubectl on the master:
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chmod 600 $HOME/.kube/config
Check the node:
kubectl get nodes
At this point the node is expected to be NotReady because the CNI plugin is not installed yet.
Remove temporary taints for the single-node phase
Initially, only the master exists. That means system pods and test workloads need to be able to run on the control-plane node.
Remove the control-plane taint temporarily:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Because cloud-provider: external is enabled, the node can also have this taint:
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
Remove it temporarily:
kubectl taint nodes --all node.cloudprovider.kubernetes.io/uninitialized-
This is only for the initial single-node phase. After the worker is ready, the control-plane taint should be restored.
Install Flannel CNI
Install Helm:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
Add the Flannel repository:
helm repo add flannel https://flannel-io.github.io/flannel/
helm repo update
Create the namespace:
kubectl create ns kube-flannel
kubectl label --overwrite ns kube-flannel pod-security.kubernetes.io/enforce=privileged
Install Flannel:
helm install flannel flannel/flannel \
--set podCidr=10.244.0.0/16 \
-n kube-flannel
Verify:
kubectl get nodes
kubectl get pods -A
Expected result:
k8s-master Ready control-plane
Flannel should also be running:
kube-flannel kube-flannel-ds-... 1/1 Running
Test the single-node cluster
Create a temporary NGINX deployment:
kubectl create deployment nginx --image=nginx --replicas=1
kubectl get pods
Expected:
nginx-... 1/1 Running
Clean up:
kubectl delete deployment nginx
At this point the single-node Kubernetes cluster works.
Install Hetzner Cloud Controller Manager
Create a Hetzner API token in the Hetzner Cloud Console.
Export it on the master:
export HCLOUD_API_TOKEN=<HCLOUD_TOKEN>
Create the Secret expected by the Hetzner CCM chart:
kubectl -n kube-system create secret generic hcloud \
--from-literal=token="$HCLOUD_API_TOKEN" \
--from-literal=network=<HCLOUD_NETWORK_ID>
Add the Helm repository:
helm repo add hcloud https://charts.hetzner.cloud
helm repo update hcloud
Install CCM with networking enabled:
helm install hccm hcloud/hcloud-cloud-controller-manager \
-n kube-system \
--set networking.enabled=true \
--set networking.clusterCIDR=10.244.0.0/16
This networking.enabled=true value was important. Without it, CCM could not properly match the node private IP.
The earlier error looked like this:
Failed to update node addresses for node "k8s-master":
failed to get node address from cloud provider that matches ip: 10.0.0.2
After enabling networking, CCM created cloud routes for the Flannel pod CIDRs.
Verify CCM:
kubectl get pods -n kube-system | grep hcloud
kubectl logs -n kube-system deployment/hcloud-cloud-controller-manager | tail -20
Expected log lines include route creation for node pod CIDRs.
Patch ProviderID on the master
The CCM updated the node external IP, but the ProviderID did not appear automatically in this installation.
Get the server ID from the Hetzner API:
curl -s -H "Authorization: Bearer $HCLOUD_API_TOKEN" \
https://api.hetzner.cloud/v1/servers | python3 -m json.tool | grep -B 2 '"name": "k8s-master"'
Patch the node:
kubectl patch node k8s-master \
--type merge \
-p '{"spec":{"providerID":"hcloud://<MASTER_SERVER_ID>"}}'
Verify:
kubectl describe node k8s-master | grep ProviderID
Expected:
ProviderID: hcloud://<MASTER_SERVER_ID>
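Because the patch is hand-written JSON, a typo in the server ID is easy to miss. A minimal sketch with a hypothetical helper, `provider_id_patch`, builds the patch and rejects non-numeric IDs. As far as I know, spec.providerID can only be set while it is still empty, so it is worth getting right the first time.

```shell
# Print the merge patch that sets the Hetzner provider ID.
# $1 = numeric Hetzner server ID. Fails on anything that is not a number.
provider_id_patch() {
  case "$1" in
    ''|*[!0-9]*)
      echo "server ID must be numeric: '$1'" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"providerID":"hcloud://%s"}}\n' "$1"
}

# Example (12345678 is a made-up server ID):
#   kubectl patch node k8s-master --type merge -p "$(provider_id_patch 12345678)"
```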
Create the worker node
Create another Hetzner Cloud server:
Name: k8s-worker-1
Private IP: 10.0.0.3
OS: Ubuntu
Type: CX23
Important: the worker needs outbound IPv4 connectivity for this setup. Without it, image pulls failed for Flannel.
If you do not want public IPv4 on the worker, you need another outbound IPv4 solution such as NAT. In this installation, the practical solution was to assign a public IPv4 to the worker.
Install packages on the worker
Run on k8s-worker-1 as root:
export KUBERNETES_VERSION=v1.36
curl -fsSL https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/ /" \
> /etc/apt/sources.list.d/kubernetes.list
curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] \
https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/ /" \
> /etc/apt/sources.list.d/cri-o.list
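apt update
apt install -y cri-o kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
# Make sure CRI-O is running before kubeadm join (a no-op if the package already started it)
systemctl enable --now crio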
Verify:
crio --version
kubelet --version
kubeadm version
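Enable the br_netfilter module and IPv4 forwarding, as on the master:
echo "br_netfilter" >> /etc/modules-load.d/modules.conf
modprobe br_netfilter
echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/k8s.conf
sysctl --system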
sysctl net.ipv4.ip_forward
Expected:
net.ipv4.ip_forward = 1
Generate the join command
On the master:
kubeadm token create --print-join-command
Example format:
kubeadm join 10.0.0.2:6443 \
--token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
Bootstrap tokens expire after 24 hours by default and can be regenerated at any time with the same command, so the token does not need to be stored permanently.
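The three values in the join command are the same ones a JoinConfiguration file needs. As a sketch, a hypothetical helper `parse_join_command` can extract them from the printed command, assuming the output format shown above:

```shell
# Read a 'kubeadm join ...' command on stdin and print the three values
# needed in a JoinConfiguration. Line continuations are flattened first.
parse_join_command() {
  tr -s '\\\n' '  ' | awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "join") endpoint = $(i + 1)
      if ($i == "--token") token = $(i + 1)
      if ($i == "--discovery-token-ca-cert-hash") hash = $(i + 1)
    }
    printf "apiServerEndpoint: %s\ntoken: %s\ncaCertHash: %s\n", endpoint, token, hash
  }'
}

# Example (on the master):
#   kubeadm token create --print-join-command | parse_join_command
```

The output is only a convenience for copying values. It is not a complete JoinConfiguration.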
Join the worker with JoinConfiguration
On the worker:
kubeadm config print join-defaults > JoinConfiguration.yaml
Edit the file.
Important fields:
apiVersion: kubeadm.k8s.io/v1beta4
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: 10.0.0.2:6443
    token: <TOKEN>
    caCertHashes:
      - sha256:<HASH>
  tlsBootstrapToken: <TOKEN>
kind: JoinConfiguration
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  name: k8s-worker-1
  taints: null
  kubeletExtraArgs:
    - name: cloud-provider
      value: external
    - name: node-ip
      value: 10.0.0.3
Common mistake: caCertHashes must be under discovery.bootstrapToken, not under nodeRegistration.
Join the cluster:
kubeadm join --config JoinConfiguration.yaml
On the master:
kubectl get nodes
Expected initial result:
k8s-master Ready control-plane
k8s-worker-1 NotReady <none>
The worker becomes Ready after Flannel starts successfully on it.
Troubleshooting worker image pulls
The worker stayed NotReady because the Flannel pod could not pull its init container image:
kubectl get pods -n kube-flannel -o wide
Output:
kube-flannel-ds-... 0/1 Init:ImagePullBackOff k8s-worker-1
Describe the pod:
kubectl describe pod <FLANNEL_POD_ON_WORKER> -n kube-flannel | tail -30
The exact failing image was:
ghcr.io/flannel-io/flannel-cni-plugin:v1.9.1-flannel1
The error was:
pinging container registry ghcr.io:
Get "https://ghcr.io/v2/":
dial tcp 140.82.121.34:443: connect: network is unreachable
The worker had no outbound IPv4 route to the internet. IPv6 worked for raw connectivity, but the registry did not provide an IPv6 address in this environment.
This confirmed the practical requirement:
Each node that needs to install packages or pull container images must have working outbound IPv4 connectivity, either through a public IPv4 address or through NAT.
After assigning a public IPv4 to the worker, Flannel pulled successfully and the worker became Ready.
Label the worker node
Label the worker:
kubectl label node k8s-worker-1 node-role.kubernetes.io/worker=worker
Verify:
kubectl get nodes
Expected:
k8s-master Ready control-plane
k8s-worker-1 Ready worker
Patch ProviderID on the worker
Get the server ID:
curl -s -H "Authorization: Bearer $HCLOUD_API_TOKEN" \
https://api.hetzner.cloud/v1/servers | python3 -m json.tool | grep -B 2 '"name": "k8s-worker-1"'
Patch the worker node:
kubectl patch node k8s-worker-1 \
--type merge \
-p '{"spec":{"providerID":"hcloud://<WORKER_SERVER_ID>"}}'
Verify:
kubectl describe node k8s-worker-1 | grep ProviderID
Restore the control-plane taint
Now that the worker is ready, workloads should run on the worker instead of the master.
Add the taint back:
kubectl taint nodes k8s-master node-role.kubernetes.io/control-plane:NoSchedule
Verify:
kubectl describe node k8s-master | grep Taint
Expected:
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Test scheduling:
kubectl create deployment nginx --image=nginx --replicas=1
kubectl get pods -o wide
Expected:
nginx-... Running k8s-worker-1
Clean up:
kubectl delete deployment nginx
Ingress strategy: no Hetzner Load Balancer
A normal cloud-native setup would use a cloud Load Balancer:
Internet
↓
Hetzner Load Balancer
↓
Service type LoadBalancer
↓
Traefik
For this learning setup, the Hetzner Load Balancer was skipped to reduce cost and to understand the mechanics.
Instead, Traefik was exposed through host ports on the worker:
Internet
↓
<WORKER_PUBLIC_IP>:80/443
↓
worker hostPort 80/443
↓
Traefik pod
This is valid for a small dev cluster, but it is less flexible than a cloud Load Balancer.
Install Traefik
Add the Traefik Helm repository:
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
Create traefik-values.yaml.
Final working values:
deployment:
  kind: Deployment
  replicas: 1
updateStrategy:
  type: Recreate
hostNetwork: false
nodeSelector:
  node-role.kubernetes.io/worker: worker
service:
  enabled: false
ports:
  web:
    port: 8000
    hostPort: 80
    expose:
      default: false
  websecure:
    port: 8443
    hostPort: 443
    expose:
      default: false
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
additionalArguments:
  - "--log.level=INFO"
Install Traefik:
helm install traefik traefik/traefik \
-f traefik-values.yaml \
-n traefik \
--create-namespace
Verify:
kubectl get pods -n traefik -o wide
Expected:
traefik-... 1/1 Running k8s-worker-1
Why Traefik uses internal ports 8000 and 8443
The first attempt configured Traefik to bind directly to ports 80 and 443 inside the container:
--entryPoints.web.address=:80/tcp
--entryPoints.websecure.address=:443/tcp
This failed:
listen tcp :443: bind: permission denied
Reason: Traefik runs as a non-root user and cannot bind privileged ports below 1024.
The clean fix was to let Traefik listen on high ports inside the pod:
Traefik internal port 8000
Traefik internal port 8443
Then map host ports to those internal ports:
hostPort 80 -> pod port 8000
hostPort 443 -> pod port 8443
This allows public traffic on standard ports while Traefik itself avoids privileged binds.
Why Recreate strategy is required
The default Deployment update strategy is RollingUpdate.
That caused problems with hostPort because Kubernetes tried to start a new Traefik pod before the old one released ports 80 and 443.
The scheduler showed messages like:
node(s) didn't have free ports for the requested pod ports
Fix:
updateStrategy:
  type: Recreate
With Recreate, Kubernetes stops the old pod first, then starts the new pod.
This is a better fit for a single Traefik replica using host ports.
Remove Traefik native ACME
Initially, Traefik native Let’s Encrypt was tested. That required:
/data/acme.json
persistent storage
0600 permissions
initContainer for permissions
Later, the setup was changed to cert-manager.
With cert-manager, Traefik no longer needs:
--certificatesresolvers.letsencrypt.acme.email=...
--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json
--certificatesresolvers.letsencrypt.acme.tlschallenge=true
It also no longer needs /data/acme.json persistence for certificates.
Instead, cert-manager stores certificates as Kubernetes Secrets, and Traefik reads those Secrets.
Install cert-manager
Add the Jetstack Helm repository:
helm repo add jetstack https://charts.jetstack.io
helm repo update
Install cert-manager:
helm install cert-manager jetstack/cert-manager \
-n cert-manager \
--create-namespace \
--set crds.enabled=true
Verify:
kubectl get pods -n cert-manager
Expected:
cert-manager-... 1/1 Running
cert-manager-cainjector-... 1/1 Running
cert-manager-webhook-... 1/1 Running
Create Let’s Encrypt staging ClusterIssuer
Use staging first to avoid hitting Let’s Encrypt production rate limits while testing.
Create letsencrypt-staging-clusterissuer.yaml:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: <YOUR_EMAIL>
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: traefik
Apply it:
kubectl apply -f letsencrypt-staging-clusterissuer.yaml
Verify:
kubectl get clusterissuer
Expected:
letsencrypt-staging True
Check Traefik IngressClass
Check the IngressClass:
kubectl get ingressclass
Expected:
traefik (default) traefik.io/ingress-controller
If it already exists, there is no need to recreate it.
If you apply it manually after Helm created it, you might see this warning:
resource ingressclasses/traefik is missing the kubectl.kubernetes.io/last-applied-configuration annotation
That warning is harmless. It means the object was not originally created with kubectl apply.
DNS records
For the test app, create a DNS record pointing to the worker public IP:
whoami.<YOUR_DOMAIN> -> <WORKER_PUBLIC_IP>
A wildcard record is also useful:
*.<YOUR_DOMAIN> -> <WORKER_PUBLIC_IP>
Verify DNS:
nslookup whoami.<YOUR_DOMAIN>
Expected:
Address: <WORKER_PUBLIC_IP>
Deploy the whoami test app
Create whoami.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: traefik/whoami:v1.11
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
  namespace: default
spec:
  selector:
    app: whoami
  ports:
    - name: http
      port: 80
      targetPort: 80
Apply:
kubectl apply -f whoami.yaml
kubectl get pods,svc
Create the Certificate resource
Create whoami-certificate.yaml:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: whoami-dev-example-com
  namespace: default
spec:
  secretName: whoami-dev-example-com-tls
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  dnsNames:
    - whoami.<YOUR_DOMAIN>
Apply:
kubectl apply -f whoami-certificate.yaml
Check:
kubectl get certificate
kubectl get certificaterequest
kubectl get order,challenge
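Issuance takes a little while, so the check usually has to be repeated. The following sketch wraps it in a generic retry loop. `retry` and `cert_ready` are hypothetical helper names, and the jsonpath expression assumes the Certificate exposes a Ready condition in its status, as cert-manager Certificates normally do.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "gave up after $retry_n attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Succeed once the named Certificate reports Ready=True.
# $1 = certificate name, $2 = namespace (defaults to "default").
cert_ready() {
  cert_status="$(kubectl get certificate "$1" -n "${2:-default}" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)"
  [ "$cert_status" = "True" ]
}

# Example: poll every 10 seconds, for at most 30 attempts.
#   retry 30 10 cert_ready whoami-dev-example-com default
```

`kubectl wait --for=condition=Ready certificate/whoami-dev-example-com --timeout=300s` is a built-in alternative. The loop above is mainly useful when the same retry logic is needed for other checks as well.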
Create the Traefik IngressRoute
Create whoami-ingressroute.yaml:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: whoami
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`whoami.<YOUR_DOMAIN>`)
      kind: Rule
      services:
        - name: whoami
          port: 80
  tls:
    secretName: whoami-dev-example-com-tls
Apply:
kubectl apply -f whoami-ingressroute.yaml
Important difference:
tls:
  secretName: whoami-dev-example-com-tls
This means Traefik reads a Kubernetes TLS Secret created by cert-manager.
Do not use this when cert-manager owns certificates:
tls:
  certResolver: letsencrypt
certResolver is for Traefik native ACME, not cert-manager.
Test HTTPS
From a laptop:
curl -k https://whoami.<YOUR_DOMAIN>
Expected response:
Hostname: whoami-...
IP: 127.0.0.1
IP: ::1
IP: 10.244.1.x
RemoteAddr: 10.244.1.x:xxxxx
GET / HTTP/1.1
Host: whoami.<YOUR_DOMAIN>
X-Forwarded-For: <CLIENT_PUBLIC_IP>
X-Forwarded-Proto: https
X-Forwarded-Server: traefik-...
X-Real-Ip: <CLIENT_PUBLIC_IP>
Because the certificate comes from Let’s Encrypt staging, browsers will still show a warning. That is expected. The staging CA is intentionally not trusted by browsers.
The important part is that HTTPS routing works and the app receives the request.
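To confirm which CA actually signed the served certificate, the issuer can be read from the TLS handshake. This is a sketch with hypothetical helper names, `cert_issuer` and `issuer_kind`. It assumes `openssl` is installed on the laptop and that staging issuers carry a "(STAGING)" marker in their name, which should be verified against the real output.

```shell
# Print the issuer line of the certificate served for a hostname on port 443.
cert_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Classify an issuer line as staging, production, or other.
# Assumption: staging issuer names contain the marker "STAGING".
issuer_kind() {
  case "$1" in
    *STAGING*)          echo staging ;;
    *"Let's Encrypt"*)  echo production ;;
    *)                  echo other ;;
  esac
}

# Example (from the laptop):
#   issuer_kind "$(cert_issuer whoami.<YOUR_DOMAIN>)"
```

A result of `other` usually means Traefik is still serving its default self-signed certificate because the TLS Secret is not ready yet.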
Understanding the 10.244.x.x IPs
The whoami output showed IPs like:
IP: 10.244.1.11
RemoteAddr: 10.244.1.10:53998
This is normal.
10.244.1.11 is the Pod IP of the whoami pod.
10.244.1.10 is likely the Pod IP of the Traefik pod.
The real visitor IP is available in headers:
X-Forwarded-For: <CLIENT_PUBLIC_IP>
X-Real-Ip: <CLIENT_PUBLIC_IP>
The 10.244.0.0/16 range comes from the Flannel pod network configured in kubeadm:
networking:
  podSubnet: 10.244.0.0/16
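This cluster uses three private ranges: the Hetzner network (10.0.0.0/16), the pod CIDR (10.244.0.0/16), and the service CIDR (10.96.0.0/12). The sketch below, with hypothetical helpers `ip_to_int`, `in_cidr`, and `classify_ip`, shows how an address seen in logs or in the whoami output can be assigned to one of them using plain integer arithmetic.

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
# The body runs in a subshell so changing IFS has no side effects.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 lies inside CIDR $2 (for example 10.244.0.0/16).
in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Name the range an address belongs to in this cluster.
classify_ip() {
  if in_cidr "$1" 10.244.0.0/16; then echo pod
  elif in_cidr "$1" 10.96.0.0/12; then echo service
  elif in_cidr "$1" 10.0.0.0/16; then echo node
  else echo external
  fi
}

# Example:
#   classify_ip 10.244.1.11
```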
Switch to Let’s Encrypt production
After staging works, create a production issuer.
Create letsencrypt-prod-clusterissuer.yaml:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: <YOUR_EMAIL>
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: traefik
Apply:
kubectl apply -f letsencrypt-prod-clusterissuer.yaml
kubectl get clusterissuer
Patch the Certificate:
kubectl patch certificate whoami-dev-example-com \
-n default \
--type merge \
-p '{"spec":{"issuerRef":{"name":"letsencrypt-prod","kind":"ClusterIssuer"}}}'
Delete the staging TLS Secret so cert-manager requests a new production certificate:
kubectl delete secret whoami-dev-example-com-tls -n default
Watch:
kubectl get certificate,certificaterequest,order,challenge -n default
When ready, test without -k:
curl -I https://whoami.<YOUR_DOMAIN>
Do not repeatedly delete and recreate production certificates because Let’s Encrypt production has rate limits.
Optional: create a Kubernetes user for laptop access
Copying /etc/kubernetes/admin.conf works, but it uses the built-in admin user. A cleaner approach is to create a named user with a client certificate and RBAC.
On the master:
cd ~/k8s
USER_NAME=nikolay
GROUP_NAME=devs
openssl genrsa -out ${USER_NAME}.key 4096
openssl req -new \
-key ${USER_NAME}.key \
-out ${USER_NAME}.csr \
-subj "/CN=${USER_NAME}/O=${GROUP_NAME}"
Create a CSR:
CSR_BASE64=$(cat ${USER_NAME}.csr | base64 | tr -d '\n')
cat <<EOF > ${USER_NAME}-csr.yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${USER_NAME}
spec:
  request: ${CSR_BASE64}
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 31536000
  usages:
    - client auth
EOF
kubectl apply -f ${USER_NAME}-csr.yaml
kubectl certificate approve ${USER_NAME}
Extract the signed certificate:
kubectl get csr ${USER_NAME} \
-o jsonpath='{.status.certificate}' | base64 -d > ${USER_NAME}.crt
Grant cluster-admin for the learning cluster:
kubectl create clusterrolebinding ${USER_NAME}-cluster-admin \
--clusterrole=cluster-admin \
--user=${USER_NAME}
Create kubeconfig:
CLUSTER_NAME=hetzner-k8s
SERVER=https://<MASTER_PUBLIC_IP>:6443
KUBECONFIG_FILE=${USER_NAME}-kubeconfig.yaml
kubectl config set-cluster ${CLUSTER_NAME} \
--server=${SERVER} \
--certificate-authority=/etc/kubernetes/pki/ca.crt \
--embed-certs=true \
--kubeconfig=${KUBECONFIG_FILE}
kubectl config set-credentials ${USER_NAME} \
--client-certificate=${USER_NAME}.crt \
--client-key=${USER_NAME}.key \
--embed-certs=true \
--kubeconfig=${KUBECONFIG_FILE}
kubectl config set-context ${CLUSTER_NAME}-${USER_NAME} \
--cluster=${CLUSTER_NAME} \
--user=${USER_NAME} \
--namespace=default \
--kubeconfig=${KUBECONFIG_FILE}
kubectl config use-context ${CLUSTER_NAME}-${USER_NAME} \
--kubeconfig=${KUBECONFIG_FILE}
Copy the kubeconfig to the laptop:
scp -i ~/.ssh/my_hetzner \
root@<MASTER_PUBLIC_IP>:/root/k8s/nikolay-kubeconfig.yaml \
~/.kube/hetzner-k8s.yaml
Merge into ~/.kube/config:
cp ~/.kube/config ~/.kube/config.backup.$(date +%Y%m%d-%H%M%S)
KUBECONFIG=~/.kube/config:~/.kube/hetzner-k8s.yaml \
kubectl config view --flatten > /tmp/kubeconfig-merged
mv /tmp/kubeconfig-merged ~/.kube/config
chmod 600 ~/.kube/config
Switch context:
kubectl config get-contexts
kubectl config use-context hetzner-k8s-nikolay
kubectl get nodes
Security notes
At minimum, restrict access to the Kubernetes API server.
The API server runs on:
<MASTER_PUBLIC_IP>:6443
In Hetzner Firewall, allow TCP 6443 only from trusted IPs, such as:
<CLIENT_PUBLIC_IP>/32
Also restrict SSH:
TCP 22 from <CLIENT_PUBLIC_IP>/32
For public web traffic to Traefik on the worker, allow:
TCP 80 from 0.0.0.0/0
TCP 443 from 0.0.0.0/0
For production, consider:
- using a Hetzner Load Balancer
- avoiding direct public worker access
- using a proper NAT gateway for private nodes
- installing Hetzner CSI for persistent volumes
- adding monitoring and backups
- backing up etcd
- using narrower RBAC instead of cluster-admin
Troubleshooting summary
| Problem | Symptom | Cause | Fix |
|---|---|---|---|
| kubeadm config failed | cannot unmarshal object into ... kubeletExtraArgs | v1beta4 expects list format | Use - name, value format |
| Node NotReady | Node registered but not Ready | CNI missing | Install Flannel |
| CCM could not match node | failed to get node address ... 10.0.0.2 | CCM network config missing | Install CCM with networking.enabled=true |
| Flannel failed on worker | Init:ImagePullBackOff | Worker had no outbound IPv4 | Add public IPv4 or NAT |
| Flannel exact image failed | ghcr.io/flannel-io/flannel-cni-plugin:v1.9.1-flannel1 | Registry resolved to IPv4 only | Ensure outbound IPv4 |
| Traefik Pending | PVC unbound | No StorageClass / no PV | Remove native ACME persistence or add PV |
| Traefik bind failed | listen tcp :443: bind: permission denied | Non-root process tried to bind low port | Use internal ports 8000/8443 with hostPort 80/443 |
| New Traefik pod Pending | didn't have free ports | RollingUpdate with hostPort | Use updateStrategy.type: Recreate |
| Browser says Not Secure | Staging certificate | Let’s Encrypt staging CA is not trusted | Switch to production issuer after testing |
| whoami shows 10.244.x.x | Internal pod IPs | Normal Flannel pod networking | Use X-Forwarded-For for client IP |
What is Kubernetes installation and what is platform setup?
The base Kubernetes installation ended when these were working:
kubeadm control plane
CRI-O runtime
Flannel CNI
worker node joined
nodes Ready
workloads scheduling on worker
Traefik and cert-manager are not part of the base Kubernetes installation. They are platform add-ons installed on top of Kubernetes.
A useful separation is:
Base Kubernetes installation
- operating system preparation
- CRI-O
- kubeadm
- kubelet
- kubectl
- Flannel
- worker join
Cloud integration
- Hetzner CCM
- later Hetzner CSI
Platform layer
- Traefik
- cert-manager
- monitoring
- logging
Application layer
- whoami
- real applications
- databases
Next step: Terraform
Now that the manual installation works, Terraform makes sense.
The recommended next step is not to immediately rebuild the cluster. First, import the existing Hetzner resources into Terraform state:
k8s-network
k8s-master
k8s-worker-1
private network attachments
After Terraform can represent the existing infrastructure cleanly, automate a fresh rebuild with:
Terraform
cloud-init
kubeadm init
kubeadm join
Helm charts
This order is better because the manual installation already proved which commands and values work.
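As a sketch of the import step, the helper below prints one `terraform import` command per existing resource instead of running it. `print_imports` is a hypothetical helper. The resource addresses (`hcloud_network.k8s`, `hcloud_server.master`, `hcloud_server.worker1`) are assumptions that must match the names used in your own Terraform configuration, and the numeric IDs in the example are made up.

```shell
# Print (not run) one 'terraform import' command per existing resource.
# Each argument is a "<resource_address>=<numeric_hetzner_id>" pair.
print_imports() {
  for pair in "$@"; do
    addr=${pair%%=*}
    id=${pair#*=}
    case "$id" in
      ''|*[!0-9]*)
        echo "skipping $addr: ID '$id' is not numeric" >&2
        continue
        ;;
    esac
    echo "terraform import $addr $id"
  done
}

# Example:
#   print_imports hcloud_network.k8s=1111111 \
#                 hcloud_server.master=2222222 \
#                 hcloud_server.worker1=3333333
```

Printing the commands first makes it easy to review them before anything is written to Terraform state.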
Conclusion
This setup created a working self-managed Kubernetes cluster on Hetzner Cloud using kubeadm, CRI-O, Flannel, Hetzner CCM, Traefik, and cert-manager.
The most useful lessons were:
- kubeadm gives a clear view of Kubernetes internals.
- The CNI must be installed before nodes become Ready.
- cloud-provider: external requires careful CCM configuration.
- Worker nodes need outbound connectivity to pull images.
- In this setup, outbound IPv4 was required because ghcr.io did not resolve to IPv6.
- Direct hostPort ingress works for development but has trade-offs.
- Traefik should listen on high internal ports when running as non-root.
- cert-manager gives a cleaner Kubernetes-native certificate flow than Traefik native ACME file storage.
- The manual setup is now ready to be translated into Terraform automation.
References
- Kubernetes: Installing kubeadm - kubernetes[.]io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
- Kubernetes: Creating a cluster with kubeadm - kubernetes[.]io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
- Kubernetes: kubeadm Configuration v1beta4 - kubernetes[.]io/docs/reference/config-api/kubeadm-config.v1beta4/
- Kubernetes: kubeadm reference - kubernetes[.]io/docs/reference/setup-tools/kubeadm/
- Kubernetes: Certificates and CertificateSigningRequests - kubernetes[.]io/docs/reference/access-authn-authz/certificate-signing-requests/
- Kubernetes: IngressClass - kubernetes[.]io/docs/concepts/services-networking/ingress/#ingress-class
- Kubernetes: Assign Pods to Nodes - kubernetes[.]io/docs/concepts/scheduling-eviction/assign-pod-node/
- CRI-O: Project documentation - cri-o[.]io/
- Flannel: Helm chart repository - github[.]com/flannel-io/flannel
- Hetzner Cloud Controller Manager: Helm chart values - github[.]com/hetznercloud/hcloud-cloud-controller-manager
- Hetzner Cloud Controller Manager: Project repository - github[.]com/hetznercloud/hcloud-cloud-controller-manager
- cert-manager: Helm installation - cert-manager[.]io/docs/installation/helm/
- cert-manager: ACME HTTP-01 solver - cert-manager[.]io/docs/configuration/acme/http01/
- cert-manager: Certificate resource - cert-manager[.]io/docs/usage/certificate/
- Traefik: Helm chart values - github[.]com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml
- Traefik: Kubernetes CRD provider and IngressRoute - doc[.]traefik[.]io/traefik/routing/providers/kubernetes-crd/
- Traefik: cert-manager integration guide - doc[.]traefik[.]io/traefik/user-guides/cert-manager/
- Traefik: whoami test application - github[.]com/traefik/whoami
- Let’s Encrypt: Staging environment - letsencrypt[.]org/docs/staging-environment/
Appendix A: Full command checklist
This checklist is useful when repeating the installation manually. It is not a replacement for understanding the sections above, but it helps to verify that no major step was missed.
Master checklist
# 1. Set hostname
hostnamectl set-hostname k8s-master
# 2. Add repositories
export KUBERNETES_VERSION=v1.36
curl -fsSL https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/ /" \
> /etc/apt/sources.list.d/kubernetes.list
curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] \
https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/ /" \
> /etc/apt/sources.list.d/cri-o.list
# 3. Install packages
apt update
apt install -y cri-o kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
# 4. Configure kernel networking
echo "br_netfilter" >> /etc/modules-load.d/modules.conf
modprobe br_netfilter
echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/k8s.conf
sysctl --system
# 5. Generate kubeadm configuration
mkdir -p ~/k8s
cd ~/k8s
kubeadm config print init-defaults > InitConfiguration.yaml
# 6. Edit InitConfiguration.yaml manually
# - advertiseAddress: 10.0.0.2
# - criSocket: unix:///var/run/crio/crio.sock
# - cloud-provider: external
# - node-ip: 10.0.0.2
# - podSubnet: 10.244.0.0/16
# - certSANs with <MASTER_PUBLIC_IP>, 10.0.0.2, k8s-master
# 7. Initialize cluster
kubeadm init --config InitConfiguration.yaml
# 8. Configure kubectl
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chmod 600 $HOME/.kube/config
# 9. Temporarily remove taints for single-node phase
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
kubectl taint nodes --all node.cloudprovider.kubernetes.io/uninitialized-
# 10. Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# 11. Install Flannel
helm repo add flannel https://flannel-io.github.io/flannel/
helm repo update
kubectl create ns kube-flannel
kubectl label --overwrite ns kube-flannel pod-security.kubernetes.io/enforce=privileged
helm install flannel flannel/flannel --set podCidr=10.244.0.0/16 -n kube-flannel
# 12. Install Hetzner CCM
kubectl -n kube-system create secret generic hcloud \
--from-literal=token=$HCLOUD_API_TOKEN \
--from-literal=network=<HCLOUD_NETWORK_ID>
helm repo add hcloud https://charts.hetzner.cloud
helm repo update hcloud
helm install hccm hcloud/hcloud-cloud-controller-manager \
-n kube-system \
--set networking.enabled=true \
--set networking.clusterCIDR=10.244.0.0/16
# 13. Patch master ProviderID if needed
kubectl patch node k8s-master \
--type merge \
-p '{"spec":{"providerID":"hcloud://<MASTER_SERVER_ID>"}}'
Worker checklist
# 1. Set hostname if needed
hostnamectl set-hostname k8s-worker-1
# 2. Make sure the worker has outbound IPv4 connectivity
ping -c 3 8.8.8.8
curl -4 -I https://ghcr.io/v2/
# 3. Add repositories
export KUBERNETES_VERSION=v1.36
curl -fsSL https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/ /" \
> /etc/apt/sources.list.d/kubernetes.list
curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/Release.key | \
gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] \
https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/ /" \
> /etc/apt/sources.list.d/cri-o.list
# 4. Install packages
apt update
apt install -y cri-o kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
# 5. Configure networking
echo "br_netfilter" >> /etc/modules-load.d/modules.conf
modprobe br_netfilter
echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/k8s.conf
sysctl --system
# 6. Generate JoinConfiguration
kubeadm config print join-defaults > JoinConfiguration.yaml
# 7. Edit JoinConfiguration.yaml manually
# - apiServerEndpoint: 10.0.0.2:6443
# - token: <TOKEN>
# - caCertHashes: sha256:<HASH>
# - tlsBootstrapToken: <TOKEN>
# - criSocket: unix:///var/run/crio/crio.sock
# - node-ip: 10.0.0.3
# - name: k8s-worker-1
# 8. Join the cluster
kubeadm join --config JoinConfiguration.yaml
Post-join checklist on the master
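# 1. Confirm that both nodes are Ready and system pods are running
kubectl get nodes
kubectl get pods -A
# 2. Label the worker and set its ProviderID (only needed if the CCM did not set it)
kubectl label node k8s-worker-1 node-role.kubernetes.io/worker=worker
kubectl patch node k8s-worker-1 --type merge -p '{"spec":{"providerID":"hcloud://<WORKER_SERVER_ID>"}}'
# 3. Restore the control-plane taint removed during the single-node phase
kubectl taint nodes k8s-master node-role.kubernetes.io/control-plane:NoSchedule
# 4. Verify that a test workload schedules on the worker, then clean up
kubectl create deployment nginx --image=nginx --replicas=1
kubectl get pods -o wide
kubectl delete deployment nginx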
Appendix B: Exact validation commands used during the installation
These commands were useful to confirm the state of the cluster during the installation.
Check node addresses
kubectl get nodes -o wide
kubectl get node k8s-master -o jsonpath='{.status.addresses}' | python3 -m json.tool
kubectl get node k8s-worker-1 -o jsonpath='{.status.addresses}' | python3 -m json.tool
Expected output format (worker node shown):
[
{
"address": "10.0.0.3",
"type": "InternalIP"
},
{
"address": "k8s-worker-1",
"type": "Hostname"
},
{
"address": "<WORKER_PUBLIC_IP>",
"type": "ExternalIP"
}
]
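The InternalIP check can also be scripted instead of read by eye. The following is a minimal sketch, not part of the original installation: the helper name has_internal_ip is hypothetical, and it assumes the node addresses are first flattened to Type=address lines with a kubectl jsonpath range expression.

```shell
# Read "Type=address" lines from stdin and succeed only if one of them
# is exactly InternalIP=<expected address> (fixed-string, whole-line match).
has_internal_ip() {
  grep -qxF "InternalIP=$1"
}

# Example usage on the master (needs a working kubectl context):
#   kubectl get node k8s-worker-1 \
#     -o jsonpath='{range .status.addresses[*]}{.type}={.address}{"\n"}{end}' \
#     | has_internal_ip 10.0.0.3 && echo "worker uses the private address"
```

A whole-line match is used so that 10.0.0.3 does not accidentally match 10.0.0.30.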
Check taints
kubectl describe node k8s-master | grep Taint
kubectl describe node k8s-worker-1 | grep Taint
Expected after final setup:
k8s-master: node-role.kubernetes.io/control-plane:NoSchedule
k8s-worker-1: <none>
Check ProviderID
kubectl describe node k8s-master | grep ProviderID
kubectl describe node k8s-worker-1 | grep ProviderID
Expected:
ProviderID: hcloud://<SERVER_ID>
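To make the ProviderID check less dependent on reading grep output, the value can be validated against the expected shape. This is a small sketch under the assumption that a valid value is hcloud:// followed by a numeric server ID; the function name is_hcloud_provider_id is made up for this example.

```shell
# Succeed if the argument looks like a Hetzner Cloud providerID:
# the literal prefix "hcloud://" followed only by digits.
is_hcloud_provider_id() {
  printf '%s\n' "$1" | grep -Eq '^hcloud://[0-9]+$'
}

# Example usage on the master (needs a working kubectl context):
#   id="$(kubectl get node k8s-worker-1 -o jsonpath='{.spec.providerID}')"
#   is_hcloud_provider_id "$id" || echo "providerID missing or malformed: '$id'"
```

An empty value fails the check, which covers the case where the field was never set on the node.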
Check Flannel pods
kubectl get pods -n kube-flannel -o wide
Expected:
kube-flannel-ds-... 1/1 Running 10.0.0.2 k8s-master
kube-flannel-ds-... 1/1 Running 10.0.0.3 k8s-worker-1
Check Traefik pod
kubectl get pods -n traefik -o wide
kubectl describe pod -n traefik <TRAEFIK_POD> | grep entryPoints
Expected entry points after the final fix:
--entryPoints.web.address=:8000/tcp
--entryPoints.websecure.address=:8443/tcp
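The same output can be checked mechanically for entry points that still listen on a port below 1024, which is what caused the bind: permission denied error for the non-root Traefik process. The helper below is an illustrative sketch; its name find_privileged_entrypoints is an assumption, and it only understands arguments of the form shown above.

```shell
# Read Traefik arguments from stdin and print every entry point whose
# listen port is below 1024. The exit status is 0 only if none is found.
find_privileged_entrypoints() {
  awk -F'address=:' '
    /entryPoints\..*\.address=:/ {
      port = $2
      sub(/[^0-9].*$/, "", port)   # drop "/tcp" and anything after the port
      if (port != "" && port + 0 < 1024) { print; bad = 1 }
    }
    END { exit bad }
  '
}

# Example usage on the master (needs a working kubectl context):
#   kubectl describe pod -n traefik <TRAEFIK_POD> | grep entryPoints \
#     | find_privileged_entrypoints || echo "low port still configured"
```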
Check cert-manager
kubectl get pods -n cert-manager
kubectl get clusterissuer
kubectl get certificate
kubectl get certificaterequest
kubectl get order,challenge
Expected staging issuer:
letsencrypt-staging True
Appendix C: Why the public IPs are obfuscated
The original installation used real public IP addresses for:
control-plane API and SSH
worker public web traffic
client/laptop testing
Those IPs are not needed for the article and should not be published. The article uses placeholders instead:
<MASTER_PUBLIC_IP>
<WORKER_PUBLIC_IP>
<CLIENT_PUBLIC_IP>
Before publishing, search the final Markdown file for accidental real IP addresses:
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' self-managed-kubernetes-hetzner-kubeadm-detailed.md
Private Kubernetes and private Hetzner network ranges such as 10.0.0.2, 10.0.0.3, and 10.244.0.0/16 are kept because they explain the cluster networking and are not public identifiers.
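Because the plain grep above also prints every private address, the output can be long. One possible refinement, sketched here with a hypothetical helper named find_public_ipv4, drops the private, loopback, and link-local IPv4 ranges so that only candidates for real public addresses remain.

```shell
# Print the unique IPv4-looking strings in a file, excluding
# 10.0.0.0/8, 127.0.0.0/8, 192.168.0.0/16, 169.254.0.0/16 and 172.16.0.0/12.
find_public_ipv4() {
  grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" \
    | grep -Ev '^(10\.|127\.|192\.168\.|169\.254\.|172\.(1[6-9]|2[0-9]|3[01])\.)' \
    | sort -u
}

# Example usage:
#   find_public_ipv4 self-managed-kubernetes-hetzner-kubeadm-detailed.md
```

For this article the result still contains 8.8.8.8 from the worker connectivity test. That address is intentional, so every other hit deserves a closer look.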
Appendix D: Why the worker was not kept IPv6-only
The worker was initially created without public IPv4. This looked attractive because it reduced cost and reduced direct public exposure.
However, the worker still needed to pull images. In this installation, the critical failing image was:
ghcr.io/flannel-io/flannel-cni-plugin:v1.9.1-flannel1
The worker could reach IPv6 endpoints; for example, an IPv6 ping test worked. However, an image pull over IPv6 only works if the registry hostname resolves to an IPv6 address. In this environment, ghcr.io resolved only to IPv4 addresses on the image pull path.
The important distinction is:
IPv6 connectivity exists -> yes
Required registry reachable over IPv6 -> no, not in this test
Image pull succeeds -> no, because IPv4 egress was missing
That is why the article recommends outbound IPv4 connectivity for both nodes unless a proper NAT or mirror strategy is built.
Alternative solutions could be:
- run a NAT gateway or NAT instance
- use a private registry mirror reachable over the private network
- pre-pull and transfer images manually
- run the image pull path over IPv6 end to end, which only works if every required registry is reachable over IPv6
For learning, assigning public IPv4 to the worker was the fastest and clearest fix.
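Before deciding whether a node can stay IPv6-only, it helps to check whether each required registry hostname resolves to an IPv6 address at all. The sketch below is an assumption-laden example rather than the procedure used during the installation: both function names are hypothetical, getent ahosts is assumed to be available (glibc-based systems), and its answers depend on the local resolver and address configuration.

```shell
# Print the addresses a hostname resolves to, one per line.
# Assumes a glibc-based system that provides `getent ahosts`.
resolve_addresses() {
  getent ahosts "$1" | awk '{print $1}' | sort -u
}

# Succeed if at least one address read from stdin is IPv6,
# recognised here simply by the presence of a colon.
has_ipv6_address() {
  grep -q ':'
}

# Example usage on the worker:
#   if resolve_addresses ghcr.io | has_ipv6_address; then
#     echo "ghcr.io has an IPv6 address"
#   else
#     echo "ghcr.io resolves to IPv4 only from this node"
#   fi
```

A successful lookup only shows that an IPv6 address exists. The actual image pull still has to be tested, because blobs may be served from a different hostname than the registry API.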
Appendix E: Terraform planning notes
The manual cluster should not be destroyed immediately. The first Terraform step should be to codify the existing Hetzner infrastructure.
Example resources to import:
hcloud_network.k8s
hcloud_network_subnet.k8s
hcloud_server.master
hcloud_server.worker_1
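As a rough sketch of the import step, the commands can be printed first and reviewed before anything touches the Terraform state. The function name print_import_commands and the three ID arguments are assumptions made for this example. hcloud_network_subnet.k8s is deliberately left out because its import ID is a compound value; take the exact format from the provider documentation.

```shell
# Print (do not run) the terraform import commands for the existing resources.
# Arguments: <network-id> <master-server-id> <worker-server-id>
print_import_commands() {
  printf 'terraform import hcloud_network.k8s %s\n' "$1"
  printf 'terraform import hcloud_server.master %s\n' "$2"
  printf 'terraform import hcloud_server.worker_1 %s\n' "$3"
}

# Example usage, with the numeric IDs taken from the Hetzner Cloud console:
#   print_import_commands <HCLOUD_NETWORK_ID> <MASTER_SERVER_ID> <WORKER_SERVER_ID>
```

Each address must match a resource block that already exists in the Terraform configuration, otherwise the import has nothing to attach to.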
The infrastructure values from this installation map naturally to Terraform variables:
variable "hcloud_token" {
type = string
sensitive = true
}
variable "ssh_key_name" {
type = string
}
variable "network_cidr" {
type = string
default = "10.0.0.0/16"
}
variable "master_private_ip" {
type = string
default = "10.0.0.2"
}
variable "worker_1_private_ip" {
type = string
default = "10.0.0.3"
}
The first goal is a clean terraform plan with no destructive changes. Only after that should the installation be automated through cloud-init, Ansible, or another bootstrap mechanism.
Appendix F: Publish checklist
Before publishing this article, verify:
- all public IPs are replaced with placeholders
- all API tokens are replaced with placeholders
- all kubeadm tokens are replaced with placeholders
- all certificate hashes are placeholders
- the email address is either intentionally public or replaced with <YOUR_EMAIL>
- the domain is either intentionally public or replaced with <YOUR_DOMAIN>
- the commands are grouped by node: master, worker, laptop
- the troubleshooting section includes the exact Flannel image failure
- the article clearly distinguishes nodes from pods
- the article explains that Traefik and cert-manager are platform add-ons, not base Kubernetes installation
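Two items on this checklist can be partly automated. The sketch below searches a file for strings shaped like a kubeadm bootstrap token (six characters, a dot, sixteen characters, all lowercase letters or digits) or a full sha256 certificate hash. The helper name find_kubeadm_secrets is hypothetical, and the patterns are a heuristic. A clean result does not prove that the file is free of secrets, and API tokens are not covered.

```shell
# Print, with line numbers, every line of a file that contains something
# shaped like a kubeadm bootstrap token or a 64-hex-digit sha256 hash.
# Exit status is non-zero when nothing is found (standard grep behaviour).
find_kubeadm_secrets() {
  grep -nE '(^|[^a-z0-9])[a-z0-9]{6}\.[a-z0-9]{16}([^a-z0-9]|$)|sha256:[a-f0-9]{64}' "$1"
}

# Example usage:
#   find_kubeadm_secrets self-managed-kubernetes-hetzner-kubeadm-detailed.md \
#     && echo "review the lines above before publishing"
```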