#kubernetes

bkoehn@diaspora.koehn.com

Devoted some time to continuing the teardown of my #Kubernetes #k8s infrastructure at #Hetzner and the move to my #k3s infrastructure at #ssdnodes. Moving everything is pretty easy; the actual work is moving files and databases, plus a bit of downtime. As I wind down the old infrastructure I can save some money by shutting down nodes as the workload decreases. I've shut down two nodes so far. Might free up another tonight if I can move #Synapse and Diaspora.

bkoehn@diaspora.koehn.com

After a few hours of work, I have high-availability storage on my #k3s #Kubernetes cluster.

Running on bare Ubuntu VMs, each of the three servers has 48GB of RAM and 720GB of SSD storage. The provider I'm using doesn't supply extra SAN storage, so the on-VM storage is all I have, and any redundancy I have to handle myself.

Enter Longhorn. Longhorn is a FOSS project from Rancher that allows you to use local storage inside your Kubernetes cluster, and keeps replicas available on other nodes in case one of your servers is unavailable. The system is trivial to set up and highly efficient, and acts as a StorageClass that you can use when requesting storage for a pod. It can also schedule snapshots and backups to an offsite S3 instance for additional safety. It even has experimental support for shared volumes via NFS!
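As a sketch of what that looks like in practice (the claim name and size below are placeholders, not anything I actually run), a workload asks for replicated Longhorn storage by pointing a PersistentVolumeClaim at the Longhorn StorageClass:

```yaml
# Hypothetical example: a PersistentVolumeClaim backed by Longhorn.
# "longhorn" is the StorageClass the install creates by default; the
# claim name and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: synapse-media
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
```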

For object storage I've configured a modern Minio cluster. Minio is a FOSS S3-compatible server that also uses local storage, keeping multiple instances around for high availability. It's also quite easy to configure and use, with an incredibly rich feature set and a lovely UI. It doesn't have its own backups, but it's easy to replicate with a simple cron job.
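The replication job is nothing fancy; a minimal sketch, assuming the MinIO client image and an mc config Secret holding aliases for the local and offsite targets (all of the names below are invented for illustration):

```yaml
# Hypothetical sketch: nightly bucket replication with the MinIO client (mc).
# Bucket names, aliases, and the Secret holding the mc config are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: minio-mirror
spec:
  schedule: "0 3 * * *"          # every night at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mc
              image: minio/mc
              # Mirror the local bucket to an offsite S3 target; "local" and
              # "offsite" are mc aliases defined in the mounted config.
              command: ["mc", "mirror", "--overwrite", "local/backups", "offsite/backups"]
              volumeMounts:
                - name: mc-config
                  mountPath: /root/.mc
          volumes:
            - name: mc-config
              secret:
                secretName: mc-config
```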

I'm slowly moving workloads over to the new cluster, and will migrate the Diaspora pod in a few days (expect an hour or so of downtime during the migration). The new cluster is more secure, more stable, and much less likely to go down than the old one was.

bkoehn@diaspora.koehn.com

Last night I installed the new #Canal #CNI (#Calico + #Flannel) on the new #k3s #Kubernetes cluster the same way I've always done it on the old #k8s cluster, neglecting the clear instructions to carry over any changes from the original configuration to the new one. Those changes included little things like telling Flannel which interface to use, what IP range to allocate, and other trivialities. Wow, did I blow that cluster to bits. Following the directions and deleting a few very confused pods fixed the issue.
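For the curious, the changes I mean live in the canal.yaml manifest's ConfigMap; roughly this excerpt (the interface name and CIDR below are examples, not my actual values):

```yaml
# Hypothetical excerpt of the customizations that have to be re-applied to a
# fresh canal.yaml: the flannel interface and the pod network. Values are
# examples only.
kind: ConfigMap
apiVersion: v1
metadata:
  name: canal-config
  namespace: kube-system
data:
  # Tell Flannel which interface to use for inter-node traffic; the default is
  # empty (autodetect), which is exactly what picked the wrong interface here.
  canal_iface: "ens10"
  # The pod IP range Flannel allocates from; must match the cluster CIDR.
  net-conf.json: |
    {
      "Network": "10.42.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
```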

Anyway, it's working now, and I have a better process in place to manage CNI upgrades.

bkoehn@diaspora.koehn.com

Alright, after a bit more puttering about I've got my #k3s #Kubernetes cluster networking working. Details follow.

From an inbound perspective, all the nodes in the cluster are completely unreachable from the internet, firewalled off using #hetzner's firewalls. This provides some reassurance that they're tougher to hack, and makes it harder for me to mess up the configuration. All the nodes are on a private network that they use to communicate with one another, and that's their only form of communication. Outbound traffic from the nodes is unrestricted. The servers are labeled in Hetzner's console so the firewall rules are applied automatically.

In front of the cluster is a Hetzner load balancer that is configured to forward public internet traffic to the nodes on the private network (meaning the load balancer has public IPv4 and IPv6 addresses, and a private IPv4 address that it uses to communicate with the worker nodes). The load balancer does liveness checks on each node and keeps unresponsive nodes from receiving requests. The load balancer uses the PROXY protocol to preserve source #IP information. The same Hetzner server labels are used to add worker nodes to the load balancer automatically.
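Because the load balancer speaks PROXY protocol, whatever sits behind it has to be told to expect it. For the nginx ingress controller described below, that's a single ConfigMap entry; a sketch, assuming the ingress-nginx controller with its default ConfigMap name and namespace:

```yaml
# Sketch: enabling PROXY protocol on the ingress-nginx controller so it can
# recover the real client IP from the load balancer. The ConfigMap name and
# namespace are the upstream defaults; adjust to match the actual deployment.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-proxy-protocol: "true"
```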

The traffic is forwarded to an #nginx DaemonSet which k3s keeps running on every node in the cluster (for high availability), and the pods of that DaemonSet keep themselves in sync using a ConfigMap that allows tweaks to the nginx configuration to be applied automatically. Nginx listens on ports on each node's private IP, handles #TLS termination for #HTTP traffic, and works with cert-manager to maintain TLS certificates for websites, using #LetsEncrypt for signing. TLS termination for #IMAP and #SMTP is handled by #Dovecot and #Postfix, respectively. Nginx forwards (mostly) cleartext to the appropriate service to handle the request, using Kubernetes Ingress resources to bind ports, hosts, paths, etc. to the correct workloads.
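The Ingress resources themselves are unremarkable; here's a sketch of the shape they take, with the cert-manager annotation that drives Let's Encrypt issuance (the hostname, issuer, class, service name, and port are all placeholders):

```yaml
# Hypothetical Ingress: routes HTTPS for one host to a backend Service, with
# cert-manager maintaining the Let's Encrypt certificate named in spec.tls.
# Hostname, issuer, ingress class, Service name, and port are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: diaspora
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - diaspora.example.com
      secretName: diaspora-tls
  rules:
    - host: diaspora.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: diaspora
                port:
                  number: 3000
```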

The cluster uses #Canal as a #CNI to handle pod-to-pod networking. Canal is a hybrid of Calico and Flannel that is both easy to set up (basically a single YAML manifest) and powerful to use, allowing me to set network policies that only permit pods to communicate with the other pods they need, effectively acting as an internal firewall in case a pod is compromised. All pod communication is managed using standard Kubernetes Services, which behind the scenes simply create #iptables rules to move traffic to the correct pod.
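As an example of the kind of policy this enables (the labels and namespaces below are invented for illustration), something like this locks a Postgres pod down so only the application that owns it can connect:

```yaml
# Hypothetical NetworkPolicy: only pods labeled app=diaspora in the "diaspora"
# namespace may reach the postgres pods, and only on the Postgres port.
# All other ingress to those pods is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-diaspora-only
  namespace: db
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: diaspora
          podSelector:
            matchLabels:
              app: diaspora
      ports:
        - protocol: TCP
          port: 5432
```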

The configuration of all this was a fair amount of effort, owing to Kubernetes' inherent flexibility in the kinds of environments it supports. But by integrating it with the capabilities that Hetzner provides, I can fairly easily create an environment for running workloads that's redundant and highly secure. I had to turn off several k3s "features" to get it to work: disabling #Traefik, #Flannel, and the strange built-in service load balancer, and forcing k3s to use only the private network rather than a public one. Still, it's been easier to work with than a full-blown Kubernetes installation, and uses considerably fewer server resources.
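For reference, those k3s tweaks can all be expressed in its config file rather than as command-line flags; roughly this sketch, with the node IP obviously a placeholder:

```yaml
# Sketch of /etc/rancher/k3s/config.yaml on a server node: disable the bundled
# Traefik ingress and service load balancer, drop the built-in Flannel (Canal
# is installed separately), and bind k3s to the private network only.
# The IP value is a placeholder.
disable:
  - traefik
  - servicelb
flannel-backend: "none"          # Canal provides the CNI instead
disable-network-policy: true     # Canal/Calico enforces NetworkPolicy
node-ip: "10.0.0.2"              # private address only
```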

Next up: storage! Postgres, Objects, and filesystems.

bkoehn@diaspora.koehn.com

The new #kubernetes cluster is coming along. I have the networking figured out the way I want it, all on a private network away from prying eyes. Today I got the Ingress working with a hardware load balancer, and as soon as I get the certificate manager installed I can start moving some workloads. Then I’ll add Ceph and Stolon for HA file, object, and database storage, and I can move nearly everything to the newer environment.

Learning #k3s has been interesting and not too involved. Most things work easily once you do a bit of research, and it’s lighter weight and easier to debug than full Kubernetes. It solves a genuinely challenging problem (how do I automate distributing, scheduling, and monitoring a diverse workload over a variety of nodes for security and availability?), so a little complexity is unavoidable, but it works well and the abstractions are stable.

bkoehn@diaspora.koehn.com

Decided to spin up a local k3s cluster running on my (ARM64) laptop. An interesting bit about the Docker environment is how easy it is to migrate configurations across platforms.

I'll add that spinning up a cluster in k3s is just running a single command per node: one for the server (control-plane) node and one for each of the agent nodes. It's trivial to automate and completes in seconds.

Now I'm messing around with #ceph for managing high-availability #storage (filesystem and #s3) and #stolon for high-availability #postgres.

#docker #kubernetes #k3s #k8s #arm64 #buildx #ha

bkoehn@diaspora.koehn.com

I'm messing about with spinning up a new #Kubernetes cluster running a more current version of k8s, with better, faster, more secure networking (using a private network and Calico's built-in WireGuard encryption support). I'll likely migrate the pod and other workloads to the new cluster over the coming months.
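Turning on the WireGuard encryption is, at least on paper, a one-liner against Calico's FelixConfiguration; a sketch, applied with calicoctl (or kubectl, if the Calico API server is installed) and assuming WireGuard is available on every node:

```yaml
# Sketch: enable Calico's node-to-node WireGuard encryption by updating the
# default FelixConfiguration. Requires the WireGuard kernel module/tools on
# each node.
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  wireguardEnabled: true
```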

Wish me luck.

bkoehn@diaspora.koehn.com

Last week one of the nodes in my #Kubernetes cluster failed due to an issue on the bare metal machine. #Hetzner fixed the problem reasonably quickly, but in the meantime Kubernetes noticed it, moved the workloads to other servers, and kept right on running. When the machine was repaired, it notified K8S that it was available again, and resumed processing work.

Which is good because I was on vacation.