#k3s

bkoehn@diaspora.koehn.com

I’d never done a base #Debian install on a physical machine before yesterday. The mini PC arrived; I unboxed it and booted it into the pre-installed Windows 11. I used Edge to download the Debian installer ISO, burned it onto a spare USB stick I had lying around, and configured Windows to let me boot off the stick. From there it was a few minutes to configure the device (including setting up WiFi, which went seamlessly!): basically the base install plus sshd.

From there, another reboot to confirm everything was working, then I transplanted the device to its basement home and logged in remotely to configure the wired connection and disable the WiFi. Then a single command to install the #k3s agent, and done!
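
For the record, the agent install really is one command; a sketch with a placeholder server URL and token (the token comes from /var/lib/rancher/k3s/server/node-token on the server):

```sh
# Install k3s in agent mode and join the existing cluster.
# The K3S_URL and K3S_TOKEN values here are placeholders.
curl -sfL https://get.k3s.io | \
  K3S_URL=https://my-server:6443 \
  K3S_TOKEN=<node-token> \
  sh -
```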

K3s configured itself, attached to the cluster, and started deploying DaemonSets (ingress-nginx and nfs-provisioner). I restarted a deployment that had been running on the ARM64 node, and it was rescheduled onto the new (AMD64) box, which downloaded the appropriate image and ran it seamlessly. I was delighted at how easy it all was.
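
Forcing the reschedule was nothing more than a rollout restart (the deployment name here is hypothetical):

```sh
# Restart the deployment; the scheduler is free to place the new pod
# on any matching node, including the new amd64 box.
kubectl rollout restart deployment/diaspora
kubectl get pods -o wide   # see which node the pod landed on
```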

The whole process took about a half hour, including searching the internet for how to configure networking on Debian.
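
For anyone doing the same search: with Debian’s stock ifupdown, the wired config is a short stanza. A sketch, assuming the interface is named enp1s0 (check with `ip link`) and using example addresses:

```sh
# Write a static config for the wired interface, then bring it up.
cat > /etc/network/interfaces.d/lan <<'EOF'
auto enp1s0
iface enp1s0 inet static
    address 192.168.1.50/24
    gateway 192.168.1.1
EOF
ifup enp1s0
```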

bkoehn@diaspora.koehn.com

The first machine in my effort to re-home my IT infrastructure was an OrangePi. I created a single-node #k3s cluster and started moving services to it (initially my Diaspora pod and a few others). I tweaked Dockerfiles to create cross-platform images (usually this required no change at all), and by and large it worked great.
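
The cross-platform part is mostly a build flag; a sketch with #buildx (the registry and tag are placeholders):

```sh
# Build and push a single image that runs on both architectures.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/myservice:latest \
  --push .
```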

I’m looking to add some more nodes, and I discovered the mini PC category. A node that looks promising is the BosGame B100. For $180 you get a machine with a pretty good CPU (Alder Lake N100), RAM (16GB), and NVMe SSD (512GB). It’s amd64 instead of arm64, but again, my images are all cross-platform and the scheduler can deploy them on whichever node has capacity.

Adding nodes to the cluster is super simple, and they need very little administration, since they run basically stock Debian and everything else sits behind a container interface.

Migrating home becomes little more than moving the data, updating some DNS entries, and applying the same configuration against the new cluster. So much simpler than the bad old days when everything ran directly on the underlying OS.
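
That last step is literally pointing kubectl at the new cluster and re-applying the same manifests (the context and directory names are hypothetical):

```sh
# Same manifests, new cluster; only the data and DNS actually move.
kubectl --context home-cluster apply -f manifests/
```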

bkoehn@diaspora.koehn.com

So it looks like my eero WiFi/gateway cannot handle the multicast ICMPv6 packets that #k3s is sending out (I still don’t know why it sends them, but they’re legit packets). As soon as I firewall off the k3s box, the eero is perfectly stable again. Disable the firewall, and the eero can’t keep TCP connections open for more than about a minute, which is remarkably difficult to work with.
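
For the curious, the workaround amounts to dropping the multicast listener (MLD) traffic before it reaches the eero. A rough ip6tables sketch, assuming MLD (ICMPv6 types 130–132 and 143) is what’s overwhelming it:

```sh
# Blunt workaround, not a fix: drop outbound MLD queries/reports
# (ICMPv6 types 130-132 and 143) on the k3s box.
for t in 130 131 132 143; do
  ip6tables -A OUTPUT -p icmpv6 --icmpv6-type $t -j DROP
done
```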

Hopefully eero will fix its issue, but in the meantime I’m back to reliable internet again.

bkoehn@diaspora.koehn.com

I got Diaspora running on #k3s on the #OrangePi5 at home. It’s much, much faster there than at #SSDNodes where it’s currently hosted. I’ll migrate the instance sometime soon.

It’s about 2GB of images and roughly 10GB of data in the database, so there will be some downtime while all that moves.
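
The move itself will be the usual quiesce-copy-restart dance; a rough sketch (the contexts, hosts, and paths are all placeholders):

```sh
# Quiesce the pod, copy the data, then start it on the new cluster.
kubectl --context ssdnodes scale deployment/diaspora --replicas=0
pg_dump -h old-db -Fc diaspora_production > diaspora.dump
pg_restore -h new-db -d diaspora_production diaspora.dump
rsync -a old-host:/srv/diaspora/uploads/ new-host:/srv/diaspora/uploads/
kubectl --context home scale deployment/diaspora --replicas=1
```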

bkoehn@diaspora.koehn.com

I started migrating a few workloads to the #OrangePi-based #k3s cluster and it’s so easy it feels like cheating. Even though the nodes use different CPU architectures, the same images support both amd64 (Intel) and arm64 (ARM), so there’s no change in any of the configuration; it “just works.”
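
You can see why it works by inspecting an image’s manifest list, which carries one entry per architecture (the image name is a placeholder):

```sh
# A multi-arch image is a manifest list; each node pulls its own entry.
docker buildx imagetools inspect registry.example.com/myservice:latest
```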

And the OrangePi is so much faster than the shitty SSDNodes VMs (which are way over-provisioned). So it “just works” better.

bkoehn@diaspora.koehn.com

Got an #OrangePi (16GB RAM; no ETA on the 32GB variant). Migrated the OS and boot files from the microSD card to the NVMe volume. Now to get #k3s running and configured so I can move some workloads over to it.

bkoehn@diaspora.koehn.com

I finally upgraded #k3s on the cluster today, and it was so painless it felt like cheating. Download the new binary, move it into place, and restart the service. On full #Kubernetes this takes an hour, with endless plans and scripts and draining and reloading everything. On k3s it took five minutes.
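
The whole upgrade, roughly (the version pinned here is just an example; agent nodes restart k3s-agent instead):

```sh
# Fetch the new release binary, swap it into place, restart the service.
curl -Lo /tmp/k3s \
  https://github.com/k3s-io/k3s/releases/download/v1.30.4%2Bk3s1/k3s
sudo install -m 755 /tmp/k3s /usr/local/bin/k3s
sudo systemctl restart k3s   # k3s-agent on agent nodes
kubectl get nodes            # confirm the new version rolled out
```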

bkoehn@diaspora.koehn.com

This morning I finally got around to fixing an issue that mysteriously came up during the migration to the new #k3s cluster where the pod is now hosted. For some reason the pod wasn’t serving static assets correctly, and as a workaround I was routing them to the Diaspora Ruby code rather than the lighttpd server that’s both faster and non-blocking.

In any case, a bit of experimenting fixed the issue and it’s all better now; there should be fewer pauses in loading pages as a result.

bkoehn@diaspora.koehn.com

Devoted some time to continuing the teardown of my #Kubernetes #k8s infrastructure at #Hetzner, moving it to my #k3s infrastructure at #ssdnodes. It's pretty easy to move everything; the actual work is moving files and databases, plus a bit of downtime. As I retire the old infrastructure I can save some money by shutting down nodes as the workload decreases. I've shut down two nodes so far. Might free up another tonight if I can move #Synapse and Diaspora.

bkoehn@diaspora.koehn.com

After a few hours of work, I have high-availability storage on my #k3s #Kubernetes cluster.

Running on bare Ubuntu VMs, each of the three servers has 48GB of RAM and 720GB of SSD storage. The provider I'm using doesn't supply extra SAN storage, so the on-VM storage is all I have, and any redundancy I have to handle myself.

Enter Longhorn. Longhorn is a FOSS project from Rancher that allows you to use local storage inside your Kubernetes cluster, and keeps replicas available on other nodes in case one of your servers is unavailable. The system is trivial to set up and highly efficient, and acts as a StorageClass that you can use when requesting storage for a pod. It can also schedule snapshots and backups to an offsite S3 instance for additional safety. It even has experimental support for shared volumes via NFS!
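
Once it's installed, using it is just requesting the longhorn StorageClass in a PVC; the names and size here are examples:

```sh
# Request a replicated Longhorn volume like any other PVC.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF
```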

For object storage I've configured a Minio cluster. Minio is a FOSS S3-compatible server that also uses local storage, keeping multiple copies of the data around for high availability. It's also quite easy to configure and use, with an incredibly rich feature set and a lovely UI. It doesn't have its own backups, but it's easy to replicate with a simple cron job.
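
That cron job is a one-liner with Minio's mc client; the aliases and bucket names here are hypothetical (set up beforehand with `mc alias set`):

```sh
# crontab entry: nightly at 03:00, mirror the cluster bucket offsite.
0 3 * * * mc mirror --overwrite cluster/backups offsite/backups
```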

I'm slowly moving workloads over to the new pod, and will migrate the Diaspora pod in a few days (expect an hour or so of downtime during the migration). The new cluster is more secure, more stable, and is much less likely to go down than the old one was.

bkoehn@diaspora.koehn.com

Last night I installed the new #Canal #CNI (#Calico + #Flannel) on the new #k3s #Kubernetes cluster in the same way I've always done it on the old #k8s cluster, neglecting the clear instructions to apply any changes from the original configuration to the new one. Those changes included little things like telling Flannel which interface to use, what IP range to allocate, and other trivialities. Wow did I blow that cluster to bits. Following the directions and deleting a few very confused pods fixed the issue.

Anyway, it's working now, and I have a better process in place to manage CNI upgrades.
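
The better process is mechanical: fetch the new upstream manifest, diff it against what's live so the local changes (Flannel interface, IP range) carry forward, and only then apply. A sketch; the URL and version are illustrative:

```sh
# Review what an upgrade would change before letting it loose.
curl -sLo canal-new.yaml \
  https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/canal.yaml
# Re-apply local tweaks (interface, pod CIDR) to the manifest, then:
kubectl diff -f canal-new.yaml   # sanity-check the delta
kubectl apply -f canal-new.yaml
```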

bkoehn@diaspora.koehn.com

Alright, after a bit more puttering about I've got my #k3s #Kubernetes cluster networking working. Details follow.

From an inbound perspective, all the nodes in the cluster are completely unavailable from the internet, firewalled off using #hetzner's firewalls. This provides some reassurance that they're tougher to hack, and makes it harder for me to mess up the configuration. All the nodes are on a private network that allows them to communicate with one another, and that's their exclusive form of communication. All the nodes are allowed any outbound traffic. The servers are labeled in Hetzner's console to automatically apply firewall rules.

In front of the cluster is a Hetzner load balancer that is configured to forward public internet traffic to the nodes on the private network (the load balancer has public IPv4 and IPv6 addresses, and a private IPv4 address that it uses to communicate with the worker nodes). The load balancer does liveness checks on each node and keeps non-responsive nodes from receiving requests. It uses the PROXY protocol to preserve source #IP information. The same Hetzner server labels are used to add worker nodes to the load balancer automatically.

The traffic is forwarded to an #nginx DaemonSet that k3s keeps running on every node in the cluster (for high availability); the pods of that DaemonSet keep themselves in sync via a ConfigMap, so tweaks to the nginx configuration are applied automatically. Nginx listens on each node's private IP, handles #TLS termination for #HTTP traffic, and works with cert-manager to maintain TLS certificates for websites, using #LetsEncrypt for signing. TLS termination for #IMAP and #SMTP is handled by #Dovecot and #Postfix, respectively. Nginx forwards (mostly) cleartext to the appropriate service, using Kubernetes Ingress resources to bind ports, hosts, paths, etc. to the correct workloads.
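
Preserving source IPs means nginx has to expect the PROXY protocol on its listeners; with ingress-nginx that's a single ConfigMap key (the ConfigMap name and namespace depend on how it was installed):

```sh
# Tell ingress-nginx to parse PROXY protocol headers from the LB.
kubectl -n ingress-nginx patch configmap ingress-nginx-controller \
  --type merge -p '{"data":{"use-proxy-protocol":"true"}}'
```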

The cluster uses #Canal as a #CNI to handle pod-to-pod networking. Canal is a hybrid of Calico and Flannel that is both easy to set up (basically a single YAML) and powerful to use, allowing me to set network policies that only permit pods to communicate with the other pods they need, effectively acting as an internal firewall in case a pod is compromised. All pod communication is managed using standard Kubernetes Services, which behind the scenes simply create #iptables chains to move traffic to the correct pod.
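
A hypothetical policy like this one locks a database down to its single legitimate client:

```sh
# Only pods labeled app=diaspora may reach the postgres pods, port 5432.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-from-app-only
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: diaspora
      ports:
        - protocol: TCP
          port: 5432
EOF
```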

The configuration of all this was a fair amount of effort, owing to Kubernetes' inherent flexibility in the kinds of environments it supports. But by integrating it with the capabilities that Hetzner provides, I can fairly easily create an environment for running workloads that's redundant and highly secure. I had to turn off several k3s "features" to get it to work: disabling #Traefik, #Flannel, and the bundled service load balancer, and forcing k3s to use only the private network rather than a public one. Still, it's been easier to work with than a full-blown Kubernetes installation, and it uses considerably fewer server resources.
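
The disabling boils down to a handful of k3s server flags; a sketch (the node IP is a placeholder):

```sh
# Shed the bundled extras and pin k3s to the private network.
k3s server \
  --disable traefik \
  --disable servicelb \
  --flannel-backend=none \
  --node-ip 10.0.0.2
```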

Next up: storage! Postgres, Objects, and filesystems.

bkoehn@diaspora.koehn.com

The new #kubernetes cluster is coming along. I have the networking figured out the way I want it, all on a private network away from prying eyes. Today I got the Ingress working with a hardware load balancer, and as soon as I get the certificate manager installed I can start moving some workloads. Then I’ll add Ceph and Stolon for HA file, object, and database storage, and I can move nearly everything to the new environment.

Learning #k3s has been interesting and not too involved. Most things work easily once you do a bit of research, and it’s lighter-weight and easier to debug than full Kubernetes. It solves a genuinely challenging problem (how do I automate distributing, scheduling, and monitoring a diverse workload across a variety of nodes for security and availability?), so some complexity is unavoidable. But it works well and the abstractions are stable.

bkoehn@diaspora.koehn.com

Decided to spin up a local k3s cluster on my (ARM64) laptop. One nice thing about the Docker ecosystem is how easily configurations migrate across platforms.

I'll add that spinning up a cluster in k3s is just running a single command per node: one for the server node and one for each of the agents. It's trivial to automate and completes in seconds.
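
Concretely, bootstrapping the first node looks like this; each additional node then runs the same installer with K3S_URL and K3S_TOKEN pointing back at it:

```sh
# Bootstrap the control plane on the first (server) node.
curl -sfL https://get.k3s.io | sh -
# The join token for the other nodes ends up here:
sudo cat /var/lib/rancher/k3s/server/node-token
```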

Now I'm messing around with #ceph for managing high-availability #storage (filesystem and #s3) and #stolon for high-availability #postgres.

#docker #kubernetes #k3s #k8s #arm64 #buildx #ha