For a long time, I wanted to build a robust home lab to experiment with Kubernetes and various DevOps tools. However, between a demanding job and the joys (and time commitments) of raising two kids, this project remained on the back burner. The thought of spending weeks configuring servers, networking, and applications was simply too daunting.
The hardware journey actually began back in November. Inspired by Jeff Geerling’s famous “minirack” builds, I decided to construct a similar compact setup. Living in Singapore, where residential space is at a premium, this form factor is invaluable. It allows me to pack compute power into a tiny footprint without turning my apartment into a data center.
While the physical build was ready, the software configuration sat idle until the new year holiday. I finally had some downtime and wanted to put my new Claude Max subscription to the test. With the incredible assistance of AI coding tools like Claude and Gemini, I was able to accomplish in a single day what would normally have taken days (not counting the time I spent failing to set up a physical VLAN on the ASUS router). This blog post details the journey of building my Proxmox-based home lab and how AI made it all possible.
The Infrastructure
The foundation of the lab is a three-node Proxmox cluster, housed in that custom minirack. I chose a multi-node setup to mimic a real-world production environment, allowing me to test high-availability scenarios and drain nodes for maintenance without downtime.
The first challenge was networking. My home runs on a standard 192.168.50.0/24 network managed by an ASUS GT-BE98 router. I didn’t want my messy lab experiments colliding with the family Wi-Fi, so I needed strong isolation. I tried to set up VLANs but gave up after fighting with the ASUS router, so I opted for a Proxmox VXLAN overlay (10.10.10.0/24) instead. This acts like a virtual network floating on top of my physical cables, giving me a clean, isolated sandbox.
To manage this sandbox, I deployed an OPNsense VM. It acts as the gatekeeper, sitting between the physical world and my virtual lab. It handles all the routing, firewall rules, and NAT, ensuring that my lab is secure and self-contained (fingers crossed). Of course, a secure lab is useless if you can’t reach it, so I deployed a dedicated Tailscale VM. This creates a secure mesh network, allowing me to access my cluster from anywhere as if I were sitting right next to the rack, without ever opening dangerous ports to the public internet.
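To give a flavour of what this looks like in code, here is a simplified sketch of how such a gateway VM can be declared with the bpg/proxmox Terraform provider. The node name, datastore, sizes, and the assumption that the VXLAN VNet appears as a bridge called `lab` are illustrative placeholders, not my exact configuration.

```hcl
# Simplified sketch: a gateway VM with one leg on the physical LAN (vmbr0)
# and one leg on the VXLAN overlay VNet (shown here as the bridge "lab").
# All values are placeholders, not my exact configuration.
resource "proxmox_virtual_environment_vm" "opnsense" {
  name      = "opnsense"
  node_name = "pve-01"

  cpu {
    cores = 2
  }

  memory {
    dedicated = 4096
  }

  disk {
    datastore_id = "local-lvm"
    interface    = "scsi0"
    size         = 32
  }

  # WAN side: the real home network behind the ASUS router
  network_device {
    bridge = "vmbr0"
    model  = "virtio"
  }

  # LAN side: the isolated 10.10.10.0/24 VXLAN overlay
  network_device {
    bridge = "lab"
    model  = "virtio"
  }
}
```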
With the plumbing in place, it was time for the compute layer. For the Kubernetes nodes, I didn’t want the overhead of managing full Linux distributions. I chose Talos Linux, a modern, immutable OS built specifically for Kubernetes. It’s incredibly secure because it has no SSH, no console, and a read-only file system; everything is managed via API. This reduces the attack surface drastically and makes the cluster almost maintenance-free.
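Because everything is API-driven, bootstrapping Talos from Terraform is surprisingly compact. The sketch below uses the siderolabs/talos provider; the endpoint and node IPs are illustrative addresses on the overlay subnet, and per-node details (worker nodes, disk selectors, machine config patches) are omitted.

```hcl
# Sketch of bootstrapping a Talos control plane node with the
# siderolabs/talos Terraform provider. IPs and names are illustrative.
resource "talos_machine_secrets" "this" {}

data "talos_machine_configuration" "controlplane" {
  cluster_name     = "homelab"
  cluster_endpoint = "https://10.10.10.10:6443"
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
}

resource "talos_machine_configuration_apply" "controlplane" {
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.controlplane.machine_configuration
  node                        = "10.10.10.11"
}

resource "talos_machine_bootstrap" "this" {
  depends_on           = [talos_machine_configuration_apply.controlplane]
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = "10.10.10.11"
}
```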
Running on top of Talos is the Kubernetes cluster itself. To direct traffic, I set up a dual Traefik Ingress system. One instance handles public-facing apps like my uptime monitors (so I know when I broke something), while a second, separate instance handles sensitive internal tools like Grafana and Hubble. This separation ensures that even if the public layer is targeted, my management interfaces remain locked away.
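From Terraform’s point of view, the two-instance split is simply two Helm releases in separate namespaces. The sketch below is indicative only: the exact chart values depend on the Traefik chart version, and the LoadBalancer/ClusterIP split is just one way of keeping the internal instance off the public path.

```hcl
# Rough sketch of the dual-ingress split: two Traefik releases in
# separate namespaces. Chart values are indicative and depend on
# the chart version in use.
resource "helm_release" "traefik_public" {
  name             = "traefik-public"
  repository       = "https://traefik.github.io/charts"
  chart            = "traefik"
  namespace        = "traefik-public"
  create_namespace = true

  values = [yamlencode({
    ingressClass = {
      enabled        = true
      isDefaultClass = true
    }
    service = { type = "LoadBalancer" } # reachable from outside the cluster
  })]
}

resource "helm_release" "traefik_internal" {
  name             = "traefik-internal"
  repository       = "https://traefik.github.io/charts"
  chart            = "traefik"
  namespace        = "traefik-internal"
  create_namespace = true

  values = [yamlencode({
    ingressClass = {
      enabled        = true
      isDefaultClass = false
    }
    service = { type = "ClusterIP" } # not exposed outside the cluster network
  })]
}
```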
Under the hood, Cilium handles the container networking using eBPF, providing high-performance connectivity and deep observability. To keep an eye on everything, I spun up a full observability stack: Prometheus for metrics, Loki for logs, and Grafana to visualize it all. Finally, to keep the bad guys out, CrowdSec watches the logs for intrusion attempts, and Cert-manager automates all the TLS certificates, because life is too short to manually renew SSL keys.
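Cilium itself is also just another Helm release in the add-ons layer. Here is a trimmed sketch with Hubble enabled; Talos needs a few extra values (cgroup paths, security contexts, and so on) that I’ve left out to keep it readable.

```hcl
# Trimmed sketch of the Cilium install with Hubble relay and UI enabled.
# Talos-specific values (cgroup paths, security contexts, etc.) are omitted.
resource "helm_release" "cilium" {
  name       = "cilium"
  repository = "https://helm.cilium.io/"
  chart      = "cilium"
  namespace  = "kube-system"

  values = [yamlencode({
    ipam = { mode = "kubernetes" }
    hubble = {
      relay = { enabled = true }
      ui    = { enabled = true }
    }
  })]
}
```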
Here’s Hubble in action, providing real-time visibility into network flows and service dependencies:
And here’s a glimpse of the Grafana dashboards monitoring the cluster:
The moment you expose any service to the public internet, you become a target. It doesn’t matter if you’re running a Fortune 500 company or a tiny home lab in Singapore, automated bots and malicious actors are constantly scanning for vulnerabilities. This is where CrowdSec becomes essential.
CrowdSec is an open-source security engine that analyzes logs in real-time to detect and block malicious behavior. What makes it particularly powerful is its crowd-sourced threat intelligence: when one CrowdSec user detects an attacker, that information is shared with the entire community, providing protection before the attacker even reaches your doorstep.
Within the first few days of my homelab going live, the results were eye-opening. Here’s a recording showing the attacks my setup experienced:
The sheer volume of automated attacks, from SSH brute-force attempts to web vulnerability scanners probing for exposed admin panels, was a stark reminder of why defense-in-depth matters. CrowdSec, integrated with my Traefik ingress, automatically blocks these malicious IPs at the edge before they can cause any harm. It’s like having a bouncer at the door who knows every troublemaker in town.
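For reference, the CrowdSec engine itself is installed in-cluster from its official Helm chart. This is only the bare skeleton: the Traefik bouncer that actually enforces the blocks and the log-acquisition settings are configured separately and not shown here.

```hcl
# Minimal sketch of installing the CrowdSec security engine in-cluster.
# The Traefik bouncer integration and log-acquisition values are
# configured separately and omitted from this sketch.
resource "helm_release" "crowdsec" {
  name             = "crowdsec"
  repository       = "https://crowdsecurity.github.io/helm-charts"
  chart            = "crowdsec"
  namespace        = "crowdsec"
  create_namespace = true
}
```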
This entire setup was provisioned and configured using Terraform. Terraform allows me to define my infrastructure as code, making it reliable, repeatable, and version-controlled. I can easily spin up a new environment, make changes, and roll back if needed.
The Terraform configuration is broken down into layers:
Layer 1 (Bootstrap): Sets up the base infrastructure, including Proxmox VMs, OPNsense, and networking.
Layer 2 (Cluster): Provisions the Talos Kubernetes cluster.
Layer 3 (Add-ons): Deploys core Kubernetes add-ons like Cilium, Traefik, Cert-manager, and the observability stack.
Layer 4 (Network): Configures external networking components like Cloudflare DNS and Gateway API.
This layered approach makes the Terraform configuration modular and easy to manage.
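Each layer consumes the outputs of the one before it, which keeps the boundaries explicit. As a simplified example, this is roughly how the cluster layer can read the bootstrap layer’s state (the state path and output names are illustrative):

```hcl
# Simplified example of Layer 2 reading outputs published by Layer 1.
# The state path and output names are illustrative.
data "terraform_remote_state" "bootstrap" {
  backend = "local"

  config = {
    path = "../layer1-bootstrap/terraform.tfstate"
  }
}

locals {
  # e.g. node IPs on the 10.10.10.0/24 overlay, exported by Layer 1
  talos_node_ips = data.terraform_remote_state.bootstrap.outputs.talos_node_ips
}
```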
Furthermore, the assistance of AI coding tools like Claude and Gemini was invaluable. They helped me:
Generate Terraform code: I could describe what I wanted to build, and the AI would generate the initial Terraform configuration.
Troubleshoot issues: When I encountered errors, I could paste the error message and the code into the AI, and it would provide suggestions and fixes.
Optimize configuration: The AI helped me identify and implement best practices for Terraform and Kubernetes configuration.
To see the real-world result of this setup, I invite you to check out a live application currently running on the cluster: helm-renderer.utilitytools.app.
This is a custom utility that allows you to render Helm charts directly in your browser and perform static analysis on the output. It stands as a perfect testament to the “speed-to-value” I experienced over the weekend: not only is it hosted on the infrastructure I just built, but the application itself was written 100% by Claude Code. It demonstrates how modern AI tools can bridge the gap between complex infrastructure and deployed software in record time. I’d love for you to give it a spin!