Hermes in My Homelab: Living With a Personal AI Agent

I have more ideas than time. My side-project backlog is full of things I genuinely want to explore: “Is a Crossplane Composition plus function-sequencer really equivalent to a hand-written Kubebuilder controller?”, “Can Temporal reliably drive a Deep-Agent loop?” and, more recently, “Can TigerData run on CNPG without a custom operator?” In theory, each of these just needs a focused weekend. In reality, between work, chasing my kids around the playground, and being a husband to my wife, my window for “sitting at a desk to explore” is maybe two evenings a week. By the time I sit down, half of that time goes to rebuilding the context I’ve lost since the last session.

At the same time, background agents finally got both good and safe enough to trust at home. I tried OpenClaw back in February, but shut it down after two weeks. A University of Toronto advisory highlighted exactly the kind of sandbox-escape nightmare I was worried about, and I simply don’t have the time to act as a security researcher for someone else’s runtime.

Then Hermes showed up. It had a real defence-in-depth security model and a memory-and-skills loop that actually compounds over time. Safe by construction, smarter with use. It felt like the first agent that could plausibly belong on my own hardware.

The Setup

Three constraints dictated how I built this out. First, the hardware was already there. I finished building my Proxmox + Talos setup back in December (a three-node cluster heavily inspired by Jeff Geerling’s minirack design). It runs Cilium, Traefik, and my GitOps stack. It’s beautiful, but it was sitting idle and wasting electricity, and I wanted a workload that justified it as more than another room heater. Second, security was non-negotiable. I refuse to host an agent with shell access that is one bad prompt away from nmap’ing my home network. Third, I didn’t want to spend more money on a dedicated Mac mini (strangely, a lot of people did just that). The homelab is already a sunk cost, which makes everything on top of it effectively free.

I ended up with a dedicated Hermes VM on the homelab, locked to its own subnet, with read-write access to the cluster, and reachable only by me.

The VM runs on the Proxmox host and is attached to the same isolated dmz bridge (10.10.10.0/24) that my Talos cluster nodes sit on. Hermes is a client of the cluster, not a workload running inside it.

flowchart TB
    subgraph User["👤 Me"]
      Phone["Slack on phone"]
      Laptop["Slack / terminal on laptop"]
    end

    subgraph Homelab["🏠 Proxmox homelab"]
      direction TB
      subgraph DMZ["dmz bridge — 10.10.10.0/24"]
        HermesVM["Hermes VM<br/>(runtime, memory, skills, cron)"]
        subgraph K8s["Talos cluster"]
          Apps["GitOps apps:<br/>Argo CD · Traefik · CrowdSec ·<br/>VictoriaMetrics/Logs · Grafana · Hubble ·<br/>SeaweedFS · CNPG · LiteLLM · Lemuria"]
        end
      end
      OPN["OPNsense VM (firewall)"]
    end

    Phone -->|messages| Slack["Slack workspace"]
    Laptop -->|messages| Slack
    Slack <-->|outbound websocket| HermesVM
    HermesVM -->|kubectl / gh / git| K8s
    HermesVM -->|spawns| CC["Claude Code workers<br/>(tmux + claude -p)"]
    CC -->|push branches / open PRs| GH["GitHub"]
    OPN --- DMZ

I set up the Slack integration so that I can reach Hermes on the go. Slack becomes the front door and the Talos cluster is the workshop. I rarely open a laptop to start tinkering now. I message Hermes, it does the heavy lifting, and the result lands in the right repo or PR.

Because Hermes has live kubectl access to the homelab, when a PoC needs a real database, Ingress, or CRD, it brings them up against real infrastructure rather than mocks. That lets the agent execute a task, verify the result, and fix problems on its own in a relatively safe manner.

The agent relies heavily on two pieces of state:

Memory: About 2 KB of declarative facts injected into every turn. This stops me from having to re-explain my preferences.
Skills: Around 150 reusable procedures stored as markdown and FTS5-indexed. The agent searches its own playbook before attempting anything complex. When we solve something hard together, Hermes writes a new skill to cache that knowledge.

Both live in the VM’s file system and are version-controlled in a separate repo, ensuring the whole personality remains reproducible (minus the secrets, of course).

I’ve made two local tweaks that make this setup feel like mine:

First, skill discovery via skills_search. By default, every skill’s name and description gets injected into the system prompt. At 150+ skills, that eats up a massive chunk of the context budget. I maintain a customization of the Hermes agent on my own branch that swaps this for an on-demand index. The agent searches for candidates, then loads only what it needs. I have to maintain the patch, but every chat turn is cheaper, and the library can keep growing without inflating the system prompt.
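The search-then-load idea can be sketched with SQLite’s FTS5 module. This is a minimal illustration, not the actual Hermes code: the table layout, the `build_index` and `load_skill` helpers, and the sample skills are all hypothetical, and it assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical sample skills: (name, description, body).
SAMPLE_SKILLS = [
    ("cnpg-cluster-bootstrap", "Bring up a CloudNativePG cluster",
     "Apply the Cluster manifest, wait for the primary, run a smoke query."),
    ("traefik-ingress-debug", "Debug Traefik ingress routing",
     "Check the IngressRoute, then the service endpoints, then the logs."),
    ("blog-digest-pr", "Open a weekly digest PR",
     "Collect the summaries, write the post, push a branch, open the PR."),
]


def build_index(skills):
    """Create an in-memory FTS5 index over the given skills."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE skills USING fts5(name, description, body)")
    conn.executemany("INSERT INTO skills VALUES (?, ?, ?)", skills)
    return conn


def skills_search(conn, query, limit=3):
    """Return (name, description) of the best matches; bodies stay unloaded.

    The query uses FTS5 syntax, so free-form user text may need quoting first.
    """
    rows = conn.execute(
        "SELECT name, description FROM skills "
        "WHERE skills MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


def load_skill(conn, name):
    """Fetch the full body of one skill once the agent has picked it."""
    row = conn.execute(
        "SELECT body FROM skills WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


conn = build_index(SAMPLE_SKILLS)
candidates = skills_search(conn, "traefik")
body = load_skill(conn, candidates[0][0])
```

Only the short candidate list enters the conversation; the full procedure is pulled in for the one skill the agent decides to use.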

Second, an automated upgrade pipeline. A weekly cron checks for new stable tags. If one exists, it stages a rebase in a throwaway worktree, runs smoke tests, and pings me in Slack. I reply “apply”, and it tags a rollback snapshot, fast-forwards, and restarts the gateway with about five seconds of downtime. A few months in, I’ve had to roll back exactly once, and that was due to my own botched merge.
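The “is there a new stable tag?” check at the start of that pipeline might look like the sketch below. It assumes tags follow a plain `vMAJOR.MINOR.PATCH` scheme and that anything with a suffix (such as `-rc1`) is not stable; the real tag naming is not stated here, and the function names are hypothetical.

```python
import re

# Assumption: a stable tag is exactly MAJOR.MINOR.PATCH with an optional "v".
STABLE_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_stable(tag):
    """Return the version as a tuple of ints, or None if the tag is not stable."""
    match = STABLE_TAG.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def newest_stable(tags):
    """Pick the highest stable tag, comparing numerically rather than as text."""
    stable = [(parse_stable(t), t) for t in tags if parse_stable(t) is not None]
    return max(stable)[1] if stable else None


def needs_upgrade(current, tags):
    """Return the tag to upgrade to, or None when already up to date."""
    latest = newest_stable(tags)
    if latest is None:
        return None
    current_version = parse_stable(current)
    if current_version is None or parse_stable(latest) > current_version:
        return latest
    return None


tags = ["v0.9.0", "v0.10.1", "v0.11.0-rc1", "nightly"]
target = needs_upgrade("v0.9.0", tags)
```

Comparing integer tuples matters here: a plain string comparison would rank `v0.9.0` above `v0.10.1`.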

For writing actual code, Hermes spawns Claude Code workers. Hermes plans the work, Claude Code executes it inside a worktree, and the results come back as diffs and PR links. Splitting the brain (a planner with memory and tools) from the hands (an executor with a real IDE-grade harness) is easily the highest-leverage pattern in this entire setup.
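As a rough sketch of the hand-off, the function below only builds the commands a planner could run to start one worker; it does not execute anything. The directory layout, branch name, and session name are hypothetical, and the real orchestration in Hermes may differ.

```python
import shlex


def worker_commands(repo_dir, branch, prompt, session):
    """Build the commands for one Claude Code worker in its own git worktree."""
    # Hypothetical layout: worktrees live next to the main checkout.
    worktree = f"{repo_dir}-worktrees/{branch}"

    # Create a new branch checked out in a separate working directory.
    add_worktree = ["git", "-C", repo_dir, "worktree", "add", "-b", branch, worktree]

    # Run Claude Code non-interactively (-p) with the planner's instructions.
    claude = f"claude -p {shlex.quote(prompt)}"

    # Start it in a detached tmux session rooted in the worktree.
    start_worker = ["tmux", "new-session", "-d", "-s", session, "-c", worktree, claude]
    return add_worktree, start_worker


add_worktree, start_worker = worker_commands(
    "/srv/poc", "tigerdata-cnpg", "Add a README", "worker-1"
)
```

Each command list can be passed to `subprocess.run`. Keeping every worker in its own worktree means parallel tasks never touch the same checkout.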

Hermes also runs on a schedule. Every week it builds an arXiv and video digest (filtered down to just a few seminal pieces), summarizes them, and opens a PR against my blog repository. That lets me quickly catch up on what happened in the industry that week without watching hours of video.

Keeping Hermes out of my home network

Handing an AI agent a shell on your home Wi-Fi is a disaster waiting to happen. If you don’t think about boundaries up front, a poisoned prompt could expose everything on your network.

flowchart TB
    subgraph Home["🏠 Home LAN — 192.168.50.0/24"]
      Family["Family devices"]
      ASUS["ASUS GT-BE98 router"]
    end

    subgraph Lab["🧪 Proxmox dmz bridge — 10.10.10.0/24"]
      OPN["OPNsense VM<br/>WAN + firewall + NAT"]
      HermesVM["Hermes VM"]
      Talos["Talos cluster<br/>(no SSH, no shell)"]
    end

    subgraph Internet["🌐 Internet"]
      Slack["Slack websocket<br/>(outbound only)"]
      LLM["LLM APIs"]
      GH["GitHub"]
    end

    HermesVM -- egress via NAT --> OPN
    OPN -- allow --> Slack
    OPN -- allow --> LLM
    OPN -- allow --> GH
    OPN -. default DROP .-> Home
    Family -. no path to Hermes .-> HermesVM
    ASUS -. no inbound port-forwards .-> OPN

Here’s how I keep it locked down:

L2 Isolation: The dmz bridge is completely separate from the home Wi-Fi.
OPNsense Default-Drop: There is no allow rule from the DMZ to the home LAN. Hermes can reach the internet and the cluster, but it cannot see my NAS, the printer, or the kids’ tablets.
Outbound Only: Slack communicates via an outbound websocket. There are no listening services exposed to the public internet.
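The policy described above can be modelled in a few lines. This is a simplified sketch of the intent, not the actual OPNsense rule set: the host addresses in the usage lines are made up, and treating every private range outside the DMZ as off-limits is an assumption.

```python
from ipaddress import ip_address, ip_network

DMZ = ip_network("10.10.10.0/24")
HOME_LAN = ip_network("192.168.50.0/24")

# Assumption: all private ranges outside the DMZ are blocked, which covers
# the home LAN and any other internal network added later.
PRIVATE_RANGES = [
    ip_network("10.0.0.0/8"),
    ip_network("172.16.0.0/12"),
    ip_network("192.168.0.0/16"),
]


def decide(src, dst):
    """Return "allow" or "drop" for a new connection from src to dst."""
    source, dest = ip_address(src), ip_address(dst)
    if source not in DMZ:
        return "drop"   # nothing outside the DMZ may initiate traffic
    if dest in DMZ:
        return "allow"  # Hermes talking to the Talos cluster
    if any(dest in net for net in PRIVATE_RANGES):
        return "drop"   # home LAN: NAS, printer, tablets
    return "allow"      # outbound to the internet via NAT


hermes = "10.10.10.20"  # hypothetical address of the Hermes VM
to_nas = decide(hermes, "192.168.50.10")
to_cluster = decide(hermes, "10.10.10.31")
to_internet = decide(hermes, "140.82.112.3")
```

The order of the checks carries the design: the DMZ test runs before the private-range test, so cluster traffic is allowed even though 10.10.10.0/24 is itself a private range.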

My mental model is simple: Hermes is a handyman I let into a designated workshop. It has tools and an outside line, but it does not have keys to the house.

The Payoff: A Saturday at the Playground

The clearest test of this system isn’t a desk hackathon. It’s a Saturday afternoon at the playground. Can I ship anything meaningful between pushes on the swing? Surprisingly, yes. For example, I woke up this morning wondering whether TigerData plays nicely with CloudNativePG. Before leaving for lunch, I sent one Slack message:

“Hermes — spec a PoC under pradithya/poc/tigerdata-cnpg proving TigerData on CNPG inside k3d, with a small FastAPI dashboard hitting a hypertable. Use docs-iterate-and-review. Open the PR when QA passes.”

A single Slack message to Hermes kicks off the whole PoC — and 131 threaded replies later, the work is done.

About 15 minutes later, the PR link landed in Slack. It had the CNPG Cluster CR pinning the TigerData image, a FastAPI app proving the time-bucket logic works, a Makefile, and a README.

The resulting PR: feat(tigerdata-cnpg) — CNPG + TigerData PoC with a FastAPI demo, complete with summary and test plan, generated end-to-end by Claude Code.

The FastAPI dashboard even renders live — a Chart.js view of the hypertable updating every couple of seconds as the simulator ingests synthetic device readings:

The FastAPI dashboard streaming time-bucketed readings from the TigerData hypertable in real time.

The answer to a real engineering question landed in my repo while I was being a present parent. Six months ago, that PoC would have cost me a Saturday at the desk—and realistically, it probably just wouldn’t have happened at all. The “I’ll explore this later” backlog is no longer a graveyard.

This works because Slack is the only UI I need, and because the live-cluster smoke tests mean I can actually trust the code. The minutes I spend parenting are the minutes Claude Code spends compiling. My attention is strictly reserved for PR reviews.

I hate “10x productivity” marketing fluff, so here is the concrete reality:

My weekends are no longer tethered to a desk. Code generation happens while I do other things. The true multiplier here isn’t that the agent is lightning fast. It’s that the work is asynchronous.

What hasn’t changed is the review bottleneck. Reading PRs remains the constraint, and the final “do I trust this code?” call is unequivocally mine. I expect it always will be.
