Sierra powers customer-experience voice agents for a large chunk of the Fortune 20, and in this Interrupt 26 conversation Zach Reneau-Wedeen (Head of Product) walks through what their production agent harness actually looks like. The headline: a voice agent in production does not look like the canonical “LLM-in-a-loop calling tools” diagram everyone draws on whiteboards. It looks like a multi-model ensemble pipeline with speculative execution baked in.
“Coding agents are good at file systems — let’s materialize everything into a file system”
The opening framing is a useful contrarian take: coding agents have a runaway lead on capability because they happen to operate on substrates — file systems, Git, grep — that the underlying models were already extremely good at. Sierra’s design bet is to materialize as much agent state as possible into those same primitives so the same model capability can be applied to non-coding domains.
“Coding agents are really good at file systems. They’re really good at Git. They’re really good at grep. Let’s materialize everything into those structures so that coding agents can just, you know, cook.”
Journeys: a constrained DSL rather than free-form planning
Sierra’s “Journeys” abstraction is a DSL rather than raw model output. The pitch: you want enough structure that the agent’s behavior is auditable and constrained per customer, but enough room for the model to reason within each step. Where to sit on that tradeoff curve (how much to bake into the harness vs. how much to leave to the model) is the central design question for any production agent team in 2026.
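The interview does not show any Journeys syntax, so what follows is only a minimal Python sketch of the general idea, assuming a journey is an ordered list of steps, each with its own tool allowlist. `Step`, `Journey`, `choose_tool`, the customer id, and the tool names are all hypothetical. The model (stubbed as a lambda) chooses freely inside each step, while the harness enforces the allowlist and keeps an audit trail.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One bounded step: the model reasons freely, but only within these tools."""
    name: str
    instruction: str
    allowed_tools: frozenset[str]


@dataclass
class Journey:
    """A hypothetical per-customer journey: ordered steps plus an audit log."""
    customer: str
    steps: list[Step]
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def run(self, choose_tool: Callable[[Step], str]) -> list[str]:
        """Walk the steps in order; `choose_tool` stands in for the model."""
        used = []
        for step in self.steps:
            tool = choose_tool(step)  # the model decides *within* the step
            if tool not in step.allowed_tools:
                # The harness, not the model, enforces the per-step constraint.
                raise PermissionError(f"{tool!r} not allowed in step {step.name!r}")
            self.audit_log.append((step.name, tool))  # auditable trace
            used.append(tool)
        return used


returns_journey = Journey(
    customer="acme-retail",  # hypothetical customer id
    steps=[
        Step("verify", "Confirm the caller's identity.", frozenset({"lookup_account"})),
        Step("resolve", "Process the return if eligible.",
             frozenset({"check_policy", "issue_label"})),
    ],
)
# Stub "model": picks the alphabetically first allowed tool.
print(returns_journey.run(lambda step: sorted(step.allowed_tools)[0]))
# ['lookup_account', 'check_policy']
```

Putting enforcement in the harness rather than in the prompt is what makes per-customer behavior auditable: every step and tool lands in the log, and an out-of-bounds choice fails loudly instead of silently.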
Voice as a “thick architecture”
This is the most actionable section of the interview. Sierra’s voice stack is explicitly modular, with separate providers for transcription (STT), reasoning (LLM), synthesis (TTS), and now native voice-to-voice (V2V) models, and they multihome each one. The reasoning: provider quality varies across accents, languages, and contexts, so the only way to get the best customer experience for, say, “a thick northern UK accent” is to route per call.
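Per-call routing can be as simple as a ranked provider list keyed on call attributes. This is a minimal sketch assuming routing on language and accent with health-based fallback; the provider names, the `ROUTES` table, `CallContext`, and `pick_stt_provider` are hypothetical, and the talk does not say how Sierra ranks providers. The same shape would apply to the LLM, TTS, and V2V layers.

```python
from dataclasses import dataclass

# Hypothetical provider names and rankings; the source names no vendors.
ROUTES: dict[tuple[str, str], list[str]] = {
    ("en", "northern-uk"): ["stt_vendor_b", "stt_vendor_a"],
    ("en", "us"): ["stt_vendor_a", "stt_vendor_b"],
    ("es", "mx"): ["stt_vendor_c", "stt_vendor_a"],
}
DEFAULT_ROUTE = ["stt_vendor_a", "stt_vendor_b", "stt_vendor_c"]


@dataclass(frozen=True)
class CallContext:
    language: str
    accent: str


def pick_stt_provider(call: CallContext, healthy: set[str]) -> str:
    """Return the highest-ranked healthy provider for this call's language/accent."""
    ranked = ROUTES.get((call.language, call.accent), DEFAULT_ROUTE)
    for provider in ranked:
        if provider in healthy:
            return provider
    raise RuntimeError("no healthy STT provider for this call")


call = CallContext("en", "northern-uk")
print(pick_stt_provider(call, healthy={"stt_vendor_a", "stt_vendor_b"}))  # stt_vendor_b
print(pick_stt_provider(call, healthy={"stt_vendor_a"}))  # stt_vendor_a
```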
The concrete example Zach gives is gorgeous:
“There is one model that has the highest quality transcription, but it hallucinates during silence more than other models. So we run two models in parallel — if this model says it’s silent, you trust it; if this model does not say it’s silent, you trust the other one.”
That’s a one-paragraph design pattern that probably belongs in every voice-agent codebase: an ensemble with asymmetric trust rules between the members.
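A minimal asyncio sketch of that asymmetric trust rule, assuming each transcriber returns `None` when it judges the audio to be silence. Following the key takeaways, "model A" is the one trusted on silence and "model B" is the higher-quality one that tends to hallucinate over silence. The stub models, the byte-level silence check, and the fallback to A when B returns nothing are illustrative assumptions, not Sierra’s code.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A transcriber returns text, or None when it judges the audio to be silence.
Transcriber = Callable[[bytes], Awaitable[Optional[str]]]


async def ensemble_transcribe(
    audio: bytes,
    silence_reliable: Transcriber,  # "model A": trusted when it reports silence
    high_quality: Transcriber,      # "model B": best text, may hallucinate on silence
) -> Optional[str]:
    """Run both models concurrently and apply an asymmetric trust rule."""
    a_result, b_result = await asyncio.gather(
        silence_reliable(audio), high_quality(audio)
    )
    if a_result is None:
        # Model A says silence: trust it and discard anything B "heard".
        return None
    # Model A heard speech: prefer B's higher-quality transcript.
    # Falling back to A when B returns nothing is an extra assumption.
    return b_result if b_result is not None else a_result


async def model_a(audio: bytes) -> Optional[str]:
    # Stub: treats all-zero audio as silence.
    return None if not audio.strip(b"\x00") else "refund my order"


async def model_b(audio: bytes) -> Optional[str]:
    # Stub: better transcript on speech, but hallucinates text over silence.
    return "I'd like a refund on my order" if audio.strip(b"\x00") else "thank you for watching"


print(asyncio.run(ensemble_transcribe(b"\x00\x00", model_a, model_b)))  # None
print(asyncio.run(ensemble_transcribe(b"\x01\x02", model_a, model_b)))  # I'd like a refund on my order
```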
Speculative execution everywhere
The same speculative-execution idea that CPUs use shows up at the agent level. Sierra often looks up the answer to a question before deciding whether the user actually wants it. Classification and response generation run in parallel, and the harness reconciles the results afterward. A wasted lookup costs far less than the latency of a serial pipeline on a phone call, where the customer is listening to silence.
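One way to express this in application code is to start the lookup as a task before the classifier returns, then either await it or cancel it. A minimal sketch, assuming asyncio; `classify_intent`, `lookup_order_status`, `respond`, and the sleep-based latencies are hypothetical stand-ins for real model and backend calls.

```python
import asyncio


async def classify_intent(utterance: str) -> str:
    """Stand-in for an intent classifier (~20 ms of simulated latency)."""
    await asyncio.sleep(0.02)
    return "order_status" if "order" in utterance else "smalltalk"


async def lookup_order_status(utterance: str) -> str:
    """Stand-in for a backend lookup the harness may not need (~30 ms)."""
    await asyncio.sleep(0.03)
    return "Your order shipped yesterday."


async def respond(utterance: str) -> str:
    # Start the lookup speculatively, before we know whether it is needed.
    lookup = asyncio.create_task(lookup_order_status(utterance))
    intent = await classify_intent(utterance)
    if intent == "order_status":
        # Speculation paid off: latency is ~max(20, 30) ms, not 20 + 30 ms.
        return await lookup
    # Speculation wasted: cancel the lookup and throw its work away.
    lookup.cancel()
    return "Happy to help. What can I do for you?"


print(asyncio.run(respond("where is my order?")))  # Your order shipped yesterday.
print(asyncio.run(respond("hi there")))  # Happy to help. What can I do for you?
```

When the speculation pays off, the caller waits roughly for the slower of the two calls rather than their sum; when it doesn’t, the only cost is the cancelled lookup.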
File-system-shaped state for non-coding agents
Pushing the opening thesis further: Sierra has been steadily moving more of its agent state — knowledge, journeys, configurations — into file-system-shaped structures, on the theory that the next generation of base models will be even better at code-and-files than today’s. Treating the harness as a thin layer over a substrate the model already loves is a different bet than building a bespoke planner.
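To make “file-system-shaped” concrete, here is a minimal sketch assuming agent state is a mapping from relative paths to text. The directory layout, file contents, and helper names (`materialize`, `grep`) are hypothetical; the talk does not describe Sierra’s actual structure. Once state lives in files, a coding agent can run ordinary `grep -rn`; the small Python `grep` below only keeps the example self-contained.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical agent state: knowledge, journeys, and configuration as files.
STATE = {
    "knowledge/returns-policy.md": "Items can be returned within 30 days with a receipt.\n",
    "journeys/returns.md": "1. Verify identity\n2. Check returns policy\n3. Issue label\n",
    "config/voice.json": json.dumps({"stt": "stt_vendor_a", "barge_in": True}, indent=2),
}


def materialize(root: Path, state: dict[str, str]) -> None:
    """Write each piece of state to its own file so standard tools can see it."""
    for rel_path, content in state.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)


def grep(root: Path, pattern: str) -> list[str]:
    """A tiny `grep -rn` equivalent: 'path:line: text' for every matching line."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            for n, line in enumerate(path.read_text().splitlines(), start=1):
                if regex.search(line):
                    hits.append(f"{path.relative_to(root).as_posix()}:{n}: {line}")
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    materialize(root, STATE)
    results = grep(root, r"(?i)return")

for hit in results:
    print(hit)
```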
Agent orchestration: the orchestrator is not the model
When asked what’s running in parallel — guardrails, retrievals, classification — Zach’s answer is essentially all of it. The orchestrator is not the model; the orchestrator is application code that fans out to many models (often the same prompt to different providers) and merges results with explicit reconciliation rules. Production agent harnesses are starting to look a lot more like high-frequency-trading dispatch layers than like LangChain demos.
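A minimal asyncio sketch of an orchestrator that is plain application code: it sends the same prompt to a guardrail and to two LLM providers concurrently, then merges the results with explicit, ordered rules. The provider stubs, latencies, the 50 ms default deadline, and the rule order are all assumptions for illustration; retrieval and classification would fan out the same way.

```python
import asyncio
from typing import Awaitable, Optional


# Hypothetical stand-ins for real model calls.
async def guardrail(prompt: str) -> bool:
    await asyncio.sleep(0.01)
    return "password" not in prompt.lower()  # True means "safe to answer"


async def provider_primary(prompt: str) -> str:
    await asyncio.sleep(0.2)  # simulate a provider that is slow today
    return "primary answer"


async def provider_secondary(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return "secondary answer"


async def call_with_deadline(call: Awaitable[str], deadline: float) -> Optional[str]:
    """Return the result, or None if it misses the deadline (the call is cancelled)."""
    try:
        return await asyncio.wait_for(call, timeout=deadline)
    except asyncio.TimeoutError:
        return None


async def orchestrate(prompt: str, deadline: float = 0.05) -> str:
    # Fan out: the guardrail and both providers see the same prompt at once.
    safe, primary, secondary = await asyncio.gather(
        guardrail(prompt),
        call_with_deadline(provider_primary(prompt), deadline),
        call_with_deadline(provider_secondary(prompt), deadline),
    )
    # Explicit reconciliation rules, applied in priority order.
    if not safe:
        return "I can't help with that."  # rule 1: guardrail veto wins
    if primary is not None:
        return primary  # rule 2: prefer the primary provider
    if secondary is not None:
        return secondary  # rule 3: fall back to the secondary
    return "Sorry, could you say that again?"  # rule 4: nothing arrived in time


print(asyncio.run(orchestrate("Where is my refund?")))  # secondary answer
print(asyncio.run(orchestrate("Where is my refund?", deadline=1.0)))  # primary answer
```

Keeping the rules in ordinary code makes them testable and reviewable: a guardrail veto cannot be argued away by a model, and a slow provider degrades to a fallback instead of stalling the call.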
Agentic commerce as the next leg
The forward-looking note: Sierra is already in pilots where the agent earns a commission on a completed sale rather than being charged per minute or per resolution. Zach’s bet — paraphrased — is that agentic commerce will end up larger than e-commerce, because the agent is a higher-context salesperson than a search box and can close a deal in a single turn.
Key takeaways
- Production voice agents are multi-model ensembles, not single-LLM loops. Multihome STT, LLM, TTS, and V2V; route per call.
- Asymmetric trust between ensemble members is the right pattern for hallucination control. Trust model A on silence, trust model B on speech.
- Speculative execution at the agent level is mandatory for latency-sensitive paths. Generate the answer before deciding whether you need it.
- DSL > free-form planning for auditable per-customer behavior. Sierra’s Journeys give the model room to reason within bounded steps.
- Materialize agent state into substrates the model already loves. File systems, Git, grep — let the base model’s existing capability do the work.
- The orchestrator is application code, not the model. Fan out, merge with explicit rules, treat the LLM as one component among many.
- Agentic commerce is the next monetization shape. Per-sale commissions rather than per-minute pricing.
Source
- Title: The best AI agents are simpler than you think
- Speakers: Zach Reneau-Wedeen (Head of Product, Sierra), Harrison Chase (LangChain)
- Venue: LangChain — Interrupt 26
- Duration: 1h 27m
- URL: https://www.youtube.com/watch?v=uCKhOmth2ms