weekly-videos-2026-06

Jeff Dean opening slide at Stanford's distinguished lecture series

Building Software Systems at Google and Lessons Learned — Jeff Dean (Stanford, 2010)

This is the talk every backend engineer eventually watches. Jeff Dean walks Stanford’s distinguished lecture audience through eleven years of evolution in Google’s search infrastructure — from a single-machine inverted index in 1999 to a planet-scale system serving thousands of queries per second with sub-second updates. The value of the lecture isn’t the specific numbers; it’s how he reasons about each rewrite as a response to one constraint becoming unbearable, and the design patterns that survived across seven major rewrites....

Sierra voice agent modular architecture diagram

Sierra's Voice Agent Architecture — Zach Reneau-Wedeen on Modular Multi-Model Pipelines

Sierra powers customer-experience voice agents for a large chunk of the Fortune 20, and in this Interrupt-26 conversation Zach Reneau-Wedeen (Head of Product) walks through what their production agent harness actually looks like. The headline: a voice agent in production does not look like the canonical “LLM-in-a-loop calling tools” diagram everyone draws on whiteboards. It looks like a multi-model ensemble pipeline with speculative execution baked in.

“Coding agents are good at file systems — let’s materialize everything into a file system”

The opening framing is a useful contrarian take: coding agents have a runaway lead on capability because they happen to operate on substrates — file systems, Git, grep — that the underlying models were already extremely good at....

Why AI Labs With Unlimited GPUs Still Fail — Anjney Midha on Culture, Mission, and Execution

Anjney Midha (AMP, formerly a16z, board member at several frontier labs) sits down with Latent Space for an hour on a question that wouldn’t have made sense in 2023: why are well-funded AI labs with all the compute they need failing to ship? His answer isn’t compute, it isn’t talent density, and it isn’t model architecture — it’s culture, mission alignment, and the boring details of execution.

The diagnosis: culture, not capital

Midha opens with the observation that has been circulating quietly inside frontier-lab boards for months: many of the best-funded labs of the 2024–2025 cohort have all the cash and all the compute they need and still can’t ship competitive models....

Kuba Rogut on stage opening 'RAG is dead, right?' at AI Engineer

RAG Is Dead, Right? Why Hybrid, Tool-Rich Retrieval Is the New Default for Agentic Search

Kuba Rogut, deployed engineer at Turbopuffer, gave one of the more refreshingly direct takes on the “RAG is dead” meme that’s been making the rounds on X. His argument, in one sentence: RAG isn’t dead — what’s dead is the strawman version where RAG means “embed everything once, run a single vector lookup, dump it into the LLM context.” The actual frontier is hybrid, tool-rich retrieval, where embeddings, BM25, grep, glob, regex, and filters are all tools an agent can compose iteratively....

Charlie from Pause AI describing his 64–128 parallel agents working on KV-cache compaction

Running 128 Coding Agents at Once: Inside Cursor, Pause AI, and the Era of Agent Maxing

A short on-camera conversation between Sam Whitmore (engineer on Cursor’s cloud-agents team) and Charlie and Harry of Pause AI, recorded at Baseten and posted by Cursor as part of its agent-era publicity push. The framing is intentionally provocative — “I’ve got 64 to 128 agents working on this at any given time” — but the substance is closer to a workshop chat between three practitioners who actually live inside agent harnesses all day....

Joe Armstrong showing Tom Kilburn's 1948 first-ever stored program

The Mess We're In — Joe Armstrong's 2014 Strange Loop Talk on Software's Entropy Problem

This week’s classic pick is Joe Armstrong’s 2014 Strange Loop talk “The Mess We’re In” — a 45-minute polemic from the co-creator of Erlang on why software is getting worse, what the laws of physics say about how fast computation could be, and why we should stop using human-chosen file names. Armstrong died in 2019, but the talk has aged remarkably well: in the era of 128-parallel coding agents, his entropy critique reads less like nostalgia and more like a warning we’ve kept ignoring....

I See What You Mean — Peter Alvaro at Strange Loop

I See What You Mean — Peter Alvaro (Strange Loop 2015)

Eleven years after delivery, “I See What You Mean” remains the single best talk on why the difficulty of distributed systems is a language design problem rather than an engineering problem. Peter Alvaro — now a professor at UC Santa Cruz, then a Berkeley PhD finishing the BOOM project — walks through a decade of research on Dedalus and Bloom and ends with the CALM theorem: a precise, syntactic answer to the question “when does a distributed program need coordination, and when can we get away without it?...

SWE-rebench: Lessons from Evaluating Coding Agents

Vibes-based model selection is fine until your agent ships to production and starts billing customers for failed PRs. Ibragim Badertdinov runs SWE-rebench, a contamination-free coding-agent leaderboard at Nebius that re-collects fresh GitHub issues every month and re-scores ~30 models against them. His AI Engineer talk is the most operationally honest 16 minutes I’ve seen on what running a real eval actually costs — and which models have learned to cheat their way around it....

Text Diffusion — Brendon Dillon, Google DeepMind

For two years the LLM serving stack has been an autoregressive monoculture: one token at a time, KV cache, speculative decoding around the edges. Brendon Dillon, a research scientist at Google DeepMind, used his AI Engineer slot to make the case for a different default — diffusion language models, the same family of techniques powering image and video generation, retargeted at text. The pitch is not theoretical: Gemini Diffusion, released as a research demo last year, already pushes ~1,000 tokens/second on the same hardware where Flash-class autoregressive models top out around 200....