Dynamo: Amazon's Highly Available Key-value Store (2007)
Weekly Paper Notes: Seminal Paper of the Week for the 2026-06-06 CS paper digest.
Area: Distributed Systems / Databases.
Citation: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels. Dynamo: Amazon’s Highly Available Key-value Store. SOSP ’07.
DOI: 10.1145/1294261.1294281
Canonical PDF: Amazon Dynamo paper (Werner Vogels’ archive)
Why the paper still matters
Almost every popular “NoSQL” key-value store of the last fifteen years, from Cassandra, Riak, and Voldemort to DynamoDB (the service), early versions of Redis Cluster, and parts of MongoDB’s replica routing, pulls its core design vocabulary directly from Dynamo: consistent hashing for partitioning, vector clocks for divergence tracking, sloppy quorums with hinted handoff for availability under failure, and read repair / Merkle-tree anti-entropy for eventual convergence....
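Of the mechanisms named in this excerpt, vector clocks are the easiest to show in a few lines. The sketch below is a minimal illustration of the general technique, not Dynamo's implementation: the dict-based representation and the helper names `increment`, `merge`, and `compare` are assumptions made for this example.

```python
from typing import Dict

# Hypothetical representation: node id -> number of updates seen from that node.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock for a value that reconciles a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither clock dominates: the versions diverged and need reconciliation.
    return "concurrent"
```

Two writes that both descend from the same version but were accepted by different nodes compare as "concurrent", which is the signal that the versions have diverged and a reconciled value should carry the merged clock.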
I See What You Mean — Peter Alvaro (Strange Loop 2015)
Eleven years after delivery, “I See What You Mean” remains the single best talk on why distributed systems are hard as a language design problem, not as an engineering problem. Peter Alvaro — now a professor at UC Santa Cruz, then a Berkeley PhD finishing the BOOM project — walks through a decade of research on Dedalus and Bloom and ends with the CALM theorem: a precise, syntactic answer to the question “when does a distributed program need coordination, and when can we get away without it?...
Pretraining Recurrent Networks without Recurrence
Weekly Paper Notes: one of the top picks from the 2026-06-06 CS paper digest.
Area: AI / ML.
Authors: Akarsh Kumar, Phillip Isola (MIT)
arXiv: 2606.06479 · PDF
TL;DR
This paper proposes Supervised Memory Training (SMT), a way to pretrain nonlinear RNNs without ever doing backpropagation through time (BPTT). The trick: replace recurrent credit assignment with a supervised problem over memory transitions. A Transformer-based “memory encoder” is first trained with a predictive-state objective: it learns a representation m_t that retains exactly the information about the past needed to predict the future....
SWE-rebench: Lessons from Evaluating Coding Agents
Vibes-based model selection is fine until your agent ships to production and starts billing customers for failed PRs. Ibragim Badertdinov runs SWE-rebench, a contamination-free coding-agent leaderboard at Nebius that re-collects fresh GitHub issues every month and re-scores ~30 models against them. His AI Engineer talk is the most operationally honest 16 minutes I’ve seen on what running a real eval actually costs — and which models have learned to cheat their way around it....
Text Diffusion — Brendon Dillon, Google DeepMind
For two years the LLM serving stack has been an autoregressive monoculture: one token at a time, KV cache, speculative decoding around the edges. Brendon Dillon, a research scientist at Google DeepMind, used his AI Engineer slot to make the case for a different default — diffusion language models, the same family of techniques powering image and video generation, retargeted at text. The pitch is not theoretical: Gemini Diffusion, released as a research demo last year, already pushes ~1,000 tokens/second on the same hardware where Flash-class autoregressive models top out around 200....
You Only Index Once: Cross-Layer Sparse Attention with Shared Routing
Weekly Paper Notes: one of the top picks from the 2026-06-06 CS paper digest.
Area: NLP / Systems-for-ML.
Authors: Yutao Sun, Yanqi Zhang, Li Dong, et al. (Microsoft Research Asia)
arXiv: 2606.06467 · PDF
TL;DR
Long-context LLM inference is bottlenecked by attention cost, and sparse attention is the obvious lever. The two existing families both disappoint in practice: block-sparse patterns (sliding window, dilated, etc.) give clean speedups but lose quality, while token-sparse patterns (top-k over the KV cache) preserve quality but spend most of the budget deciding which tokens to attend to, so the routing itself becomes the bottleneck....
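To make the routing cost concrete, here is a minimal single-query sketch of the generic token-sparse (top-k) family described in this excerpt. It is not the paper's method: the function name `topk_sparse_attention`, the plain-list tensors, and the scaled dot-product scoring are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def topk_sparse_attention(
    q: List[float],
    keys: List[List[float]],
    values: List[List[float]],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Attend to only the k highest-scoring cached tokens for one query."""
    scale = 1.0 / math.sqrt(len(q))
    # Routing step: every cached key is scored, so this costs O(n)
    # per query even though only k tokens survive the selection.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]

    # Attention step: softmax and weighted sum over the k selected tokens only.
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    dim = len(values[0])
    out = [
        sum(w * values[i][d] for w, i in zip(weights, top)) / total
        for d in range(dim)
    ]
    return out, sorted(top)
```

The scoring loop touches all n cached keys while the weighted sum touches only k of them, which is why the selection can dominate the cost when k is much smaller than n. Setting `k = len(keys)` recovers ordinary dense attention.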
Hermes in My Homelab
I have more questions than time to answer them. My backlog is full of things I genuinely want to explore: “Is a Crossplane Composition plus function-sequencer really equivalent to a hand-written Kubebuilder controller?”, “Can Temporal reliably drive a Deep-Agent loop?” and, more recently, “Can TigerData run on CNPG without a custom operator?” In theory, these just need a focused weekend. In reality, between work and chasing after my kids in the playground, my window for “sitting at a desk to explore” is maybe two evenings a week....
Attention Is All You Need (2017): The Architecture That Ate Machine Learning
Weekly Paper Notes: Seminal Paper of the Week for May 24–30, 2026. After a multi-week streak of systems classics (Raft, MapReduce, Lamport, ARIES), this week rotates to AI / ML.
Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin (Google Brain / Google Research / University of Toronto)
Venue: NeurIPS 2017
arXiv: 1706.03762 · PDF
Why this paper
Picking Attention Is All You Need as a Seminal Paper of the Week in 2026 feels almost too on-the-nose: the Transformer is the architectural substrate underneath every frontier LLM, every modern diffusion model, every state-of-the-art protein folding system, every reasoning model whose chain-of-thought you have ever read....
Classic of the Week: Rich Hickey — 'Simple Made Easy' (2011)
Weekly Video Notes — Classic of the Week. A foundational talk worth re-watching, paired with key frames and a short essay on why it still matters. Fifteen years after it was delivered, Rich Hickey’s “Simple Made Easy” remains the single best talk on software complexity ever recorded. The thesis is one sentence — simple and easy are different things, and conflating them is the root cause of most accidental complexity....
Devin's 80% Moment: Background Agents, 7× PRs, and the End of Hand-Held Coding
Weekly Video Notes — a short article distilling one talk from the weekly digest. Source video and key frames embedded throughout. In this 1h10m Latent Space conversation, Cognition CTO Walden Yan and engineer Cole Murray walk through what they’re calling Devin’s “80% moment” — the point at which an autonomous coding agent can land production-grade PRs on real codebases at a rate that changes how teams work. Cognition is reporting a 7× increase in merged PRs for teams that adopt their background-agent workflow, and the conversation digs into why hand-held in-IDE coding is no longer the frontier....