AgileOS: A GPU Operating System Layer for Protected CUDA Services
Weekly Paper Notes — one of the top picks from the 2026-06-13 CS paper digest. Area: Operating Systems / Systems. Authors: Zhuoping Yang, Yiyu Shi, Alex Jones. arXiv: 2606.06697. TL;DR: The GPU has quietly become a multi-tenant device — applications no longer just dispatch compute kernels, they call into vendor libraries (cuFFT, cuBLAS, NCCL), interact with GPU-resident services, and touch storage and network adapters through GPUDirect paths. But the CUDA programming model still hands each process the full keys to the device: its own context, raw device pointers, runtime handles, module loader, and direct kernel launch....
Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers
Weekly Paper Notes — one of the top picks from the 2026-06-13 CS paper digest. Area: Distributed Computing. Authors: Samuel Erickson, Mikael Johansson (KTH). arXiv: 2606.13287. TL;DR: In asynchronous SGD (ASGD), workers compute gradients on possibly stale parameters and push updates without waiting for slow peers. That’s how you keep all the GPUs busy, but it’s also how slow workers (“stragglers”) inject large delays into the update stream, which classical analyses say should slow convergence in proportion to the maximum delay across the workers....
End-to-End Arguments in System Design (1984)
Seminal Paper of the Week — a foundational systems paper that quietly shapes how every distributed system you use is layered. Authors: Jerome H. Saltzer, David P. Reed, David D. Clark (MIT). Published: ACM Transactions on Computer Systems 2(4), November 1984. Canonical link: End-to-End Arguments in System Design (MIT) · ACM DOI 10.1145/357401.357402. TL;DR: The end-to-end argument is a layering principle: a function should be implemented in a lower layer of a system only when it can be completely and correctly implemented at that layer, and when implementing it there provides a clear performance benefit over implementing it at the endpoints....
RAG Is Dead, Right? Why Hybrid, Tool-Rich Retrieval Is the New Default for Agentic Search
Kuba Rogut, deployed engineer at Turbopuffer, gave one of the more refreshingly direct takes on the “RAG is dead” meme that’s been making the rounds on X. His argument, in one sentence: RAG isn’t dead — what’s dead is the strawman version where RAG means “embed everything once, run a single vector lookup, dump it into the LLM context.” The actual frontier is hybrid, tool-rich retrieval, where embeddings, BM25, grep, glob, regex, and filters are all tools an agent can compose iteratively....
Running 128 Coding Agents at Once: Inside Cursor, Pause AI, and the Era of Agent Maxing
A short on-camera conversation between Sam Whitmore (engineer on Cursor’s cloud-agents team) and Charlie + Harry of Pause AI, recorded at Baseten and posted by Cursor as part of its agent-era publicity push. The framing is intentionally provocative — “I’ve got 64 to 128 agents working on this at any given time” — but the substance is closer to a workshop chat between three practitioners who actually live inside agent harnesses all day....
The Mess We're In — Joe Armstrong's 2014 Strange Loop Talk on Software's Entropy Problem
This week’s classic pick is Joe Armstrong’s 2014 Strange Loop talk The Mess We’re In — a 45-minute polemic from the co-creator of Erlang on why software is getting worse, what the laws of physics say about how fast computation could be, and how we should stop using human-chosen file names. Armstrong died in 2019, but the talk has aged remarkably well: in the era of 128-parallel coding agents, his entropy critique reads less like nostalgia and more like a warning we’ve kept ignoring....
Dynamo: Amazon's Highly Available Key-value Store (2007)
Weekly Paper Notes — Seminal Paper of the Week for the 2026-06-06 CS paper digest. Area: Distributed Systems / Databases. Citation: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels — Dynamo: Amazon’s Highly Available Key-value Store. SOSP ’07. DOI: 10.1145/1294261.1294281. Canonical PDF: Amazon Dynamo paper (Werner Vogels’ archive). Why the paper still matters: Almost every popular “NoSQL” key-value store of the last fifteen years — Cassandra, Riak, Voldemort, DynamoDB (the service), early versions of Redis Cluster, parts of MongoDB’s replica routing — pulls its core design vocabulary directly from Dynamo: consistent hashing for partitioning, vector clocks for divergence tracking, sloppy quorums with hinted handoff for availability under failure, and read repair / Merkle-tree anti-entropy for eventual convergence....
I See What You Mean — Peter Alvaro (Strange Loop 2015)
Eleven years after delivery, “I See What You Mean” remains the single best talk on why distributed systems are hard as a language design problem, not as an engineering problem. Peter Alvaro — now a professor at UC Santa Cruz, then a Berkeley PhD finishing the BOOM project — walks through a decade of research on Dedalus and Bloom and ends with the CALM theorem: a precise, syntactic answer to the question “when does a distributed program need coordination, and when can we get away without it?...
Pretraining Recurrent Networks without Recurrence
Weekly Paper Notes — one of the top picks from the 2026-06-06 CS paper digest. Area: AI / ML. Authors: Akarsh Kumar, Phillip Isola (MIT). arXiv: 2606.06479. TL;DR: This paper proposes Supervised Memory Training (SMT), a way to pretrain nonlinear RNNs without ever doing backpropagation through time (BPTT). The trick: replace recurrent credit assignment with a supervised problem over memory transitions. A Transformer-based “memory encoder” is first trained with a predictive-state objective — it learns a representation m_t that retains exactly the information about the past needed to predict the future....
SWE-rebench: Lessons from Evaluating Coding Agents
Vibes-based model selection is fine until your agent ships to production and starts billing customers for failed PRs. Ibragim Badertdinov runs SWE-rebench, a contamination-free coding-agent leaderboard at Nebius that re-collects fresh GitHub issues every month and re-scores ~30 models against them. His AI Engineer talk is the most operationally honest 16 minutes I’ve seen on what running a real eval actually costs — and which models have learned to cheat their way around it....