weekly-papers-2026-06-06

Dynamo: Amazon's Highly Available Key-value Store (2007)

Weekly Paper Notes: Seminal Paper of the Week for the 2026-06-06 CS paper digest.
Area: Distributed Systems / Databases.
Citation: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels. Dynamo: Amazon’s Highly Available Key-value Store. SOSP ’07.
DOI: 10.1145/1294261.1294281
Canonical PDF: Amazon Dynamo paper (Werner Vogels’ archive)
Why the paper still matters
Almost every popular “NoSQL” key-value store of the last fifteen years, including Cassandra, Riak, Voldemort, DynamoDB (the service), early versions of Redis Cluster, and parts of MongoDB’s replica routing, pulls its core design vocabulary directly from Dynamo: consistent hashing for partitioning, vector clocks for divergence tracking, sloppy quorums with hinted handoff for availability under failure, and read repair / Merkle-tree anti-entropy for eventual convergence....
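To make the first item in that vocabulary concrete, here is a minimal sketch of consistent hashing with virtual nodes and a replica "preference list". It is an illustration under stated assumptions, not Dynamo's actual code: the class name `HashRing`, the `vnodes` parameter, the node labels, and the key `cart:42` are all hypothetical, and MD5 is used only as a convenient way to spread keys over a 128-bit ring.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a point on the ring (MD5 digest read as a 128-bit int)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring; each node owns `vnodes` virtual points."""

    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._points]
        self._node_count = len(set(nodes))

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        n = min(n, self._node_count)
        if n <= 0:
            return []
        # First virtual point strictly after the key's position (wraps around).
        start = bisect.bisect_right(self._positions, ring_position(key))
        owners = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
replicas = ring.preference_list("cart:42", n=3)
```

The useful property is locality of change: removing a node that is not in a key's preference list leaves that list untouched, so only the keys that node actually served have to move.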

Pretraining Recurrent Networks without Recurrence

Weekly Paper Notes: one of the top picks from the 2026-06-06 CS paper digest.
Area: AI / ML.
Authors: Akarsh Kumar, Phillip Isola (MIT)
arXiv: 2606.06479 · PDF
TL;DR
This paper proposes Supervised Memory Training (SMT), a way to pretrain nonlinear RNNs without ever doing backpropagation through time (BPTT). The trick: replace recurrent credit assignment with a supervised problem over memory transitions. A Transformer-based “memory encoder” is first trained with a predictive-state objective, so that it learns a representation m_t that retains exactly the information about the past needed to predict the future....

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Weekly Paper Notes: one of the top picks from the 2026-06-06 CS paper digest.
Area: NLP / Systems-for-ML.
Authors: Yutao Sun, Yanqi Zhang, Li Dong, et al. (Microsoft Research Asia)
arXiv: 2606.06467 · PDF
TL;DR
Long-context LLM inference is bottlenecked by attention cost, and sparse attention is the obvious lever. The two existing families both disappoint in practice: block-sparse patterns (sliding window, dilated, etc.) give clean speedups but lose quality, while token-sparse patterns (top-k over the KV cache) preserve quality but spend most of the budget deciding which tokens to attend to, so the routing itself becomes the bottleneck....
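The contrast between the two families can be sketched in a few lines. This is a toy illustration, not the paper's method: the function names `sliding_window_indices` and `topk_indices`, the plain dot-product scoring, and the tiny example cache are all assumptions made here for clarity.

```python
import heapq


def sliding_window_indices(query_pos, window):
    """Block-sparse style: attend to the last `window` positions (causal).

    The selection depends only on position, so choosing it costs O(window).
    """
    start = max(0, query_pos - window + 1)
    return list(range(start, query_pos + 1))


def topk_indices(query, keys, query_pos, k):
    """Token-sparse style: score every visible key, then keep the k best.

    The scoring pass touches all query_pos + 1 cached keys, which is the
    routing cost described above.
    """
    scores = {
        j: sum(q * x for q, x in zip(query, keys[j]))
        for j in range(query_pos + 1)
    }
    return sorted(heapq.nlargest(k, scores, key=scores.get))


# Toy KV cache of six 2-d keys; the query aligns with the first dimension.
keys = [[1, 0], [0, 1], [2, 0], [0, 3], [5, 0], [0, 0]]
query = [1, 0]

window_choice = sliding_window_indices(query_pos=5, window=3)  # [3, 4, 5]
topk_choice = topk_indices(query, keys, query_pos=5, k=2)      # [2, 4]
```

In this toy cache the window keeps the three most recent positions regardless of content and misses position 2, while top-k finds it but only after scoring all six keys.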