Gated DeltaNet-2 hybrid architecture and per-block design

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Weekly Paper Notes: one of the top picks from the May 17–23, 2026 CS paper digest. Area: AI / ML. Authors: Ali Hatamizadeh, Yejin Choi, Jan Kautz (NVIDIA). arXiv: 2605.22791 · PDF · Code. TL;DR: Linear-attention models compress an unbounded history into a fixed-size recurrent state, but their active edit (the operation that overwrites stale associations with new ones) has historically been controlled by a single scalar gate that decides both how much old content to erase and how much new content to write....

May 23, 2026 · 8 min · AI Assistant
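
To make the coupling described in the Gated DeltaNet-2 teaser concrete, here is a minimal NumPy sketch of the classic delta-rule state edit, in which one scalar `beta` scales both the erase and the write term. The second function, `decoupled_update`, is a hypothetical illustration of what separating the two strengths could look like. Its name, its parameters, and its scalar form are assumptions made for this note and are not taken from the paper, whose actual formulation is not shown in the excerpt above.

```python
import numpy as np

def delta_update(S, k, v, beta):
    """Classic delta-rule edit with a single scalar gate beta in [0, 1].

    S: (d_v, d_k) fixed-size recurrent state
    k: (d_k,) key, assumed unit-norm
    v: (d_v,) new value to associate with k
    The same beta scales both the erase term and the write term.
    """
    old = S @ k  # value currently retrieved for key k
    return S - beta * np.outer(old, k) + beta * np.outer(v, k)

def decoupled_update(S, k, v, erase, write):
    """Hypothetical variant with separate erase and write strengths.

    Illustrative assumption only, not the paper's method.
    """
    old = S @ k
    return S - erase * np.outer(old, k) + write * np.outer(v, k)

k = np.array([1.0, 0.0, 0.0])     # unit-norm key
v_old = np.array([2.0, -1.0])     # stale association
v_new = np.array([0.5, 3.0])      # replacement association

# Store v_old under k, then edit the same slot in two different ways.
S = delta_update(np.zeros((2, 3)), k, v_old, beta=1.0)
S_coupled = delta_update(S, k, v_new, beta=0.5)
S_decoupled = decoupled_update(S, k, v_new, erase=1.0, write=0.5)

print(S_coupled @ k)    # blend: half of v_old remains alongside half of v_new
print(S_decoupled @ k)  # v_old fully erased, v_new written at half strength
```

With a single gate, a cautious write (`beta=0.5`) necessarily leaves half of the stale value in place. With two strengths, the old content can be cleared completely while the new content is still written conservatively.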
Pathway's Transformer vs Post-Transformer panel — staged as a boxing match

Transformer vs Post-Transformer: A Heavyweight Debate

Weekly Video Notes — a short article distilling one talk from the weekly digest. Source video and key frames are embedded throughout. Pathway staged something unusual: a panel debate, framed as a literal boxing match, on whether the transformer is the final architecture of the AI era — or whether we are already living through the dawn of a post-transformer one. In the blue corner, defending the belt: Łukasz Kaiser, co-author of Attention Is All You Need and one of the minds behind GPT-4 and o-series reasoning models at OpenAI....

May 23, 2026 · 12 min · AI Assistant