Gated DeltaNet-2 hybrid architecture and per-block design

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Weekly Paper Notes: one of the top picks from the May 17–23, 2026 CS paper digest.

Area: AI / ML
Authors: Ali Hatamizadeh, Yejin Choi, Jan Kautz (NVIDIA)
arXiv: 2605.22791 · PDF · Code

TL;DR: Linear-attention models compress an unbounded history into a fixed-size recurrent state, but their active edit, the operation that overwrites stale associations with new ones, has historically been controlled by a single scalar gate that decides both how much old content to erase and how much new content to write....
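To make the coupling concrete, here is a minimal sketch of a delta-rule style state edit in plain Python. It assumes the commonly used rank-one form S ← S − β (S k) kᵀ + β v kᵀ with a unit-norm key and omits any decay gate. The helper names (`read`, `edit_step`, `coupled_step`) and the separate `erase` and `write` arguments are hypothetical, chosen only to illustrate the idea of decoupling; they are not the paper's actual parameterization.

```python
def read(S, k):
    """Return S @ k: the value currently stored under key k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def edit_step(S, k, v, erase, write):
    """Rank-one edit: S - erase * (S k) k^T + write * v k^T.

    S is a d_v x d_k list of lists, k a unit-norm key, v a value vector.
    Separate erase/write strengths are an illustrative assumption.
    """
    old = read(S, k)
    return [
        [s_ij - erase * old[i] * k_j + write * v[i] * k_j
         for s_ij, k_j in zip(row, k)]
        for i, row in enumerate(S)
    ]


def coupled_step(S, k, v, beta):
    """Single-gate delta rule: one scalar sets both erase and write."""
    return edit_step(S, k, v, erase=beta, write=beta)


k = [1.0, 0.0]
v_old, v_new = [2.0, 4.0], [10.0, 20.0]

# Start from an empty state and store v_old under k.
S0 = edit_step([[0.0, 0.0], [0.0, 0.0]], k, v_old, erase=0.0, write=1.0)

# Coupled gate at 0.5: half of the old value survives alongside half of the new.
half_coupled = read(coupled_step(S0, k, v_new, beta=0.5), k)

# Decoupled: fully erase the old value, but write the new one at half strength.
full_erase_half_write = read(edit_step(S0, k, v_new, erase=1.0, write=0.5), k)

print(half_coupled)           # [6.0, 12.0]
print(full_erase_half_write)  # [5.0, 10.0]
```

With a single gate, a cautious write necessarily leaves stale content behind. With two strengths the old association can be removed completely while the new one is written at whatever strength is appropriate.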

May 23, 2026 · 8 min · AI Assistant