Pretraining Recurrent Networks without Recurrence
Weekly Paper Notes: one of the top picks from the 2026-06-06 CS paper digest.

Area: AI / ML
Authors: Akarsh Kumar, Phillip Isola (MIT)
arXiv: 2606.06479

TL;DR

This paper proposes Supervised Memory Training (SMT), a way to pretrain nonlinear RNNs without ever doing backpropagation through time (BPTT). The trick: replace recurrent credit assignment with a supervised problem over memory transitions. A Transformer-based “memory encoder” is first trained with a predictive-state objective: it learns a representation m_t that retains exactly the information about the past needed to predict the future....
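
To make "a supervised problem over memory transitions" concrete, here is a minimal sketch, not the authors' implementation. It assumes the memory targets m_t are already available (in the paper they come from the Transformer memory encoder; here they are synthesized from a hidden tanh recurrence purely as a stand-in). It then fits a recurrent cell on independent one-step pairs (m_{t-1}, x_t) -> m_t. The tanh cell, the MSE loss, the dimensions, and every name below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_x, d_m = 200, 3, 4  # sequence length, input size, memory size (all assumed)

# Stand-in for the memory encoder's outputs (hypothetical): targets m_t are
# generated by a hidden tanh recurrence instead of a trained Transformer.
W_true = rng.normal(scale=0.5, size=(d_m, d_m))
U_true = rng.normal(scale=0.5, size=(d_m, d_x))
x = rng.normal(size=(T, d_x))
m = np.zeros((T + 1, d_m))
for t in range(T):
    m[t + 1] = np.tanh(W_true @ m[t] + U_true @ x[t])

# Supervised transition dataset: each row is one (m_{t-1}, x_t) -> m_t pair.
inputs = np.concatenate([m[:-1], x], axis=1)  # shape (T, d_m + d_x)
targets = m[1:]                               # shape (T, d_m)

def one_step_loss(theta):
    """Mean squared error of the cell's one-step memory prediction."""
    pred = np.tanh(inputs @ theta)
    return float(np.mean((pred - targets) ** 2))

# Train the cell with plain gradient descent. Every time step is an
# independent example, so no gradient flows through time (no BPTT).
theta = np.zeros((d_m + d_x, d_m))
initial_loss = one_step_loss(theta)
lr = 0.5
for _ in range(500):
    pred = np.tanh(inputs @ theta)
    err = pred - targets
    # d(MSE)/d(theta) through the tanh nonlinearity, for one step only.
    grad = inputs.T @ (err * (1.0 - pred ** 2)) * (2.0 / err.size)
    theta -= lr * grad
final_loss = one_step_loss(theta)

print(f"one-step MSE: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because each training pair is independent, the whole sequence can be processed in one batched matrix product, which is the practical appeal of replacing recurrent credit assignment with per-step supervision. The trained cell would still be run recurrently at inference time.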