Weekly Paper Notes — one of the top picks from the May 24–30, 2026 CS paper digest. Area: NLP / Theory.
Authors: Jon Kleinberg, Anay Mehrotra, Amin Saberi (Cornell / Yale / Stanford) arXiv: 2605.30324 · PDF
TL;DR
A line of theoretical work asks: given examples from an unknown target language drawn from a known countable collection, can a learner eventually output only new valid strings from that language? Prior results — including Kleinberg & Mullainathan’s 2024 paper that triggered the modern wave — assume the learner remembers the entire example history. This paper drops that assumption and asks what happens when memory is bounded. Three findings:
- Memoryless generators can still handle every countable collection of infinite languages (under a mild enumeration restriction). Without that restriction, the authors give an exact characterization of which collections admit memoryless generation, and they pin down the optimal minimax density via Sperner’s theorem and symmetric chain decompositions — a tidy combinatorial connection.
- A sliding window of the last $W$ examples doesn’t help in the worst case. But a learner that may adaptively retain $b$ past examples (its choice which) achieves strictly better density for every $b \geq 1$. Adaptive memory > windowed memory.
- Identification in the limit (converge to a single hypothesis) is much fragiler: a learner with only its previous guess as memory fails on collections as small as three languages. Relax the convergence target to an “approximate” version of the language and finite collections become tractable again.
Why this matters
The Kleinberg–Mullainathan generation-in-the-limit framework has become a lens through which to formally study what LLMs can and cannot do as language generators (as opposed to recognizers). The unbounded-memory assumption is theoretically clean but operationally unrealistic — no real generator (and no realistic abstraction of one) keeps every training example in its context. This paper provides the natural sequel: which results survive memory pressure, and which don’t.
The headline message is striking. Generation is robust to memory bounds — even fully memoryless learners can cover every countable collection. Density and identification are not — they collapse to finite collections, and identification breaks at $n=3$. That asymmetry is itself a contribution: it gives a principled reason why generating valid new examples might be an easier task than committing to a single explanatory hypothesis, even when both look superficially similar.
Author signal
Jon Kleinberg is among the most cited authors in theoretical CS, and a co-author of the 2024 paper that started this thread. Pairing his framework with Saberi (Stanford algorithms / market design) and Mehrotra signals that this is a careful, technically dense extension rather than an incremental remix.
Read more
- arXiv abstract: https://arxiv.org/abs/2605.30324
- PDF: https://arxiv.org/pdf/2605.30324