llm on Sparse Notes

https://sparsenotes.com/tags/llm/
Recent content in llm on Sparse Notes
Sat, 23 May 2026 00:00:00 +0000

Co-Scientist: DeepMind's Multi-Agent Engine for Novel Scientific Hypotheses
https://sparsenotes.com/posts/2026/05/co-scientist-deepmind/
Sat, 23 May 2026 00:00:00 +0000
DeepMind's six-minute overview of Co-Scientist, a multi-agent system that reads across the scientific literature, generates and ranks hypotheses, and gives working scientists an on-demand team of expert collaborators.

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
https://sparsenotes.com/posts/2026/05/papers/gated-deltanet-2/
Sat, 23 May 2026 00:00:00 +0000
NVIDIA's Hatamizadeh, Choi, and Kautz introduce a linear-attention layer that splits the single scalar 'delta gate' into separate channel-wise erase and write gates, cleanly recovering KDA and Gated DeltaNet as tied subspaces and beating both on long-context recall.

Let's Build GPT From Scratch: Karpathy's Classic, Re-read in 2026
https://sparsenotes.com/posts/2026/05/karpathy-lets-build-gpt/
Sat, 23 May 2026 00:00:00 +0000
Notes on Andrej Karpathy's 2023 classic 'Let's build GPT: from scratch, in code, spelled out.' They cover tokenization, the bigram baseline, the math of self-attention, multi-head, residual + layernorm, and how the same ~200 lines scale to a real GPT.

Transformer vs Post-Transformer: A Heavyweight Debate
https://sparsenotes.com/posts/2026/05/transformer-vs-post-transformer/
Sat, 23 May 2026 00:00:00 +0000
Notes from the Pathway-hosted 'Transformer vs Post-Transformer' panel: Łukasz Kaiser defends attention, while Adrian Kosowski (BDH/Pathway), Mathias Lechner (Liquid AI) and Llion Jones (Sakana AI) argue for what comes next. Topics include scaling laws, latent reasoning, hardware lock-in, benchmarks, and whether transformers themselves will discover their successor.