Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers
Weekly Paper Notes: one of the top picks from the 2026-06-13 CS paper digest.

Area: Distributed Computing
Authors: Samuel Erickson, Mikael Johansson (KTH)
arXiv: 2606.13287 · PDF

TL;DR

In asynchronous SGD (ASGD), workers compute gradients on possibly stale parameters and push updates without waiting for slow peers. That keeps all the GPUs busy, but it also lets slow workers (“stragglers”) inject large delays into the update stream, which classical analyses say should slow convergence in proportion to the maximum delay across the workers....
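
To make the setting concrete, here is a minimal single-process simulation of ASGD with clipped updates on a toy quadratic. It is only a sketch under stated assumptions, not the authors' algorithm: the objective, the per-worker `delays`, the clipping threshold `tau`, and the helper names `clip`, `noisy_grad`, and `simulate_asgd` are all hypothetical and chosen for illustration. Each worker computes its gradient on the parameter snapshot it last read, so a worker with a large delay delivers a stale gradient; clipping bounds how far any single update can move the parameters.

```python
import random


def clip(g, tau):
    """Rescale g so its Euclidean norm is at most tau (standard norm clipping)."""
    norm = sum(x * x for x in g) ** 0.5
    if norm <= tau or norm == 0.0:
        return list(g)
    return [x * tau / norm for x in g]


def noisy_grad(w, rng, noise=0.1):
    """Stochastic gradient of the toy objective f(w) = 0.5 * ||w||^2."""
    return [x + rng.gauss(0.0, noise) for x in w]


def simulate_asgd(delays, steps=2000, lr=0.05, tau=1.0, seed=0):
    """Simulate asynchronous SGD with clipped updates.

    delays[i] is the (assumed) number of server steps worker i needs to
    return one gradient; a large value models a straggler.
    """
    rng = random.Random(seed)
    w = [5.0, -3.0]
    # Each job records when the worker finishes and the snapshot it read.
    jobs = [(d, list(w)) for d in delays]
    for t in range(1, steps + 1):
        for i, (finish, snapshot) in enumerate(jobs):
            if finish == t:
                # The gradient is evaluated at the stale snapshot, not at w.
                g = clip(noisy_grad(snapshot, rng), tau)
                w = [wi - lr * gi for wi, gi in zip(w, g)]
                # The worker immediately reads the current w and starts over.
                jobs[i] = (t + delays[i], list(w))
    return w


if __name__ == "__main__":
    # Three fast workers and one straggler that is 50 steps behind.
    final_w = simulate_asgd(delays=[1, 1, 1, 50])
    print("final parameters:", final_w)
```

In this toy run the straggler's gradient is always 50 server steps old, yet each of its updates moves the parameters by at most `lr * tau`, which is the intuition the title points at. The simulation says nothing about the rates proved in the paper.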