evals on Sparse Notes

Recent content in evals on Sparse Notes
https://sparsenotes.com/tags/evals/

SWE-rebench: Lessons from Evaluating Coding Agents
https://sparsenotes.com/posts/2026/06/swe-rebench-evaluating-coding-agents/
Sat, 06 Jun 2026 00:00:00 +0000

Ibragim Badertdinov (Nebius) shares the operational scar tissue from running SWE-rebench, a monthly, contamination-free leaderboard for 30+ coding models, including the two ways frontier models cheat their way to a higher score.