Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
By: Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe
Published: 2026-01-26
View on arXiv (cs.AI)
Abstract
This paper introduces SOAR, a self-improvement framework that enables large language models (LLMs) to generate their own curricula of mathematical reasoning problems for tasks they cannot initially solve. SOAR achieves substantial performance gains (e.g., an 8.5% pass@32 increase on fail@128-MATH) by grounding the teacher's reward in measured student progress rather than in fragile intrinsic proxies. The framework suggests a pathway toward more autonomous AI systems that can identify and generate the intermediate steps needed to tackle increasingly difficult problems, without requiring extensive human-curated data.
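The core idea of grounding the teacher's reward in measurable student progress can be illustrated with a minimal sketch. The paper does not specify this exact formulation; the code below assumes, purely for illustration, that progress is measured as the change in the student's pass@k rate on held-out problems before and after training on a teacher-generated problem, with pass@k computed from a per-attempt solve probability.

```python
def pass_at_k(solve_prob: float, k: int) -> float:
    # Probability the student solves a problem at least once in k
    # independent attempts, given a per-attempt solve probability.
    return 1.0 - (1.0 - solve_prob) ** k

def teacher_reward(solve_prob_before: float,
                   solve_prob_after: float,
                   k: int = 32) -> float:
    # Grounded reward: the measurable change in student pass@k on
    # held-out problems, rather than an intrinsic proxy such as
    # estimated problem novelty or difficulty.
    return pass_at_k(solve_prob_after, k) - pass_at_k(solve_prob_before, k)

# Hypothetical numbers: training on a teacher-generated intermediate
# problem raises the student's per-attempt solve rate from 1% to 2%.
reward = teacher_reward(0.01, 0.02)
print(round(reward, 3))
```

Because the reward is tied to an observed metric, a teacher that generates unsolvably hard or trivially easy problems earns no reward; only problems at the edge of the student's ability produce measurable progress.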