Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
By: Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe
Published: 2026-01-26
View on arXiv (cs.AI)
Abstract
This paper introduces SOAR, a self-improvement framework that enables large language models (LLMs) to generate their own curricula of mathematical reasoning problems for tasks they cannot initially solve. SOAR achieves substantial performance gains (e.g., an 8.5% pass@32 increase on fail@128-MATH) by grounding the teacher's reward in measured student progress rather than in fragile intrinsic proxies. The framework suggests a pathway toward more autonomous AI systems that can identify and generate the intermediate steps needed to tackle increasingly difficult problems, without requiring extensive human-curated data.
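The core idea of grounding the teacher's reward in measurable student progress can be illustrated with a minimal sketch. The paper does not specify this exact formulation; the code below assumes, purely for illustration, that progress is measured as the change in the student's pass@k rate on held-out problems before and after training on a teacher-generated problem, with pass@k computed from a per-attempt solve probability.

```python
def pass_at_k(solve_prob: float, k: int) -> float:
    # Probability the student solves a problem at least once in k
    # independent attempts, given a per-attempt solve probability.
    return 1.0 - (1.0 - solve_prob) ** k

def teacher_reward(solve_prob_before: float,
                   solve_prob_after: float,
                   k: int = 32) -> float:
    # Grounded reward: the measurable change in student pass@k on
    # held-out problems, rather than an intrinsic proxy such as
    # estimated problem novelty or difficulty.
    return pass_at_k(solve_prob_after, k) - pass_at_k(solve_prob_before, k)

# Hypothetical numbers: training on a teacher-generated intermediate
# problem raises the student's per-attempt solve rate from 1% to 2%.
reward = teacher_reward(0.01, 0.02)
print(round(reward, 3))
```

Because the reward is tied to an observed metric, a teacher that generates unsolvably hard or trivially easy problems earns no reward; only problems at the edge of the student's ability produce measurable progress.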