Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

By: Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe

Published: 2026-01-26

#cs.AI

Abstract

This paper introduces SOAR, a new self-improvement framework that enables large language models (LLMs) to generate their own curricula for mathematical reasoning problems they cannot initially solve. It achieves substantial performance gains (e.g., an 8.5% pass@32 increase on fail@128-MATH) by grounding teacher rewards in measurable student progress rather than fragile intrinsic proxies. This framework suggests a pathway toward more autonomous AI systems that can identify and generate the intermediate steps necessary for tackling increasingly difficult problems without requiring extensive human-curated data.
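
The core mechanism described here, rewarding a teacher for measured student progress on initially failed problems rather than for an intrinsic proxy score, can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the scalar student model, the difficulty parameterization, the bandit-style teacher, and all function names are illustrative assumptions standing in for the LLM teacher and student of the paper.

```python
"""Toy sketch (assumed, not the SOAR implementation) of a teacher rewarded by
measurable student progress on initially unsolvable target problems."""

import random

random.seed(0)

TARGET_DIFFICULTY = 0.9   # stand-in for the initially failed (fail@k) problem set
ATTEMPTS = 32             # stand-in for a pass@32-style evaluation


def student_solves(skill: float, difficulty: float) -> bool:
    """Toy student: success probability rises with (skill - difficulty)."""
    p = max(0.0, min(1.0, 0.5 + (skill - difficulty)))
    return random.random() < p


def pass_rate(skill: float, difficulty: float, attempts: int = ATTEMPTS) -> float:
    """Empirical pass rate of the toy student at a given difficulty."""
    return sum(student_solves(skill, difficulty) for _ in range(attempts)) / attempts


def train_student(skill: float, difficulty: float) -> float:
    """Toy learning rule: the student improves most on problems near the edge
    of its ability (sometimes solvable, never trivial)."""
    rate = pass_rate(skill, difficulty)
    learnability = rate * (1.0 - rate)   # peaks when problems are borderline
    return skill + 0.2 * learnability


def teacher_reward(skill_before: float, skill_after: float) -> float:
    """Ground the teacher's reward in measured student progress on the target
    problems, not in any intrinsic proxy of problem quality."""
    return (pass_rate(skill_after, TARGET_DIFFICULTY)
            - pass_rate(skill_before, TARGET_DIFFICULTY))


def run_curriculum(rounds: int = 20) -> None:
    skill = 0.1
    # The "teacher" here is a simple bandit over candidate difficulties; in the
    # paper the teacher is itself an LLM that generates intermediate problems.
    candidates = [0.2, 0.4, 0.6, 0.8]
    values = {d: 0.0 for d in candidates}
    for _ in range(rounds):
        if random.random() > 0.3:
            difficulty = max(values, key=values.get)   # exploit best difficulty
        else:
            difficulty = random.choice(candidates)     # explore
        new_skill = train_student(skill, difficulty)
        reward = teacher_reward(skill, new_skill)
        values[difficulty] += 0.5 * (reward - values[difficulty])
        skill = new_skill
    print(f"final pass rate on target problems: {pass_rate(skill, TARGET_DIFFICULTY):.2f}")


if __name__ == "__main__":
    run_curriculum()
```

Running the sketch shows the teacher drifting toward intermediate difficulties, since those are the proposals that move the student's measured pass rate on the hard target set, which mirrors the "edge of learnability" framing in the title.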
