Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
By: Boxin Wang, Chankyu Lee, Nayeon Lee, Sheng-Chieh Lin, Wenliang Dai, Yang Chen, Yangyi Chen, Zhuolin Yang, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
Published: 2025-12-15
Category: cs.AI
Abstract
This paper proposes Nemotron-Cascade, a framework for developing general-purpose reasoning models using cascaded domain-wise reinforcement learning (Cascade RL). Cascade RL addresses heterogeneity in RL infrastructure by orchestrating RL sequentially, one domain at a time. The resulting models achieve state-of-the-art performance across competitive programming, math, and software engineering benchmarks, and can operate in both "instruct" and "deep thinking" modes.
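The control flow of sequential, domain-wise RL can be sketched as a simple loop in which each stage starts from the checkpoint produced by the previous one. This is a minimal illustration under stated assumptions, not the paper's implementation: `Checkpoint`, `make_stage`, `cascade`, and the domain names and their order are all hypothetical, and the stage body is a placeholder where a real domain-specific RL run would go.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Checkpoint:
    """Hypothetical stand-in for model weights; records the stages applied so far."""
    history: List[str] = field(default_factory=list)


# A stage maps the incoming checkpoint to the checkpoint after RL on one domain.
Stage = Callable[[Checkpoint], Checkpoint]


def make_stage(domain: str) -> Stage:
    """Build a placeholder stage for one domain (assumption: no real training here)."""
    def run(ckpt: Checkpoint) -> Checkpoint:
        # A real stage would run RL with this domain's own prompts and verifier.
        return Checkpoint(history=ckpt.history + [domain])
    return run


def cascade(initial: Checkpoint, stages: List[Stage]) -> Checkpoint:
    """Run the stages one after another, feeding each stage the previous result."""
    ckpt = initial
    for stage in stages:
        ckpt = stage(ckpt)
    return ckpt


# Domain names and their order are illustrative only, not taken from the paper.
domains = ["math", "competitive_programming", "software_engineering"]
final = cascade(Checkpoint(), [make_stage(d) for d in domains])
print(final.history)
```

Because each stage is an independent callable, every domain can in principle use its own rollout and verification setup, which is one way to read the abstract's point about heterogeneous RL infrastructure.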