Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

By: Boxin Wang, Chankyu Lee, Nayeon Lee, Sheng-Chieh Lin, Wenliang Dai, Yang Chen, Yangyi Chen, Zhuolin Yang, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

Published: 2025-12-15

#cs.AI

Abstract

This paper proposes Nemotron-Cascade, a framework for developing general-purpose reasoning models using cascaded domain-wise reinforcement learning (Cascade RL). To address heterogeneity in RL infrastructure, it orchestrates RL sequentially, one domain at a time. The resulting models achieve state-of-the-art performance across competitive programming, math, and software engineering benchmarks, and can operate in both "instruct" and "deep thinking" modes.
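
The sequential, domain-wise orchestration described in the abstract might be pictured roughly as follows. This is only a minimal sketch under stated assumptions: `Policy`, `make_stage`, and `cascade` are hypothetical names, the stages perform no real training, and the domain order is illustrative rather than taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Policy:
    """Stand-in for a model checkpoint; records which stages have trained it."""
    history: List[str] = field(default_factory=list)


# A stage takes a policy and returns the updated policy.
Stage = Callable[[Policy], Policy]


def make_stage(domain: str) -> Stage:
    """Build a placeholder RL stage for one domain (hypothetical, no real training)."""
    def run(policy: Policy) -> Policy:
        # A real stage would run RL with this domain's own environment and reward.
        return Policy(history=policy.history + [domain])
    return run


def cascade(policy: Policy, stages: List[Stage]) -> Policy:
    """Apply the stages one after another, each starting from the previous result."""
    for stage in stages:
        policy = stage(policy)
    return policy


# Assumption: illustrative domain order only, not the order used in the paper.
domains = ["math", "code", "software_engineering"]
final = cascade(Policy(), [make_stage(d) for d in domains])
print(final.history)
```

Because each stage only needs to accept and return a policy, every domain can keep its own training setup behind that interface, which is one way to read the abstract's point about heterogeneous RL infrastructure.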
