CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use
By: Zhen Zhang, Kaiqiang Song, Xun Wang, Yebowen Hu, Weixiang Yan, Chenyang Zhao, Henry Peng Zou, Haoyun Deng, Sathish Reddy Indurthi, Shujian Liu, Simin Ma, Xiaoyang Wang, Xin Eric Wang, Song Wang
Published: 2026-02-13
Abstract
Checklist-based rewards offer a structured way to guide reinforcement learning agents through complex, multi-step tasks requiring tool use and multi-turn interactions. This paper introduces CM2, a novel framework leveraging such rewards to enhance agent performance in intricate environments. By breaking down tasks into manageable sub-goals represented as a checklist, CM2 enables agents to learn more efficiently and robustly, particularly in scenarios where sequential decision-making and precise tool application are crucial. Experiments demonstrate significant improvements in task completion rates and overall agent efficacy compared to traditional reward mechanisms.
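To make the idea of a checklist reward concrete, here is a minimal sketch, not the CM2 implementation. It assumes a trajectory can be summarized as a list of tool-call names and that each checklist item is a boolean predicate over that list; the reward is simply the fraction of items satisfied. The names `ChecklistItem`, `checklist_reward`, and the flight-booking tools are hypothetical and chosen only for illustration; the paper's actual checklist construction and scoring may differ.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a trajectory is reduced to the ordered names of the tools called.
Trajectory = List[str]


@dataclass
class ChecklistItem:
    """One sub-goal, with a predicate that says whether it was met."""
    description: str
    check: Callable[[Trajectory], bool]


def checklist_reward(trajectory: Trajectory, checklist: List[ChecklistItem]) -> float:
    """Return the fraction of checklist items the trajectory satisfies (0.0 to 1.0)."""
    if not checklist:
        return 0.0  # no sub-goals defined, so there is nothing to reward
    satisfied = sum(1 for item in checklist if item.check(trajectory))
    return satisfied / len(checklist)


def searched_before_booking(t: Trajectory) -> bool:
    """Order-sensitive item: the search must happen before the booking."""
    if "search_flights" not in t or "book_flight" not in t:
        return False
    return t.index("search_flights") < t.index("book_flight")


# Hypothetical checklist for a flight-booking task.
checklist = [
    ChecklistItem("searched for flights", lambda t: "search_flights" in t),
    ChecklistItem("booked a flight", lambda t: "book_flight" in t),
    ChecklistItem("searched before booking", searched_before_booking),
    ChecklistItem("sent a confirmation", lambda t: "send_confirmation" in t),
]

trajectory = ["search_flights", "book_flight"]
reward = checklist_reward(trajectory, checklist)
print(reward)  # 3 of 4 items are met
```

Compared with a single end-of-episode success signal, a score like this gives partial credit for intermediate progress, which is the intuition behind decomposing a task into sub-goals.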