CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

By: Zhen Zhang, Kaiqiang Song, Xun Wang, Yebowen Hu, Weixiang Yan, Chenyang Zhao, Henry Peng Zou, Haoyun Deng, Sathish Reddy Indurthi, Shujian Liu, Simin Ma, Xiaoyang Wang, Xin Eric Wang, Song Wang

Published: 2026-02-13

#cs.AI

Abstract

Checklist-based rewards give reinforcement learning agents structured guidance on complex, multi-step tasks that require tool use and multi-turn interaction. This paper introduces CM2, a framework that uses such rewards to improve agent performance in these settings. By decomposing each task into sub-goals represented as a checklist, CM2 aims to make learning more efficient and robust, particularly where sequential decision-making and precise tool use matter. In the reported experiments, CM2 improves task completion rates and overall agent performance over traditional reward mechanisms.
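
The general idea of scoring a trajectory against a checklist of sub-goals can be illustrated with a minimal sketch. This is not CM2's actual reward function, which the abstract does not specify. The trajectory format, the `ChecklistItem` type, the `called` helper, and the tool names are all hypothetical, and the reward is assumed here to be simply the fraction of checklist items satisfied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical trajectory format: one dict per tool call made by the agent.
Trajectory = List[Dict[str, str]]


@dataclass
class ChecklistItem:
    """One sub-goal plus a predicate that checks it against a trajectory."""
    description: str
    check: Callable[[Trajectory], bool]


def checklist_reward(trajectory: Trajectory, checklist: List[ChecklistItem]) -> float:
    """Return the fraction of checklist items the trajectory satisfies.

    Assumption: items are weighted equally. An empty checklist yields 0.0.
    """
    if not checklist:
        return 0.0
    satisfied = sum(1 for item in checklist if item.check(trajectory))
    return satisfied / len(checklist)


def called(tool: str) -> Callable[[Trajectory], bool]:
    """Build a predicate that is true if the named tool was called at least once."""
    return lambda traj: any(step.get("tool") == tool for step in traj)


# Hypothetical task: book a flight and notify the user.
checklist = [
    ChecklistItem("searched for flights", called("search_flights")),
    ChecklistItem("booked a flight", called("book_flight")),
    ChecklistItem("sent a confirmation email", called("send_email")),
    ChecklistItem("checked the weather", called("get_weather")),
]

trajectory = [{"tool": "search_flights"}, {"tool": "book_flight"}]

reward = checklist_reward(trajectory, checklist)
print(reward)  # 2 of 4 items satisfied -> 0.5
```

Compared with a single success-or-failure signal at the end of an episode, a score of this kind gives partial credit for each completed sub-goal, which is one plausible reason checklist rewards could help on long multi-step tasks.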
