CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

Checklist-based rewards offer a structured way to guide reinforcement learning agents through complex tasks that require multi-step tool use and multi-turn interaction. This paper introduces CM2, a framework that uses such rewards to train agents for these settings. By decomposing each task into a checklist of sub-goals, CM2 lets agents learn more efficiently and robustly, particularly where success depends on long sequences of decisions and correct tool calls. Experiments show significant improvements in task completion rate and overall agent performance compared with conventional reward mechanisms.
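The idea of scoring a trajectory against a checklist of sub-goals can be illustrated with a minimal sketch. For illustration only, it assumes that each checklist item is a boolean predicate over the agent's sequence of tool calls and that the reward is the weighted fraction of satisfied items. `ChecklistItem`, `checklist_reward`, `binary_reward`, `called`, and the tool names are hypothetical and are not taken from CM2, whose actual reward formulation may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a trajectory is a list of tool calls, each a dict with a
# "tool" name and an "args" dict. CM2's real representation may differ.
Trajectory = List[Dict[str, object]]


@dataclass
class ChecklistItem:
    """One sub-goal: a human-readable description plus a predicate over the trajectory."""
    description: str
    check: Callable[[Trajectory], bool]
    weight: float = 1.0


def checklist_reward(trajectory: Trajectory, checklist: List[ChecklistItem]) -> float:
    """Weighted fraction of checklist items the trajectory satisfies, in [0, 1]."""
    total = sum(item.weight for item in checklist)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in checklist if item.check(trajectory))
    return earned / total


def binary_reward(trajectory: Trajectory, checklist: List[ChecklistItem]) -> float:
    """All-or-nothing baseline: 1.0 only if every item is satisfied."""
    return 1.0 if all(item.check(trajectory) for item in checklist) else 0.0


def called(tool: str) -> Callable[[Trajectory], bool]:
    """Hypothetical predicate: the agent invoked `tool` at least once."""
    return lambda traj: any(step.get("tool") == tool for step in traj)


if __name__ == "__main__":
    checklist = [
        ChecklistItem("searched for flights", called("search_flights")),
        ChecklistItem("booked a flight", called("book_flight")),
        ChecklistItem("sent a confirmation email", called("send_email")),
    ]
    trajectory = [
        {"tool": "search_flights", "args": {"from": "SFO", "to": "JFK"}},
        {"tool": "book_flight", "args": {"flight_id": "UA123"}},
    ]
    print(checklist_reward(trajectory, checklist))  # prints 0.6666666666666666
    print(binary_reward(trajectory, checklist))     # prints 0.0
```

In this toy run the all-or-nothing reward gives no signal, while the checklist reward gives partial credit for the two completed sub-goals. A denser signal of this kind is one plausible reason decomposed rewards can ease learning on long multi-step tasks.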
