DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
By: Aili Chen, Chi Zhang, Junteng Liu, Jiangjie Chen, Chengyu Du, Yunji Li, Ming Zhong, Qin Wang, Zhengmao Zhu, Jiayuan Song, Ke Ji, Junxian He, Pengyu Zhao, Yanghua Xiao
Published: 2026-03-10
Abstract
Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. We propose DIVE, an evidence-driven recipe that inverts the usual synthesis order: it first executes diverse, real-world tools, then reverse-derives tasks strictly entailed by the resulting execution traces. This method significantly improves tool-use generalization and outperforms quantity scaling for out-of-distribution generalization, even with 4x less data.
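The inverted synthesis order described in the abstract can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the function names (`execute_tools`, `derive_task`) and the toy tools are assumptions introduced here to show the execute-first, reverse-derive flow.

```python
# Hypothetical sketch of the execute-first, reverse-derive recipe
# (all names are illustrative, not from the paper).

def execute_tools(tools, args):
    """Run each (tool, arg) pair and record an execution trace."""
    trace = []
    for tool, arg in zip(tools, args):
        result = tool(arg)
        trace.append({"tool": tool.__name__, "arg": arg, "result": result})
    return trace

def derive_task(trace):
    """Reverse-derive a task whose answer is entailed by the observed trace."""
    steps = " then ".join(f"call {t['tool']}({t['arg']!r})" for t in trace)
    answer = trace[-1]["result"]
    return {"task": f"Find the value obtained if you {steps}.", "answer": answer}

# Toy tools standing in for real-world APIs.
def double(x):
    return 2 * x

def negate(x):
    return -x

trace = execute_tools([double, negate], [21, 5])
task = derive_task(trace)
print(task["answer"])  # -5
```

Because the task is constructed from a trace that has already been executed, its answer is grounded by construction, which is the property the abstract attributes to reverse-derivation.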