DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
By: Aili Chen, Chi Zhang, Junteng Liu, Jiangjie Chen, Chengyu Du, Yunji Li, Ming Zhong, Qin Wang, Zhengmao Zhu, Jiayuan Song, Ke Ji, Junxian He, Pengyu Zhao, Yanghua Xiao
Published: 2026-03-10
Abstract
Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. We propose DIVE, an evidence-driven recipe that inverts the usual synthesis order: it first executes diverse, real-world tools, then reverse-derives tasks strictly entailed by the resulting execution traces. This method significantly improves tool-use generalization and outperforms quantity scaling for out-of-distribution generalization, even with 4x less data.
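The inverted synthesis order described in the abstract can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the function names (`execute_tools`, `derive_task`) and the toy tools are assumptions introduced here to show the execute-first, reverse-derive flow.

```python
# Hypothetical sketch of the execute-first, reverse-derive recipe
# (all names are illustrative, not from the paper).

def execute_tools(tools, args):
    """Run each (tool, arg) pair and record an execution trace."""
    trace = []
    for tool, arg in zip(tools, args):
        result = tool(arg)
        trace.append({"tool": tool.__name__, "arg": arg, "result": result})
    return trace

def derive_task(trace):
    """Reverse-derive a task whose answer is entailed by the observed trace."""
    steps = " then ".join(f"call {t['tool']}({t['arg']!r})" for t in trace)
    answer = trace[-1]["result"]
    return {"task": f"Find the value obtained if you {steps}.", "answer": answer}

# Toy tools standing in for real-world APIs.
def double(x):
    return 2 * x

def negate(x):
    return -x

trace = execute_tools([double, negate], [21, 5])
task = derive_task(trace)
print(task["answer"])  # -5
```

Because the task is constructed from a trace that has already been executed, its answer is grounded by construction, which is the property the abstract attributes to reverse-derivation.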