DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
By: Yibo Wang, Lei Wang, Yue Deng, Keming Wu, Yao Xiao, Huanjin Yao, Liwei Kang, Hai Ye, Yongcheng Jing, Lidong Bing
Published: 2026-01-14
Category: cs.AI
Abstract
DeepResearchEval is an automated framework for constructing deep research tasks and evaluating the AI agents that attempt them. It addresses the difficulty of assessing multi-step web research and cross-source information synthesis by constructing realistic tasks and performing active fact-checking, providing better benchmarks for evaluating research-oriented AI.