DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

By: Yibo Wang, Lei Wang, Yue Deng, Keming Wu, Yao Xiao, Huanjin Yao, Liwei Kang, Hai Ye, Yongcheng Jing, Lidong Bing

Published: 2026-01-14

#cs.AI

Abstract

DeepResearchEval is an automated framework for constructing deep research tasks and evaluating the AI agents that attempt them. It addresses the difficulty of assessing multi-step web research and cross-source information synthesis by constructing realistic tasks and applying active fact-checking, which provides better benchmarks for research-oriented AI.
