Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation
By: Aris Hofmann, Inge Vejsbjerg, Jiatong Shi, Junwon Lee
Published: 2025-12-10
arXiv: cs.AI
Abstract
Auto-BenchmarkCard is a workflow for generating validated descriptions of AI benchmarks. It addresses the common problems of incomplete and inconsistent benchmark documentation by combining multi-agent data extraction from sources such as Hugging Face, Unitxt, and academic papers with LLM-driven synthesis. A validation phase then checks factual accuracy, promoting the transparency, comparability, and reusability in benchmark reporting that researchers and practitioners need when evaluating AI models.
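The abstract describes a three-stage pipeline: per-source extraction agents, LLM-driven synthesis into a single card, and a validation pass. The Python sketch below illustrates one way such a pipeline could be wired together under those assumptions; the extractor names, the field-merging rule, and the validation check are illustrative placeholders, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record holding one benchmark's documentation fields.
@dataclass
class BenchmarkCard:
    name: str
    fields: Dict[str, str] = field(default_factory=dict)      # e.g. task, metrics, license
    provenance: Dict[str, str] = field(default_factory=dict)  # field -> source that supplied it

# An "extraction agent" is modeled here as a plain callable that returns
# whatever metadata it can recover from one source (Hugging Face, Unitxt, a paper, ...).
Extractor = Callable[[str], Dict[str, str]]

def hf_extractor(benchmark: str) -> Dict[str, str]:
    # Placeholder: a real agent would query the Hugging Face Hub for dataset metadata.
    return {"task": "question answering", "license": "cc-by-4.0"}

def paper_extractor(benchmark: str) -> Dict[str, str]:
    # Placeholder: a real agent would parse the benchmark's paper for reported metrics.
    return {"metrics": "exact match, F1"}

def synthesize_card(benchmark: str, extractors: List[Extractor]) -> BenchmarkCard:
    """Merge per-source extractions into a draft card (the LLM synthesis step is stubbed out)."""
    card = BenchmarkCard(name=benchmark)
    for extract in extractors:
        for key, value in extract(benchmark).items():
            if key not in card.fields:  # first source to supply a field wins in this sketch
                card.fields[key] = value
                card.provenance[key] = extract.__name__
    return card

def validate_card(card: BenchmarkCard) -> List[str]:
    """Return a list of problems; an empty list means the card passes this toy validation."""
    required = {"task", "metrics", "license"}
    return [f"missing field: {f}" for f in sorted(required - card.fields.keys())]

if __name__ == "__main__":
    card = synthesize_card("example-benchmark", [hf_extractor, paper_extractor])
    issues = validate_card(card)
    print(card.fields, card.provenance, issues or "validated")
```

Tracking provenance per field, as sketched above, is one way a validation phase could trace each claim in the generated card back to a concrete source before accepting it.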