Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation

By: Aris Hofmann, Inge Vejsbjerg, Jiatong Shi, Junwon Lee

Published: 2025-12-10

#cs.AI

Abstract

Auto-BenchmarkCard is a workflow designed to generate validated descriptions of AI benchmarks. It addresses the common problem of incomplete or inconsistent benchmark documentation by combining multi-agent data extraction from multiple sources (e.g., Hugging Face, Unitxt, academic papers) with LLM-driven synthesis. A validation phase checks the factual accuracy of the generated cards, promoting transparency, comparability, and reusability in AI benchmark reporting, which is crucial for researchers and practitioners evaluating AI models.
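
The abstract describes a three-stage pipeline: extraction agents gather metadata about a benchmark from several sources, an LLM synthesizes a draft card, and a validation pass checks the result against the extracted evidence. The sketch below illustrates that control flow in minimal Python. All names (ExtractedField, hf_extractor, synthesize_card, validate_card) are hypothetical stand-ins, not the paper's actual implementation, prompts, or validation criteria, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record for one extracted fact and the source it came from.
@dataclass
class ExtractedField:
    name: str
    value: str
    source: str

# Each "agent" is modeled as a callable that extracts fields for a benchmark.
Extractor = Callable[[str], list[ExtractedField]]

def hf_extractor(benchmark_id: str) -> list[ExtractedField]:
    # Placeholder: a real agent would read the Hugging Face dataset card.
    return [ExtractedField("task", "question answering", "huggingface")]

def paper_extractor(benchmark_id: str) -> list[ExtractedField]:
    # Placeholder: a real agent would parse the benchmark's paper.
    return [ExtractedField("metric", "exact match", "paper")]

def synthesize_card(fields: list[ExtractedField]) -> dict[str, str]:
    # Stand-in for LLM-driven synthesis: merge extracted fields into a draft card.
    card: dict[str, str] = {}
    for f in fields:
        card.setdefault(f.name, f.value)
    return card

def validate_card(card: dict[str, str],
                  fields: list[ExtractedField]) -> list[str]:
    # Stand-in for the validation phase: flag card entries that lack a
    # supporting extracted field, i.e. claims with no traceable source.
    supported = {f.name for f in fields}
    return [key for key in card if key not in supported]

def build_benchmark_card(benchmark_id: str,
                         extractors: list[Extractor]) -> tuple[dict[str, str], list[str]]:
    # Run all extraction agents, synthesize a draft card, then validate it.
    fields = [f for extract in extractors for f in extract(benchmark_id)]
    card = synthesize_card(fields)
    issues = validate_card(card, fields)
    return card, issues

if __name__ == "__main__":
    card, issues = build_benchmark_card("example-benchmark",
                                        [hf_extractor, paper_extractor])
    print("Draft card:", card)
    print("Unsupported fields:", issues)
```

The point of the sketch is the separation of concerns the abstract implies: extraction is pluggable per source, synthesis only consumes extracted evidence, and validation compares the synthesized card back against that evidence rather than trusting the generator.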
