MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

By: Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba

Published: 2026-04-21

#cs.AI

Abstract

Mathematical problem solving remains a demanding test of reasoning for large language and multimodal models, yet existing benchmarks are small, monolingual, and limited in scope. This paper introduces MathNet, a high-quality, large-scale, multimodal, multilingual dataset of Olympiad-level math problems, together with a benchmark that evaluates mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and two decades of competitions, and comprises 30,676 expert-authored problems with solutions across diverse domains. It supports tasks such as problem solving, math-aware retrieval, and retrieval-augmented problem solving. Evaluation on these tasks reveals that state-of-the-art models still struggle, especially at retrieving equivalent problems.

