Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity.

By: Erez Yosef, Oron Anschel, Shunit Haviv Hakimi, Asaf Gendler, Adam Botach, Nimrod Berman, Igor Kviatkovsky

Published: 2026-04-27

#cs.AI

Abstract

This work proposes a framework for evaluating mathematical reasoning in large language models (LLMs) that moves beyond rigid symbolic answer checks, introducing an LLM-as-a-judge paradigm for more robust assessment.
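To illustrate the contrast the abstract draws, here is a minimal sketch. The paper's actual judging prompt and model are not described in this excerpt, so `llm_judge` below is a hypothetical stand-in that uses exact rational arithmetic to accept numerically equivalent answer forms — enough to show why a rigid string/symbol match undercounts correct answers.

```python
# Sketch: rigid symbolic matching vs. a judge-style equivalence check.
# `llm_judge` is a hypothetical placeholder for an LLM call; a real judge
# would also handle algebraic and textual equivalence beyond numbers.
from fractions import Fraction
from typing import Optional


def symbolic_match(pred: str, gold: str) -> bool:
    # Rigid check: exact string equality after whitespace stripping.
    return pred.strip() == gold.strip()


def to_number(ans: str) -> Optional[Fraction]:
    # Try to parse an answer as an exact rational number ("0.5", "1/2", "7").
    try:
        return Fraction(ans.strip())
    except (ValueError, ZeroDivisionError):
        return None


def llm_judge(pred: str, gold: str) -> bool:
    # Stand-in judge: accept numerically equivalent forms, then fall back
    # to the rigid string check for non-numeric answers.
    p, g = to_number(pred), to_number(gold)
    if p is not None and g is not None:
        return p == g
    return symbolic_match(pred, gold)


print(symbolic_match("0.5", "1/2"))  # → False: the rigid check rejects it
print(llm_judge("0.5", "1/2"))       # → True: equivalent values accepted
```

The design point is that equivalence, not surface form, is what a robust evaluator should score; replacing the numeric stand-in with an actual LLM call generalizes this to free-form answers.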

