Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
By: Yixin Liu, Yue Yu, DiJia Su, Sid Wang, Xuewei Wang, Song Jiang, Bo Liu, Arman Cohan, Yuandong Tian, Zhengxing Chen
Published: 2026-03-13
Category: cs.AI
Abstract
This research investigates the use of reasoning Large Language Models (LLMs) as judges that evaluate other LLMs during post-training in non-verifiable domains, examining their effectiveness, practical impact, and potential pitfalls on complex, subjective tasks.