Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

By: Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille

Published: 2025-12-19

#cs.AI

Abstract

Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superficially plausible but invalid steps. We introduce Generative Adversarial Reasoner (GAR), an on-policy joint training framework designed to enhance reasoning by co-evolving an LLM reasoner and an LLM-based discriminator through adversarial reinforcement learning. This produces dense, well-calibrated, on-policy step-level rewards that supplement sparse exact-match signals, improving credit assignment, increasing sample efficiency, and enhancing overall reasoning quality of LLMs.
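The abstract's central idea is a reward that blends a dense, per-step score from the discriminator with the sparse exact-match signal given only at the final answer. The sketch below is a minimal, hypothetical illustration of that blending; the names (`StepScore`, `combined_step_rewards`), the weights, and the linear combination are assumptions for illustration, not the paper's actual reward formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepScore:
    """One reasoning step plus the discriminator's judgment of it."""
    text: str
    discriminator_score: float  # in [0, 1]; higher = step judged sound


def combined_step_rewards(steps: List[StepScore],
                          final_answer_correct: bool,
                          sparse_weight: float = 1.0,
                          dense_weight: float = 0.5) -> List[float]:
    """Blend dense per-step discriminator scores with a sparse
    exact-match reward awarded only at the final step.

    Weights and the additive combination are illustrative assumptions.
    """
    # Dense signal: every step gets a (scaled) discriminator score,
    # which improves credit assignment over a single end-of-episode reward.
    rewards = [dense_weight * s.discriminator_score for s in steps]
    # Sparse signal: exact-match correctness, attached to the last step.
    if rewards:
        rewards[-1] += sparse_weight * (1.0 if final_answer_correct else 0.0)
    return rewards
```

For example, a two-step trace with discriminator scores 0.8 and 0.4 and a correct final answer would receive per-step rewards of 0.4 and 1.2 under the default weights, so an early sound step is credited even though the sparse signal arrives only at the end.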
