Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
By: Zhiyuan Hu, Yunhai Hu, Juncheng Liu, Shuyue Stella Li, Yucheng Wang, Zhen Xu, See-Kiong Ng, Anh Tuan Luu, Xinxing Xu, Bryan Hooi, Cynthia Breazeal, Hae Won Park
Published: 2026-01-14
Abstract
Multi-agent systems powered by Large Language Models (LLMs) often struggle with resource-intensive and unstable training due to non-stationarity and sparse rewards in multi-agent reinforcement learning (MARL). This paper presents Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team for discussions, retrieves and integrates test-time experiences, and reaches consensus. It significantly improves accuracy across benchmarks in medicine, math, and education, offering a stable and efficient path to robust multi-agent reasoning.
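The three stages named in the abstract (form a team, retrieve test-time experiences, reach consensus) can be illustrated with a toy sketch. The abstract does not say how retrieval or consensus are implemented, so everything below is an assumption made for illustration: the `Experience` class, the word-overlap retriever, the stub agents that stand in for LLM experts, and majority voting as the consensus rule are all hypothetical and are not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experience:
    """Hypothetical container for one piece of textual experience."""
    text: str


# An agent maps (question, retrieved experiences) to an answer string.
# In a real system this would wrap an LLM call; here it is a plain function.
Agent = Callable[[str, List[Experience]], str]


def retrieve(pool: List[Experience], question: str, k: int = 2) -> List[Experience]:
    """Toy retriever (assumption): rank experiences by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(e.text.lower().split())), e) for e in pool]
    scored = [(s, e) for s, e in scored if s > 0]  # drop experiences with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def deliberate(agents: List[Agent], question: str,
               pool: List[Experience], k: int = 2) -> str:
    """Give every agent the same retrieved experiences, then take a majority vote."""
    experiences = retrieve(pool, question, k)
    answers = [agent(question, experiences) for agent in agents]
    return Counter(answers).most_common(1)[0][0]


# Stub experts (hypothetical): one is swayed by a relevant experience, two are fixed.
def cautious_agent(question: str, experiences: List[Experience]) -> str:
    return "B" if any("units" in e.text for e in experiences) else "A"


def fixed_agent_a(question: str, experiences: List[Experience]) -> str:
    return "A"


def fixed_agent_b(question: str, experiences: List[Experience]) -> str:
    return "B"


pool = [
    Experience("check units before comparing dosage values"),
    Experience("draw the triangle before applying the theorem"),
]
team = [cautious_agent, fixed_agent_a, fixed_agent_b]
question = "Which dosage is safe given the units in the chart?"

answer = deliberate(team, question, pool)
```

In this toy run the retrieved experience about units flips one expert's vote, so the team answer changes from "A" (with an empty pool) to "B". No model weights are updated at any point; only the text supplied at inference time changes, which is the sense in which the abstract describes the approach as injecting experience at inference time.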