Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
By: Aaron Rose, Carissa Cullen, Brandon Gary Kaplowitz, Christian Schroeder de Witt
Published: 2026-04-02
Category: cs.AI
Abstract
This paper focuses on detecting collusion in multi-agent systems using multi-agent interpretability techniques. By understanding the decision-making processes of individual agents and their interactions, this research can help identify and prevent undesirable collaborative behaviors. Detecting such behaviors is critical for the safety, ethics, and reliability of complex AI systems in areas such as autonomous driving, financial markets, and strategic simulations.