Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

By: Aaron Rose, Carissa Cullen, Brandon Gary Kaplowitz, Christian Schroeder de Witt

Published: 2026-04-02

#cs.AI

Abstract

This paper addresses the detection of collusion in multi-agent systems using multi-agent interpretability techniques. By examining how individual agents make decisions and how they interact with one another, the work aims to help identify and prevent undesirable collaborative behaviors. Detecting such behavior matters for the safety, ethics, and reliability of complex AI systems in areas such as autonomous driving, financial markets, and strategic simulations.
