Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
By: Michael O'Herlihy, Rosa Català
Published: 2026-04-22
arXiv category: cs.AI
Abstract
This paper introduces Defensibility Signals for evaluating rule-governed AI systems, particularly in content moderation. It formalizes policy-grounded correctness and offers methods such as the Defensibility Index (DI) for assessing reasoning stability.