Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI.

By: Michael O'Herlihy, Rosa Català

Published: 2026-04-22

#cs.AI

Abstract

This paper introduces Defensibility Signals for evaluating rule-governed AI systems, particularly in content moderation. It formalizes policy-grounded correctness and proposes measures such as the Defensibility Index (DI) to assess reasoning stability.
