AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
By: Shanghai AI Lab
Published: 2026-01-26
View on arXiv →#cs.AI
Abstract
This paper introduces AgentDoG, a diagnostic guardrail framework for AI agent safety and security, addressing challenges from autonomous tool use and environmental interactions. It provides fine-grained risk diagnosis and hierarchical attribution across agent trajectories, offering transparency beyond binary labels to facilitate effective agent alignment.