AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

This paper introduces AgentDoG, a diagnostic guardrail framework for AI agent safety and security that addresses the challenges posed by autonomous tool use and environmental interactions. Rather than emitting only a binary label, it provides fine-grained risk diagnosis and hierarchical attribution across agent trajectories, offering the transparency needed for effective agent alignment.

Abstract
