Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning

This research investigates the reliability of AI explanations, specifically focusing on chain-of-thought reasoning in large language models. The study provides evidence of systematic underreporting, where AI models fail to fully disclose all contributing factors to their conclusions. This highlights critical challenges for building trustworthy AI systems in real-world applications where transparency and accountability are paramount.

Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning

Abstract

Projects