DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations
By: Longtian Qiu, Shan Ning, Chuyu Zhang, Jiaxuan Sun, Xuming He
Published: 2026-01-26
Subjects: cs.AI
Abstract
This work presents DA-DPO, a cost-efficient, difficulty-aware preference optimization method for significantly reducing hallucinations in Multimodal Large Language Models (MLLMs). By adapting preference optimization to content difficulty, the approach improves the factual consistency and reliability of MLLM outputs.
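To make the idea concrete, the sketch below shows one generic way difficulty could enter a DPO-style objective: scaling each preference pair's loss by a per-sample difficulty weight. This is an illustrative assumption, not the authors' DA-DPO formulation (the abstract does not specify the objective); the `difficulty` tensor and `weighted_dpo_loss` name are hypothetical.

```python
# Illustrative sketch only: a difficulty-weighted variant of the standard
# DPO loss. NOT the authors' DA-DPO objective; the per-sample `difficulty`
# weight is a hypothetical stand-in for "adapting to content difficulty".
import torch
import torch.nn.functional as F

def weighted_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (B,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape (B,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape (B,)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape (B,)
    difficulty: torch.Tensor,             # per-sample weights in [0, 1], shape (B,)
    beta: float = 0.1,                    # DPO temperature on the reward margin
) -> torch.Tensor:
    """Standard DPO loss, rescaled per sample by a difficulty weight."""
    # Log-ratios of policy to reference for preferred / dispreferred responses.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # DPO implicit reward margin between chosen and rejected responses.
    logits = beta * (chosen_ratio - rejected_ratio)
    # Per-sample negative log-sigmoid loss, reweighted by difficulty.
    per_sample = -F.logsigmoid(logits)
    return (difficulty * per_sample).mean()

# Example usage with dummy log-probabilities for a batch of 4 pairs.
if __name__ == "__main__":
    B = 4
    loss = weighted_dpo_loss(
        policy_chosen_logps=torch.randn(B),
        policy_rejected_logps=torch.randn(B),
        ref_chosen_logps=torch.randn(B),
        ref_rejected_logps=torch.randn(B),
        difficulty=torch.rand(B),
    )
    print(loss.item())
```

Under this reading, easy pairs (already well separated by the policy) would contribute less gradient than hard pairs, which is one plausible route to the cost efficiency the abstract claims; the paper itself should be consulted for the actual mechanism.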