DeepSeek-OCR 2: Visual Causal Flow

DeepSeek-OCR 2 is a vision-language model for optical character recognition (OCR) built around a new encoder, DeepEncoder V2. Its central idea is a 'visual causal flow' mechanism that dynamically reorders visual tokens according to their semantic relevance, rather than processing them in a fixed spatial order. By cascading 1D causal structures, the model aims to approximate human-like causal reasoning over 2D images. It reaches 91.09% overall on OmniDocBench v1.5, a 3.73% improvement over its predecessor, and also lowers the reading order Edit Distance, both of which are relevant to document processing and similar applications.
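
To make the reading order Edit Distance mentioned above more concrete, the following is a minimal sketch of a normalized Levenshtein distance between a predicted and a reference ordering of page blocks. It is an illustration only: the function names, the block-index representation, and the normalization by the longer sequence are assumptions made here, not the benchmark's actual evaluation code.

```python
def edit_distance(pred, ref):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    # prev[j] holds the distance between the processed prefix of pred
    # and the first j items of ref.
    prev = list(range(len(ref) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # delete from pred
                            curr[j - 1] + 1,      # insert into pred
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance scaled to [0, 1]; 0 means identical order.

    Dividing by the longer sequence length is an assumption of this sketch.
    """
    longest = max(len(pred), len(ref))
    return 0.0 if longest == 0 else edit_distance(pred, ref) / longest


# Hypothetical example: block indices in reference vs. predicted reading order.
reference_order = [0, 1, 2, 3, 4]
predicted_order = [0, 2, 1, 3, 4]  # two adjacent blocks are swapped
score = normalized_edit_distance(predicted_order, reference_order)
print(score)  # 0.4
```

A lower value means the predicted order is closer to the reference, so a reduction in this metric indicates that blocks are being read in a more faithful sequence.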
