VideoMaMa: Mask-Guided Video Matting via Generative Prior
By: Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee
Published: 2026-01-20
Abstract
Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. We present VideoMaMa, a novel mask-guided video matting framework that converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even when trained solely on synthetic data. The framework is accompanied by a scalable pseudo-labeling pipeline for large-scale video matting and by the resulting Matting Anything in Video (MA-V) dataset, making the approach suitable for professional video editing and content creation.
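To make the input/output contract of mask-guided video matting concrete, the sketch below shows a hypothetical interface: a clip of RGB frames plus coarse per-frame segmentation masks goes in, and soft per-frame alpha mattes come out. The function name, tensor shapes, and the Gaussian edge-softening placeholder standing in for the diffusion-based refinement are all assumptions for illustration, not the paper's implementation.

```python
# Hypothetical interface sketch; names and shapes are illustrative only.
# The generative refinement used by VideoMaMa is replaced here by a naive
# Gaussian edge-softening placeholder so the example runs end to end.
import numpy as np
from scipy.ndimage import gaussian_filter


def matte_from_coarse_masks(frames: np.ndarray, masks: np.ndarray,
                            sigma: float = 2.0) -> np.ndarray:
    """Convert coarse binary masks into soft alpha mattes.

    frames: (T, H, W, 3) float32 RGB video clip in [0, 1]
    masks:  (T, H, W) uint8 coarse segmentation masks (0 or 1)
    returns (T, H, W) float32 alpha mattes in [0, 1]
    """
    assert frames.shape[:3] == masks.shape, "frames and masks must align per pixel"
    alphas = np.empty(masks.shape, dtype=np.float32)
    for t in range(masks.shape[0]):
        # Placeholder for the generative prior: soften the hard mask boundary.
        alphas[t] = gaussian_filter(masks[t].astype(np.float32), sigma=sigma)
    return np.clip(alphas, 0.0, 1.0)


if __name__ == "__main__":
    T, H, W = 4, 64, 64
    frames = np.random.rand(T, H, W, 3).astype(np.float32)
    masks = np.zeros((T, H, W), dtype=np.uint8)
    masks[:, 16:48, 16:48] = 1  # a coarse square "object" mask
    alphas = matte_from_coarse_masks(frames, masks)
    print(alphas.shape, alphas.dtype, float(alphas.min()), float(alphas.max()))
```

In the actual method, the refinement step would be performed by a pretrained video diffusion model conditioned on the frames and coarse masks; only the surrounding data flow is sketched here.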