VideoMaMa: Mask-Guided Video Matting via Generative Prior
By: Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee
Published: 2026-01-20
Abstract
Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. We present VideoMaMa, a novel mask-guided video matting framework that converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even when trained solely on synthetic data. The framework is accompanied by a scalable pseudo-labeling pipeline for large-scale video matting and by the resulting Matting Anything in Video (MA-V) dataset, making the approach suitable for professional video editing and content creation.
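To make the input/output contract of mask-guided video matting concrete, the sketch below shows a hypothetical interface: a clip of RGB frames plus coarse per-frame segmentation masks goes in, and soft per-frame alpha mattes come out. The function name, tensor shapes, and the Gaussian edge-softening placeholder standing in for the diffusion-based refinement are all assumptions for illustration, not the paper's implementation.

```python
# Hypothetical interface sketch; names and shapes are illustrative only.
# The generative refinement used by VideoMaMa is replaced here by a naive
# Gaussian edge-softening placeholder so the example runs end to end.
import numpy as np
from scipy.ndimage import gaussian_filter


def matte_from_coarse_masks(frames: np.ndarray, masks: np.ndarray,
                            sigma: float = 2.0) -> np.ndarray:
    """Convert coarse binary masks into soft alpha mattes.

    frames: (T, H, W, 3) float32 RGB video clip in [0, 1]
    masks:  (T, H, W) uint8 coarse segmentation masks (0 or 1)
    returns (T, H, W) float32 alpha mattes in [0, 1]
    """
    assert frames.shape[:3] == masks.shape, "frames and masks must align per pixel"
    alphas = np.empty(masks.shape, dtype=np.float32)
    for t in range(masks.shape[0]):
        # Placeholder for the generative prior: soften the hard mask boundary.
        alphas[t] = gaussian_filter(masks[t].astype(np.float32), sigma=sigma)
    return np.clip(alphas, 0.0, 1.0)


if __name__ == "__main__":
    T, H, W = 4, 64, 64
    frames = np.random.rand(T, H, W, 3).astype(np.float32)
    masks = np.zeros((T, H, W), dtype=np.uint8)
    masks[:, 16:48, 16:48] = 1  # a coarse square "object" mask
    alphas = matte_from_coarse_masks(frames, masks)
    print(alphas.shape, alphas.dtype, float(alphas.min()), float(alphas.max()))
```

In the actual method, the refinement step would be performed by a pretrained video diffusion model conditioned on the frames and coarse masks; only the surrounding data flow is sketched here.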