RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
By: Boyang Wang, Haoran Zhang, Shujie Zhang, Jinkun Hao, Mingda Jia, Qi Lv, Yucheng Mao, Zhaoyang Lyu, Jia Zeng, Xudong Xu, Jiangmiao Pang
Published: 2026-01-08
Category: cs.AI
Abstract
RoboVIP is a multi-view video generation framework that augments robot manipulation datasets by generating diverse backgrounds and tabletop scenes through visual identity prompting. With the augmented data, state-of-the-art robot policies achieve higher task success rates and generalize better in cluttered environments, both in simulation and in the real world.