RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

By: Boyang Wang, Haoran Zhang, Shujie Zhang, Jinkun Hao, Mingda Jia, Qi Lv, Yucheng Mao, Zhaoyang Lyu, Jia Zeng, Xudong Xu, Jiangmiao Pang

Published: 2026-01-08

#cs.AI

Abstract

RoboVIP introduces a multi-view video generation framework that augments robot manipulation datasets by using visual identity prompting to generate diverse backgrounds and tabletop scenes. With this augmentation, state-of-the-art robot policies achieve higher task success rates and generalize better in both simulated and real-world cluttered environments.

