LongVie 2: Multimodal Controllable Ultra-Long Video World Model

By: Jianxiong Gao, Zhaoxi Chen, Xian Liu, Junhao Zhuang, Chengming Xu, Jianfeng Feng, Yu Qiao, Yanwei Fu, Chenyang Si, Ziwei Liu

Published: 2025-12-16

#cs.AI

Abstract

This paper introduces LongVie 2, a multimodal, controllable world model for ultra-long video. It focuses on generating and understanding extended video sequences with high fidelity and controllability. Potential applications include video content creation, realistic simulation environments, and advanced human-computer interaction.
