Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
By: Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, Jinwei Gu
Published: 2026-01-23
Category: cs.AI
Abstract
This paper introduces Cosmos Policy, a method for fine-tuning large pretrained latent video diffusion models into unified robot policies for visuomotor control and planning. It achieves state-of-the-art success rates on complex manipulation tasks across multiple benchmarks, with improved data efficiency and robustness. The approach could advance robotics and embodied AI by enabling robots to perform complex real-world tasks more effectively.