Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

By: Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, Jinwei Gu

Published: 2026-01-23

#cs.AI

Abstract

This paper introduces Cosmos Policy, a method for fine-tuning large pretrained latent video diffusion models into unified robot policies for visuomotor control and planning. The authors report state-of-the-art success rates on complex manipulation tasks across multiple benchmarks, together with improved data efficiency and robustness. They present the approach as a step toward robots and embodied AI systems that perform complex real-world tasks more effectively.
