Enhancing Policy Learning with World-Action Model
By: Zichang Wang, Xiaochen Li, Shagun Singh, Xiang Li, Chuang Gan, Joshua B. Tenenbaum, S. M. Ali Eslami
Published: 2026-03-31
Subjects: cs.AI
Abstract
This paper presents the World-Action Model (WAM), an action-regularized world model that jointly reasons over future visual observations and the actions that drive state transitions. WAM integrates an inverse dynamics objective into DreamerV2, predicting actions from latent state transitions so that the learned representations capture action-relevant structure. Evaluated on eight manipulation tasks from the CALVIN benchmark, WAM achieves higher behavioral cloning success rates and stronger PPO fine-tuning performance while requiring significantly fewer training steps than baselines.
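The core mechanism the abstract describes, an inverse dynamics objective over latent state transitions, can be sketched as a small auxiliary head. The sketch below is illustrative, not the paper's implementation: the latent and action dimensions, MLP architecture, and loss choice are all assumptions (CALVIN's 7-DoF action space motivates the action size).

```python
# Minimal sketch (assumed architecture, not the paper's code) of an inverse
# dynamics head: predict the action a_t from consecutive latents (z_t, z_{t+1}).
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM = 32, 7  # assumed sizes; CALVIN uses 7-DoF actions

class InverseDynamicsHead(nn.Module):
    def __init__(self, latent_dim=LATENT_DIM, action_dim=ACTION_DIM):
        super().__init__()
        # Small MLP over the concatenated transition (z_t, z_{t+1}).
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 128),
            nn.ELU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, z_t, z_next):
        return self.net(torch.cat([z_t, z_next], dim=-1))

def inverse_dynamics_loss(head, z_t, z_next, actions):
    # Auxiliary term that would be added to the world-model objective so
    # gradients shape the latents to retain action-relevant information.
    return nn.functional.mse_loss(head(z_t, z_next), actions)

# Toy usage with random tensors standing in for world-model latents.
head = InverseDynamicsHead()
z_t, z_next = torch.randn(16, LATENT_DIM), torch.randn(16, LATENT_DIM)
actions = torch.randn(16, ACTION_DIM)
loss = inverse_dynamics_loss(head, z_t, z_next, actions)
loss.backward()  # gradients flow into the head (and, in a full model, the encoder)
```

In a full world model the latents would come from the encoder/dynamics network, so this loss also regularizes the representation itself, which is the point the abstract makes.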