A Pragmatic VLA Foundation Model
By: Wei Wu, Fan Lu, Yunnan Wang
Published: 2026-01-26
arXiv: cs.AI
Abstract
LingBot-VLA is a Vision-Language-Action (VLA) foundation model pre-trained on 20,000 hours of real-world robot data spanning multiple embodiments. The model demonstrates that VLA performance continues to scale with data volume without saturating, achieves superior success rates on a 100-task real-world benchmark across three robot platforms, and improves training efficiency, directly advancing practical robotics.
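The central empirical claim is a scaling one: performance keeps improving as pre-training data grows, rather than plateauing. As a minimal illustrative sketch only (the data points, curve forms, and parameter values below are hypothetical placeholders, not results or methods from the paper), here is one common way to probe such a claim: fit both an unsaturated power law and a saturating alternative to success-rate measurements and compare residuals.

```python
# Illustrative sketch: comparing an unsaturated power-law fit against a
# saturating fit for success rate vs. pre-training data volume.
# All numbers are made up for demonstration; they are NOT LingBot-VLA results.
import numpy as np
from scipy.optimize import curve_fit

hours = np.array([1250, 2500, 5000, 10000, 20000], dtype=float)  # data volume (hours)
success = np.array([0.31, 0.38, 0.46, 0.55, 0.66])               # hypothetical success rates

def power_law(x, a, b):
    # Unsaturated scaling: performance grows as a * x^b with no ceiling.
    return a * np.power(x, b)

def saturating(x, s, k):
    # Saturating alternative: approaches the ceiling s as x grows large.
    return s * x / (x + k)

p_pow, _ = curve_fit(power_law, hours, success, p0=[0.05, 0.25])
p_sat, _ = curve_fit(saturating, hours, success, p0=[0.8, 5000.0])

for name, f, params in [("power law", power_law, p_pow),
                        ("saturating", saturating, p_sat)]:
    rss = np.sum((success - f(hours, *params)) ** 2)
    print(f"{name}: params={np.round(params, 4)}, RSS={rss:.5f}")
```

If the power law fits markedly better than the saturating curve over the measured range, that is consistent with "no saturation yet"; the paper's actual evidence and methodology should be consulted for how the authors establish this.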