Towards General-Purpose Embodied AI with Large Language Models

By: Yuqi Cui, Weihang Ren, Junzhe Wang, Zhaocheng Huang, Haohong Lin, Bojun Zhang, Guangxuan Li, Xiaofeng Mao

Published: 2023-11-16

Category: cs.AI

Abstract

Embodied AI, which aims to develop intelligent agents capable of perceiving, acting, and reasoning in physical or simulated environments, represents a grand challenge in artificial intelligence. The emergence of Large Language Models (LLMs) with their powerful reasoning and planning capabilities has opened new avenues for achieving more general-purpose embodied intelligence. This paper explores the synergistic integration of LLMs with embodied AI systems. We discuss how LLMs can serve as high-level planners, interpreters of natural language instructions, and generators of executable code for robotic agents. We examine different architectures for combining LLMs with perception and control modules, ranging from direct prompt-based control to hierarchical planning frameworks. Through a review of recent advancements, we highlight the potential of LLMs to enable embodied agents to perform complex, multi-step tasks, adapt to novel situations, and engage in more intuitive human-robot interaction. We also address key challenges, including grounding language in physical reality, managing computational complexity, and ensuring safety and reliability in real-world deployments.
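To make the "LLM as high-level planner" pattern described above concrete, below is a minimal, self-contained Python sketch of direct prompt-based control: the LLM is prompted with a whitelist of low-level skills and asked to emit one primitive call per line, which the agent then parses, validates, and dispatches. Everything here is illustrative rather than drawn from the paper: `call_llm` is a placeholder for any chat-completion API, and the primitive names (`move_to`, `grasp`, `place`) are hypothetical.

```python
# Illustrative sketch: an LLM as a high-level planner that emits a sequence
# of whitelisted robot primitives. `call_llm` is a placeholder for a real
# LLM API client; the primitive set is hypothetical.

def call_llm(prompt: str) -> str:
    """Placeholder LLM call. Swap in a real API client here; for this
    self-contained sketch it returns a canned plan."""
    return "move_to(shelf)\ngrasp(mug)\nmove_to(table)\nplace(mug)"

# Whitelist of low-level skills the controller actually exposes. Validating
# LLM output against it is one simple way to keep free-form language
# grounded in executable actions.
PRIMITIVES = {"move_to", "grasp", "place"}

def parse_plan(text: str) -> list[tuple[str, str]]:
    """Parse lines like `grasp(mug)` into (action, argument) pairs,
    rejecting anything outside the primitive whitelist."""
    steps = []
    for line in text.strip().splitlines():
        name, _, rest = line.partition("(")
        name, arg = name.strip(), rest.rstrip(")").strip()
        if name not in PRIMITIVES:
            raise ValueError(f"LLM proposed unknown primitive: {name!r}")
        steps.append((name, arg))
    return steps

def execute(instruction: str) -> None:
    """Prompt the LLM with the available skills, then dispatch each
    validated step to the (stubbed) robot controller."""
    prompt = (
        "You control a robot with primitives: "
        + ", ".join(sorted(PRIMITIVES))
        + ".\nEmit one primitive call per line to accomplish: "
        + instruction
    )
    for action, arg in parse_plan(call_llm(prompt)):
        print(f"[robot] {action}({arg})")  # stand-in for real actuation

if __name__ == "__main__":
    execute("Bring the mug from the shelf to the table")
```

A hierarchical planning framework, by contrast, would insert one or more refinement layers between `execute` and the primitives, with the LLM first producing subgoals that are each expanded into primitive sequences; the whitelist-and-validate step stays the same at every layer.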
