Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym

This paper introduces Spatial-Gym, a Gymnasium environment that isolates spatial constraint reasoning by framing pathfinding in 2D grid puzzles as a sequential decision task with optional backtracking. Evaluating AI models step by step reveals a significant human-model gap in spatial reasoning and suggests that current models struggle with global planning when forced to act sequentially. Spatial-Gym provides a framework for diagnosing these limitations and for improving spatial reasoning through reinforcement learning.
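To make the idea of step-by-step evaluation with optional backtracking concrete, here is a minimal sketch of a toy grid environment. It is not the Spatial-Gym API: the class `ToyGridPathEnv`, the action names, the wall layout, the reward, and the helper `run_episode` are all hypothetical and chosen only for illustration. The only thing borrowed from Gymnasium is the general shape of its interface, where `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`.

```python
from dataclasses import dataclass, field

# Hypothetical toy environment for illustration only; NOT the Spatial-Gym API.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class ToyGridPathEnv:
    size: int = 3                            # grid is size x size
    walls: frozenset = frozenset({(1, 1)})   # assumed blocked cells
    path: list = field(default_factory=list)

    def reset(self):
        # Start every episode in the top-left corner.
        self.path = [(0, 0)]
        return self._obs(), {}

    def _obs(self):
        return {"position": self.path[-1], "path": tuple(self.path)}

    def step(self, action):
        goal = (self.size - 1, self.size - 1)
        if action == "back":
            # Optional backtracking: undo the most recent move, if any.
            if len(self.path) > 1:
                self.path.pop()
            return self._obs(), 0.0, False, False, {"valid": True}
        dr, dc = MOVES[action]
        r, c = self.path[-1]
        nxt = (r + dr, c + dc)
        # A move is valid if it stays on the grid, avoids walls,
        # and does not revisit a cell already on the path.
        valid = (
            0 <= nxt[0] < self.size
            and 0 <= nxt[1] < self.size
            and nxt not in self.walls
            and nxt not in self.path
        )
        if valid:
            self.path.append(nxt)
        terminated = self.path[-1] == goal
        return self._obs(), float(terminated), terminated, False, {"valid": valid}


def run_episode(env, actions):
    """Feed a fixed action sequence to the environment, one step at a time."""
    obs, info = env.reset()
    total = 0.0
    for action in actions:
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return obs, total


# The agent first moves right, fails to enter the wall, backtracks, then reroutes.
final_obs, total_reward = run_episode(
    ToyGridPathEnv(),
    ["right", "down", "back", "down", "down", "right", "right"],
)
```

In this sketch an invalid move leaves the state unchanged and is only flagged in `info`, so an agent has to notice the failed step and decide whether to backtrack. This is one simple way to expose the gap between planning a whole path up front and committing to it one action at a time.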
