SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?

Spatial transcriptomics experiments are rapidly expanding in scale and complexity, making computational analysis a major bottleneck in biological discovery. While frontier AI agents have shown significant improvements in software engineering and general data analysis, it remains unclear whether they can extract biological insights from messy, real-world spatial data. We introduce SpatialBench, a benchmark of 146 verifiable problems derived from practical spatial analysis workflows spanning five spatial technologies and seven task categories. Benchmark data on frontier models shows that base model accuracy remains low (20-38% across model families), with strong model-task and model-platform interactions. SpatialBench serves both as a measurement tool and a diagnostic lens for developing agents that can interact with real spatial datasets faithfully, transparently, and reproducibly.
