QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models

By: Li Puyin, Tiange Xiang, Ella Mao, Shirley Wei, Xinye Chen, Adnan Masood, Li Fei-fei, Ehsan Adeli

Published: 2025-12-23

Subjects: cs.AI

Abstract

Vision-Language Models (VLMs) have shown remarkable progress, but their ability to reason about the physical world, which is crucial for real-world applications such as robotics, remains underexplored. This paper introduces QuantiPhy, a quantitative benchmark designed to evaluate the physical reasoning capabilities of VLMs. QuantiPhy assesses how well VLMs understand and predict the outcomes of physical interactions, such as object stability, movement, and collision, from visual input. The benchmark provides a standardized way to measure progress in this critical area, pushing the field toward more robust and intelligent embodied AI systems that can operate effectively in complex physical environments.
