MedPI: Evaluating AI Systems in Medical Patient-facing Interactions
By: Diego Fajardo V., Oleksii Proniakin, Victoria-Elisabeth Gruber, Razvan Marinescu
Published: 2026-01-08
Category: cs.AI
Abstract
This paper introduces MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn QA benchmarks, MedPI assesses medical dialogue across 105 dimensions that cover medical process, treatment safety, outcomes, and communication, all of which are essential for responsible AI in healthcare.