MedPI: Evaluating AI Systems in Medical Patient-facing Interactions

By: Diego Fajardo V., Oleksii Proniakin, Victoria-Elisabeth Gruber, Razvan Marinescu

Published: 2026-01-08

#cs.AI

Abstract

This paper introduces MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn QA benchmarks, MedPI assesses medical dialogue across 105 dimensions that cover medical process, treatment safety, outcomes, and communication, all of which are essential for responsible AI in healthcare.
