Agent psychometrics: Task-level performance prediction in agentic coding benchmarks

This paper studies agent psychometrics: predicting how AI coding agents perform on individual tasks in agentic coding benchmarks. It examines methods for evaluating agents beyond aggregate pass/fail rates, with the aim of characterizing their strengths, weaknesses, and suitability for real-world software development. By developing metrics and predictive models of task-level agent performance, the work aims to support more reliable and efficient AI coding assistants and, in turn, more productive and higher-quality software engineering.
