AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

By: Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach

Published: 2026-02-09

#cs.AI

Abstract

This paper introduces AIRS-Bench, a comprehensive benchmark suite for evaluating the capabilities of frontier AI research science agents across a range of tasks. It provides a standardized framework for assessing and advancing autonomous AI systems for scientific discovery.
