SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

By: Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, Shuyi Wang, Qunhong Zeng, Di Wang, Xuandong Zhao, Yuanli Wang, Roey Ben Chaim, Zonglin Di, Yipeng Gao, Junwei He, Yizhuo He, Liqiang Jing, Luyang Kong, Xin Lan, Jiachen Li, Songlin Li, Yijiang Li, Yueqian Lin, Xinyi Liu, Xuanqing Liu, Haoran Lyu, Ze Ma, Bowei Wang, Runhui Wang, Tianyu Wang, Wengao Ye, Yue Zhang

Published: 2026-02-13

#cs.AI

Abstract

SkillsBench presents the first benchmark designed to systematically evaluate the effectiveness of "Agent Skills" — structured packages of procedural knowledge intended to augment large language model agents. By measuring how well these skills transfer across diverse tasks, the work offers guidance for building more capable and reliable AI agents in real-world scenarios.

