ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

This paper introduces ClawEnvKit, an autonomous pipeline for generating diverse and verified environments for training and evaluating claw-like robotic agents from natural language descriptions. This toolkit streamlines the creation of large-scale benchmarks, addressing the scalability issues of manual environment construction. It comprises a parser, generator, and validator to ensure feasibility, diversity, and consistency of generated environments. The resulting Auto-ClawEval benchmark demonstrates significant cost reduction and improved evaluation scale, showing that harness engineering boosts performance and highlighting the need for continuous evaluation.
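The parser, generator, and validator stages described above can be pictured with a minimal sketch. This is not ClawEnvKit's actual code or API: every name here (`EnvSpec`, `Environment`, `parse`, `generate`, `validate`, `build_environments`) is hypothetical, and the toy rules (object counts read from the text, unique object names as a stand-in for feasibility) are assumptions chosen only to show how a parse, generate, validate loop might filter candidates.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSpec:
    """Hypothetical structured form of a natural-language request."""
    task: str
    num_objects: int


@dataclass(frozen=True)
class Environment:
    """Hypothetical generated environment: a task plus named objects."""
    task: str
    objects: tuple


def parse(description: str) -> EnvSpec:
    # Toy parser (assumption): the first integer token is the object count.
    counts = [int(tok) for tok in description.split() if tok.isdigit()]
    return EnvSpec(task=description.strip(),
                   num_objects=counts[0] if counts else 1)


def generate(spec: EnvSpec, rng: random.Random) -> Environment:
    # Toy generator (assumption): names are sampled with replacement,
    # so a candidate may contain duplicates that the validator rejects.
    pool = ["cube", "ball", "ring", "peg"]
    objects = tuple(rng.choice(pool) + str(rng.randint(0, 9))
                    for _ in range(spec.num_objects))
    return Environment(task=spec.task, objects=objects)


def validate(env: Environment, spec: EnvSpec) -> bool:
    # Consistency: the object count matches the spec.
    # Feasibility (toy stand-in): no two objects share a name.
    return (len(env.objects) == spec.num_objects
            and len(set(env.objects)) == len(env.objects))


def build_environments(description: str, n: int, seed: int = 0,
                       max_attempts: int = 1000) -> list:
    """Collect up to n validated, mutually distinct environments."""
    rng = random.Random(seed)
    spec = parse(description)
    accepted, seen = [], set()
    for _ in range(max_attempts):
        if len(accepted) == n:
            break
        env = generate(spec, rng)
        # Diversity (toy stand-in): skip exact repeats of accepted environments.
        if validate(env, spec) and env not in seen:
            seen.add(env)
            accepted.append(env)
    return accepted
```

Calling `build_environments("pick up 3 objects", n=5)` would return a list of `Environment` values that each passed the toy validator. Keeping validation as a separate step after generation means the generator can stay simple and over-produce, while only checked candidates reach the benchmark.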

