Autonomous Evaluation and Refinement of Digital Agents


Jiayi Pan, Yichi Zhang, Nickolas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr. Preprint, 2024.

Abstract

We design model-based evaluators that both evaluate digital agents and autonomously refine their performance. We show that these open-ended evaluators can significantly improve agents' performance, through either fine-tuning or inference-time guidance, without any extra supervision.
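The abstract names two ways an evaluator's judgment can improve an agent without extra supervision. The sketch below is a toy illustration under stated assumptions, not the paper's implementation. It assumes the evaluator returns a binary success verdict for a whole trajectory, and `Trajectory`, `filter_for_finetuning`, and `run_with_guidance` are hypothetical names. Fine-tuning is shown as keeping only trajectories the evaluator judges successful (a filtered behavior-cloning style of data selection). Inference-time guidance is shown as retrying the task with the evaluator's verdict fed back to the agent.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trajectory:
    """Hypothetical record of one agent episode: the instruction and the actions taken."""
    instruction: str
    actions: List[str]

# Assumption: the evaluator maps a full trajectory to a binary success verdict.
Evaluator = Callable[[Trajectory], bool]
# Assumption: the agent maps (instruction, feedback text) to a new trajectory.
Agent = Callable[[str, str], Trajectory]

def filter_for_finetuning(trajs: List[Trajectory], evaluator: Evaluator) -> List[Trajectory]:
    """Keep only trajectories the evaluator judges successful, to use as fine-tuning data."""
    return [t for t in trajs if evaluator(t)]

def run_with_guidance(agent: Agent, evaluator: Evaluator,
                      instruction: str, max_attempts: int = 3) -> List[Trajectory]:
    """Retry until the evaluator judges an attempt successful or attempts run out.

    Returns every attempt in order; the last element is the final trajectory.
    """
    feedback = ""
    attempts: List[Trajectory] = []
    for _ in range(max_attempts):
        traj = agent(instruction, feedback)
        attempts.append(traj)
        if evaluator(traj):
            break
        # Hypothetical feedback: the verdict is passed back to the agent as text.
        feedback = "previous attempt judged unsuccessful"
    return attempts

if __name__ == "__main__":
    # Toy agent: scrolls on the first try, clicks submit once it receives feedback.
    toy_agent: Agent = lambda instr, fb: Trajectory(instr, ["click submit"] if fb else ["scroll"])
    # Toy evaluator: success means the trajectory contains a submit click.
    toy_eval: Evaluator = lambda t: "click submit" in t.actions

    attempts = run_with_guidance(toy_agent, toy_eval, "submit the form")
    print(len(attempts), attempts[-1].actions)
    kept = filter_for_finetuning(
        [Trajectory("a", ["scroll"]), Trajectory("b", ["click submit"])], toy_eval)
    print([t.instruction for t in kept])
```

Because the evaluator stands in for human labels in both paths, its errors propagate directly: a false positive adds a failed trajectory to the fine-tuning data or ends a retry loop early.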

Publication
Preprint, Under Review