Autonomous Evaluation and Refinement of Digital Agents
Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
April 2024
Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr. Preprint, 2024.
Abstract
We design model-based evaluators to both evaluate and autonomously refine the performance of digital agents. We show that these open-ended evaluators can significantly improve agents' performance, through either fine-tuning or inference-time guidance, without any extra supervision.
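As a rough illustration only, the following sketch shows two generic ways an evaluator's verdict could be used without extra human labels. The first is inference-time guidance: the agent retries a task, receiving the evaluator's critique each time it fails. The second is fine-tuning data selection: only trajectories the evaluator judges successful are kept. The `Trajectory` type, `guided_attempts`, `filter_for_finetuning`, and the toy agent and evaluator are hypothetical names introduced here. This is not the authors' implementation, and a real evaluator would be a model judging screenshots or action logs rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Trajectory:
    instruction: str
    actions: List[str]
    final_state: str


# Hypothetical interfaces (assumptions, not from the paper):
# an agent maps (instruction, optional feedback) to a trajectory;
# an evaluator maps a trajectory to (success, critique).
Agent = Callable[[str, Optional[str]], Trajectory]
Evaluator = Callable[[Trajectory], Tuple[bool, str]]


def guided_attempts(agent: Agent, evaluator: Evaluator,
                    instruction: str, max_tries: int = 3) -> Tuple[Trajectory, bool]:
    """Inference-time guidance: retry with the evaluator's critique until it judges success."""
    feedback: Optional[str] = None
    traj = agent(instruction, feedback)
    for attempt in range(max_tries):
        if attempt > 0:
            traj = agent(instruction, feedback)
        success, critique = evaluator(traj)
        if success:
            return traj, True
        feedback = critique  # pass the critique into the next attempt
    return traj, False


def filter_for_finetuning(trajectories: List[Trajectory],
                          evaluator: Evaluator) -> List[Trajectory]:
    """Fine-tuning data: keep only trajectories the evaluator judges successful."""
    return [t for t in trajectories if evaluator(t)[0]]


# Toy usage with hypothetical stand-ins for a real agent and evaluator.
def toy_agent(instruction: str, feedback: Optional[str]) -> Trajectory:
    # Only clicks "submit" once it has received feedback.
    actions = ["type query"] + (["click submit"] if feedback else [])
    return Trajectory(instruction, actions, "submitted" if feedback else "draft")


def toy_evaluator(traj: Trajectory) -> Tuple[bool, str]:
    ok = traj.final_state == "submitted"
    critique = "" if ok else "The form was never submitted; click submit."
    return ok, critique


traj, ok = guided_attempts(toy_agent, toy_evaluator, "search for shoes")
print(ok, traj.actions)  # True ['type query', 'click submit']
```

Both uses rely on the same evaluator output, which is why no extra supervision is needed in this sketch: the success signal for retrying and for data filtering comes entirely from the evaluator.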
Publication
Preprint, Under Review