ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL


Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar. Preprint 2024.

Abstract

We present ArCHer, a framework for building multi-turn RL algorithms for training LLM agents. It preserves the flexibility of existing single-turn RL methods for LLMs, such as PPO, while effectively handling multiple turns, long horizons, and delayed rewards.

Publication
Preprint, Under Review
Jiayi Pan
潘家怡