ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar
January 2024
Abstract
We present ArCHer, a framework for building multi-turn RL algorithms for training LLM agents. It preserves the flexibility of existing single-turn RL methods for LLMs, such as PPO, while effectively accommodating multiple turns, long horizons, and delayed rewards.
Publication
Preprint, Under Review