ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Abstract

Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar. ICML 2024.

We present ArCHer, a new framework for multi-turn RL algorithms that train LM agents. It preserves the flexibility of mainstream single-turn RL methods for LMs, such as PPO, while effectively handling multiple turns, long horizons, and delayed rewards.

Publication
ICML 2024
Jiayi Pan
潘家怡