Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning



Yuexiang Zhai, Hao Bai*, Zipeng Lin*, Jiayi Pan*, Shengbang Tong*, Yifei Zhou*, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine. Preprint 2024.

We provide infrastructure and an environment for training vision-language models (VLMs) with reinforcement learning (RL) on decision-making tasks. We show that RL training enables our 7B model to outperform GPT-4V on these tasks. We also show that chain-of-thought (CoT) reasoning is notably effective at improving performance.

Preprint, Under Review
