Chinese Academy of Sciences Institutional Repositories Grid
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

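Document type: Journal article

Authors: Chenglong Wang (1); Yang Gan (1); Yifu Huo (1); Yongyu Mu (1); Murun Yang (1); Qiaozhi He (1); Tong Xiao (1,2); Chunliang Zhang (1,2); Tongran Liu (3); Quan Du (2)
Journal: arXiv
Publication date: 2024
Pages: 14
Corresponding author email: xiaotong@mail.neu.edu.cn
DOI: 10.48550/arXiv.2408.12109
Document subtype: Review
Abstract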

Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue the line of research. We present a Robust Visual Reward Model (RoVRM) which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through a three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on the commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.

Indexed by: EI
Source URL: http://ir.psych.ac.cn/handle/311026/48771
Collection: Institute of Psychology_CAS Key Laboratory of Behavioral Science
Author affiliations:
1. School of Computer Science and Engineering, Northeastern University, Shenyang, China
2. NiuTrans Research, Shenyang, China
3. CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS, Beijing, China
Recommended citation
GB/T 7714: Chenglong Wang, Yang Gan, Yifu Huo, et al. RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data[J]. arXiv, 2024: 14.
APA: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, ... & Jingbo Zhu. (2024). RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data. arXiv, 14.
MLA: Chenglong Wang, et al. "RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data." arXiv (2024): 14.

Ingest method: OAI harvesting

Source: Institute of Psychology

