RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Document type: Journal article
Authors | Chenglong Wang1; Yang Gan1; Yifu Huo1; Yongyu Mu1; Murun Yang1; Qiaozhi He1; Tong Xiao1,2; Chunliang Zhang1,2; Tongran Liu3 |
Journal | arXiv |
Publication date | 2024 |
Pages | 14 |
Corresponding author email | xiaotong@mail.neu.edu.cn |
DOI | 10.48550/arXiv.2408.12109 |
Document subtype | Review |
Abstract (English) | Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research. We present a Robust Visual Reward Model (RoVRM) which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through a three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on the commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization. |
Indexed in | EI |
Source URL | [http://ir.psych.ac.cn/handle/311026/48771] |
Collection | Institute of Psychology, CAS Key Laboratory of Behavioral Science |
Author affiliations | 1. School of Computer Science and Engineering, Northeastern University, Shenyang, China 2. NiuTrans Research, Shenyang, China 3. CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS, Beijing, China |
Recommended citation (GB/T 7714) | Chenglong Wang, Yang Gan, Yifu Huo, et al. RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data[J]. arXiv, 2024: 14. |
APA | Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, ... & Jingbo Zhu. (2024). RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data. arXiv, 14. |
MLA | Chenglong Wang, et al. "RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data". arXiv (2024): 14. |
Ingestion method: OAI harvesting
Source: Institute of Psychology