RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Document type: Journal article
Authors | Chenglong Wang1; Yang Gan1; Yifu Huo1; Yongyu Mu1; Murun Yang1; Qiaozhi He1; Tong Xiao1,2; Chunliang Zhang1,2; Tongran Liu3 |
Journal | arXiv |
Publication date | 2024 |
Pages | 14 |
Corresponding author email | xiaotong@mail.neu.edu.cn |
DOI | 10.48550/arXiv.2408.12109 |
Document subtype | Review |
Abstract (English) | Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research. We present a Robust Visual Reward Model (RoVRM) which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through a three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on the commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization. |
Indexed in | EI |
Source URL | [http://ir.psych.ac.cn/handle/311026/48771] |
Collection | Institute of Psychology, CAS Key Laboratory of Behavioral Science |
Author affiliations | 1. School of Computer Science and Engineering, Northeastern University, Shenyang, China 2. NiuTrans Research, Shenyang, China 3. CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS, Beijing, China |
Recommended citation (GB/T 7714) | Chenglong Wang, Yang Gan, Yifu Huo, et al. RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data[J]. arXiv, 2024: 14. |
APA | Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, ... & Jingbo Zhu. (2024). RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data. arXiv, 14. |
MLA | Chenglong Wang, et al. "RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data". arXiv (2024): 14. |
Ingestion method: OAI harvesting
Source: Institute of Psychology