Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation
Document Type | Journal Article
Authors | Li, Qianzhong1,2; Zhang, Yujia; Sun, Shiying; Wu, Jinting; Zhao, Xiaoguang; Tan, Min
Journal | Neurocomputing
Publication Date | 2022-01-07
Volume | 467
Issue | /
Pages | 99-114
Keywords | Referring expression comprehension; Referring expression segmentation; Cross-modality synergy; Attention mechanism
ISSN | 0925-2312
DOI | 10.1016/j.neucom.2021.09.066 |
Abstract | Referring expression comprehension and segmentation aim to locate and segment a referred instance in an image according to a natural language expression. However, existing methods tend to ignore the interaction between the visual and language modalities during visual feature learning, and establishing a synergy between the two modalities remains a considerable challenge. To tackle these problems, we propose a novel end-to-end framework, the Cross-Modality Synergy Network (CMS-Net), to address the two tasks jointly. In this work, we propose an attention-aware representation learning module to learn modal representations for both images and expressions. A language self-attention submodule is proposed in this module to learn expression representations by leveraging intra-modality relations, and a language-guided channel-spatial attention submodule is introduced to obtain language-aware visual representations under language guidance, which helps the model attend to referent-relevant regions in the images and reduces background interference. We then design a cross-modality synergy module to establish inter-modality relations for modality fusion. Specifically, a language-visual similarity is computed at each position of the visual feature map, and synergy is achieved between the two modalities in both the semantic and spatial dimensions. Furthermore, we propose a multi-scale feature fusion module with a selective strategy to aggregate important information from multi-scale features, yielding the target results. We conduct extensive experiments on four challenging benchmarks, and our framework achieves significant performance gains over state-of-the-art methods. (c) 2021 Elsevier B.V. All rights reserved.
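The core step of the cross-modality synergy module described above is a language-visual similarity computed at each position of the visual feature map. The paper's code is not reproduced here; the sketch below is a minimal illustration of that per-position scoring step, assuming a cosine-similarity form and illustrative shapes and names (`similarity_map`, the 256-channel feature map) that are not taken from the paper.

```python
import numpy as np

def similarity_map(visual_feats, lang_feat):
    """Cosine similarity between a pooled expression vector and every
    spatial position of a visual feature map.

    visual_feats: (C, H, W) visual feature map
    lang_feat:    (C,) pooled language representation
    returns:      (H, W) per-position similarity scores in [-1, 1]
    """
    C, H, W = visual_feats.shape
    v = visual_feats.reshape(C, -1)                          # (C, H*W)
    v = v / (np.linalg.norm(v, axis=0, keepdims=True) + 1e-8)
    l = lang_feat / (np.linalg.norm(lang_feat) + 1e-8)
    return (l @ v).reshape(H, W)                             # (H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = similarity_map(rng.standard_normal((256, 20, 20)),
                         rng.standard_normal(256))
    print(sim.shape)  # (20, 20)
```

In CMS-Net this kind of similarity map would presumably weight the fusion of the two modalities in both the semantic and spatial dimensions; the sketch shows only the scoring step, not the full module.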
Funding Projects | National Key Research and Development Project of China [2019YFB1310601]; National Key R&D Program of China [2017YFC0820203-03]; National Natural Science Foundation of China [62103410]
WOS Research Area | Computer Science
Language | English
WOS Accession Number | WOS:000710121100009
Publisher | ELSEVIER
Funding Organizations | National Key Research and Development Project of China; National Key R&D Program of China; National Natural Science Foundation of China
Source URL | http://ir.ia.ac.cn/handle/173211/46309
Collection | Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Advanced Robotics Control Team
Corresponding Author | Zhang, Yujia
Affiliations | 1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Li, Qianzhong, Zhang, Yujia, Sun, Shiying, et al. Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation[J]. Neurocomputing, 2022, 467: 99-114.
APA | Li, Qianzhong, Zhang, Yujia, Sun, Shiying, Wu, Jinting, Zhao, Xiaoguang, & Tan, Min. (2022). Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation. Neurocomputing, 467, 99-114.
MLA | Li, Qianzhong, et al. "Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation." Neurocomputing 467 (2022): 99-114.
Deposit Method: OAI Harvesting
Source: Institute of Automation