Chinese Academy of Sciences Institutional Repositories Grid
avtmNet: Adaptive Visual-Text Merging Network for Image Captioning

Document type: Journal article

Authors: Song, Heng (1,2,3); Zhu, Junwu (1); Jiang, Yi (1,4)
Journal: COMPUTERS & ELECTRICAL ENGINEERING
Publication date: 2020-06-01
Volume: 84; Pages: 12
Keywords: Image captioning; Computer Vision; Natural Language Processing; Attention Mechanism; Neural networks
ISSN: 0045-7906
DOI: 10.1016/j.compeleceng.2020.106630
Abstract (English): Recently, researchers have made extensive research on the technology of automatically generating descriptions for an image. Various technologies for image captioning have been proposed, among which attention-based encoder-decoder framework achieved great success. Two different types of attention models are proposed to generate image captions respectively, i.e., model based visual attention that is good at describing details, and model based text attention that is good at comprehensive understanding. In order to integrate and make full use of visual information and text information to generate more accurate captions for images, in this paper, we firstly introduce a visual attention model to generate the visual information and a text attention model to form the text information respectively, and then propose an adaptive visual-text merging network (avtmNet). This merging network can effectively merge the visual information and text information, and automatically determine the proportion of both visual information and text information to generate the next caption word. Extensive experiments are performed on the datasets named COCO2014 and Flickr30K respectively, and show the effectiveness and superiority of our proposed approach. (C) 2020 Elsevier Ltd. All rights reserved.
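Illustrative note: the abstract says the merging network automatically determines the proportion of visual and text information used for the next caption word. One common way to realize such a proportion is a sigmoid gate that blends two context vectors. The sketch below shows that general idea only. The function name adaptive_merge, the parameters gate_weights and gate_bias, and the scalar-gate form are all assumptions made for illustration; the record does not give the authors' actual formulation.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_merge(visual, text, hidden, gate_weights, gate_bias=0.0):
    """Blend a visual and a text context vector with a scalar gate.

    Hypothetical sketch, not the paper's method:
      beta   = sigmoid(w . [visual; text; hidden] + b)
      merged = beta * visual + (1 - beta) * text
    """
    if len(visual) != len(text):
        raise ValueError("visual and text vectors must have the same length")
    features = list(visual) + list(text) + list(hidden)
    if len(gate_weights) != len(features):
        raise ValueError("gate_weights must match the concatenated feature length")
    # The gate depends on both information sources and the decoder state.
    score = sum(w * f for w, f in zip(gate_weights, features)) + gate_bias
    beta = sigmoid(score)
    merged = [beta * v + (1.0 - beta) * t for v, t in zip(visual, text)]
    return merged, beta


if __name__ == "__main__":
    visual = [1.0, 0.0]   # toy visual context vector
    text = [0.0, 1.0]     # toy text context vector
    hidden = [0.5]        # toy decoder hidden state
    merged, beta = adaptive_merge(visual, text, hidden, [0.0] * 5)
    print(beta, merged)
```

With all-zero gate weights and zero bias the gate is 0.5, so the two sources are averaged; a trained gate would shift the proportion per word, toward the visual vector for detail words and toward the text vector for context words.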
Funding: National Natural Science Foundation of China [61872313]; Key Research Projects in Education Informatization in Jiangsu Province [20180012]; Postgraduate Research and Practice Innovation Program of Jiangsu Province [KYCX18_2366]; Yangzhou Science and Technology [YZ2018209]; Yangzhou Science and Technology [YZ2019133]; Yangzhou University Jiangdu HighEnd Equipment Engineering Technology Research Institute Open Project [YDJD201707]; State Key Laboratory of Ocean Engineering, Shanghai Jiao Tong University [1907]
WOS research areas: Computer Science; Engineering
Language: English
WOS accession number: WOS:000579053300009
Publisher: PERGAMON-ELSEVIER SCIENCE LTD
Source URL: http://119.78.100.204/handle/2XEOYT63/15735
Collection: Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English)
Corresponding author: Jiang, Yi
Author affiliations:
1. Yangzhou Univ, Inst Informat Engn, Yangzhou, Jiangsu, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China
3. Univ Chinese Acad Sci, Beijing, Peoples R China
4. Shanghai Jiao Tong Univ, State Key Lab Ocean Engn, Shanghai, Peoples R China
Recommended citation:
GB/T 7714: Song, Heng, Zhu, Junwu, Jiang, Yi. avtmNet: Adaptive Visual-Text Merging Network for Image Captioning[J]. COMPUTERS & ELECTRICAL ENGINEERING, 2020, 84: 12.
APA: Song, Heng, Zhu, Junwu, & Jiang, Yi. (2020). avtmNet: Adaptive Visual-Text Merging Network for Image Captioning. COMPUTERS & ELECTRICAL ENGINEERING, 84, 12.
MLA: Song, Heng, et al. "avtmNet: Adaptive Visual-Text Merging Network for Image Captioning". COMPUTERS & ELECTRICAL ENGINEERING 84 (2020): 12.

Ingest method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.