Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory
Document type: Journal article
Authors | Cao, Pengfei1,6,7; Yang, Zhongyi1; Sun, Liang2; Liang, Yanchun1,3; Yang, Mary Qu4,5; Guan, Renchu1,3,4,5 |
Journal | NEURAL PROCESSING LETTERS |
Publication date | 2019-08-01 |
Volume | 50; Issue: 1; Pages: 103-119 |
Keywords | Image captioning; Semantic attention mechanism; Convolution neural network; Bidirectional guiding LSTM |
ISSN | 1370-4621 |
DOI | 10.1007/s11063-018-09973-5 |
Corresponding author | Guan, Renchu (guanrenchu@jlu.edu.cn) |
Abstract | Automatically describing the contents of an image in natural language has drawn much attention because it not only integrates computer vision and natural language processing but also has practical applications. Using an end-to-end approach, we propose a bidirectional semantic attention-based guiding of long short-term memory (Bag-LSTM) model for image captioning. The proposed model consciously refines image features from previously generated text. By fine-tuning the parameters of convolution neural networks, Bag-LSTM obtains more text-related image features via feedback propagation than other models. As opposed to existing guidance-LSTM methods, which directly add image features into each unit of an LSTM block, our fine-tuned model dynamically leverages more text-conditional image features, acquired by the semantic attention mechanism, as guidance information. Moreover, we exploit bidirectional gLSTM as the caption generator, which is capable of learning long-term relations between visual features and semantic information by making use of both historical and future contextual information. In addition, variations of the Bag-LSTM model are proposed in an effort to sufficiently describe high-level visual-language interactions. Experiments on the Flickr8k and MSCOCO benchmark datasets demonstrate the effectiveness of the model compared with the baseline algorithms; for example, it scores 51.2% higher than BRNN on the CIDEr metric. |
Funding projects | National Natural Science Foundation of China[61572228] ; National Natural Science Foundation of China[61472158] ; National Natural Science Foundation of China[61300147] ; National Natural Science Foundation of China[61602207] ; National Natural Science Foundation of China[61402076] ; United States National Institutes of Health (NIH) Academic Research Enhancement Award[1R15GM114739] ; Science Technology Development Project from Jilin Province[20160101247JC] ; Zhuhai Premier-Discipline Enhancement Scheme and Guangdong Premier Key-Discipline Enhancement Scheme |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:000479247900006 |
Publisher | SPRINGER |
Funding organizations | National Natural Science Foundation of China ; United States National Institutes of Health (NIH) Academic Research Enhancement Award ; Science Technology Development Project from Jilin Province ; Zhuhai Premier-Discipline Enhancement Scheme and Guangdong Premier Key-Discipline Enhancement Scheme |
Source URL | [http://ir.ia.ac.cn/handle/173211/27597] |
Collection | National Laboratory of Pattern Recognition_Natural Language Processing |
Corresponding author | Guan, Renchu |
Author affiliations | 1. Jilin Univ, Coll Comp Sci & Technol, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China; 2. Dalian Univ Technol, Coll Comp Sci & Technol, Dalian 116024, Peoples R China; 3. Jilin Univ, Zhuhai Coll, Minist Educ, Zhuhai Lab, Key Lab Symbol Computat & Knowledge En, Zhuhai 519041, Peoples R China; 4. Univ Arkansas, MidSouth Bioinformat Ctr, Little Rock, AR 72204 USA; 5. Univ Arkansas Little Rock & Univ Arkansas Med Sci, Joint Bioinformat PhD Program, Little Rock, AR 72204 USA; 6. Univ Chinese Acad Sci, Beijing 100049, Peoples R China; 7. Chinese Acad Sci, Inst Automat, NLPR, Beijing 100190, Peoples R China |
Recommended citation GB/T 7714 | Cao, Pengfei,Yang, Zhongyi,Sun, Liang,et al. Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory[J]. NEURAL PROCESSING LETTERS,2019,50(1):103-119. |
APA | Cao, Pengfei.,Yang, Zhongyi.,Sun, Liang.,Liang, Yanchun.,Yang, Mary Qu.,...&Pengfei Cao.(2019).Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory.NEURAL PROCESSING LETTERS,50(1),103-119. |
MLA | Cao, Pengfei,et al."Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory".NEURAL PROCESSING LETTERS 50.1(2019):103-119. |
Ingest method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.