Image captioning via hierarchical attention mechanism and policy gradient optimization
Document type: Journal article
Authors | Yan, Shiyang [2]; Xie, Yuan [1,3,4,6]; Wu, Fangyu [5,7]; Smith, Jeremy S. [5]; Lu, Wenjin [7]; Zhang, Bailing [3,4] |
Journal | SIGNAL PROCESSING |
Publication date | 2020-02-01 |
Volume | 167 Pages: 12 |
ISSN | 0165-1684 |
Keywords | Image captioning; Hierarchical attention mechanism; Generative adversarial network; Reinforcement learning; Policy gradient |
DOI | 10.1016/j.sigpro.2019.107329 |
Corresponding author | Yan, Shiyang (shiyang.yan@qub.ac.uk) |
Abstract | Automatically generating a description of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence that bridges the gap between computer vision and natural language processing. Building on successful deep learning models, especially CNNs and Long Short-Term Memory networks (LSTMs) with an attention mechanism, we propose a hierarchical attention model that uses both global CNN features and local object features for more effective feature representation and reasoning in image captioning. A generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to address the exposure bias problem in RNN-based supervised training for language problems. In addition, the discriminator in the GAN framework automatically measures the consistency between the generated caption and the image content, and, combined with RL optimization, this makes the final generated sentences more accurate and natural. Comprehensive experiments show the improved performance of the hierarchical attention mechanism and the effectiveness of our RL-based optimization method. Our model achieves state-of-the-art results on several important metrics on the MSCOCO dataset, using only greedy inference. (C) 2019 Elsevier B.V. All rights reserved. |
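WOS keywords | NETWORKS |
WOS research area | Engineering |
Language | English |
Publisher | ELSEVIER |
WOS accession number | WOS:000497600200030 |
Source URL | [http://ir.ia.ac.cn/handle/173211/29387] |
Collection | Institute of Automation_Research Center for Precision Sensing and Control |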
Author affiliations | 1. East China Normal Univ, Sch Comp Sci & Software Engn, Shanghai, Peoples R China; 2. Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, North Ireland; 3. Inst Adv Artificial Intelligence Nanjing, Nanjing, Jiangsu, Peoples R China; 4. Horizon Robot, Beijing, Peoples R China; 5. Univ Liverpool, Elect Engn & Elect, Liverpool, Merseyside, England; 6. Chinese Acad Sci, Inst Automat, Beijing, Peoples R China; 7. Xian Jiaotong Liverpool Univ, Dept Comp Sci & Software Engn, Suzhou, Peoples R China |
Recommended citation, GB/T 7714 | Yan, Shiyang,Xie, Yuan,Wu, Fangyu,et al. Image captioning via hierarchical attention mechanism and policy gradient optimization[J]. SIGNAL PROCESSING,2020,167:12. |
APA | Yan, Shiyang,Xie, Yuan,Wu, Fangyu,Smith, Jeremy S.,Lu, Wenjin,&Zhang, Bailing.(2020).Image captioning via hierarchical attention mechanism and policy gradient optimization.SIGNAL PROCESSING,167,12. |
MLA | Yan, Shiyang,et al."Image captioning via hierarchical attention mechanism and policy gradient optimization".SIGNAL PROCESSING 167(2020):12. |
Ingest method: OAI harvesting
Source: Institute of Automation