Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): Structure Preserving Convolutional Attention for Image Captioning

Structure Preserving Convolutional Attention for Image Captioning

Document type: Journal article


Authors	Lu, Shichen 1,2,5; Hu, Ruimin 1,2; Liu, Jing 3; Guo, Longteng 3; Zheng, Fei 4
Journal	APPLIED SCIENCES-BASEL
Publication date	2019-07-02
Volume	9 Issue: 14 Pages: 10
Keywords	image captioning; attention; spatial structure; deep learning; computer vision
DOI	10.3390/app9142888
Corresponding author	Hu, Ruimin(hrm@whu.edu.cn)
Abstract	In the task of image captioning, learning the attentive image regions is necessary to adaptively and precisely focus on the object semantics relevant to each decoded word. In this paper, we propose a convolutional attention module that can preserve the spatial structure of the image by performing the convolution operation directly on the 2D feature maps. The proposed attention mechanism contains two components: convolutional spatial attention and cross-channel attention, aiming to determine the intended regions to describe the image along the spatial and channel dimensions, respectively. Both of the two attentions are calculated at each decoding step. In order to preserve the spatial structure, instead of operating on the vector representation of each image grid, the two attention components are both computed directly on the entire feature maps with convolution operations. Experiments on two large-scale datasets (MSCOCO and Flickr30K) demonstrate the outstanding performance of our proposed method.
Funding project	National Natural Science Foundation of China [U1736206]
WOS research areas	Chemistry ; Materials Science ; Physics
Language	English
WOS accession number	WOS:000479026900115
Publisher	MDPI
Funding organization	National Natural Science Foundation of China
Source URL	[http://ir.ia.ac.cn/handle/173211/27613]
Collection	Institute of Automation_National Laboratory of Pattern Recognition_Image and Video Analysis Team
Corresponding author	Hu, Ruimin
Author affiliations	1. Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp, Wuhan 430072, Hubei, Peoples R China; 2. Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Hubei, Peoples R China; 3. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China; 4. China Gen Technol Res Inst, Beijing 100190, Peoples R China; 5. Wuhan Univ, Informat Dept, Dormitory 8, Room 617, Wuhan 430072, Hubei, Peoples R China
Recommended citation (GB/T 7714)	Lu, Shichen,Hu, Ruimin,Liu, Jing,et al. Structure Preserving Convolutional Attention for Image Captioning[J]. APPLIED SCIENCES-BASEL,2019,9(14):10.
APA	Lu, Shichen,Hu, Ruimin,Liu, Jing,Guo, Longteng,&Zheng, Fei.(2019).Structure Preserving Convolutional Attention for Image Captioning.APPLIED SCIENCES-BASEL,9(14),10.
MLA	Lu, Shichen,et al."Structure Preserving Convolutional Attention for Image Captioning".APPLIED SCIENCES-BASEL 9.14(2019):10.

Ingest method: OAI harvesting

Source: Institute of Automation
