Chinese Academy of Sciences Institutional Repositories Grid
Boosted Transformer for Image Captioning

Document Type: Journal Article

Authors: Li, Jiangyun (1,2); Yao, Peng (1,2,4); Guo, Longteng (3); Zhang, Weicun (1,2)
Journal: APPLIED SCIENCES-BASEL
Publication Date: 2019-08-01
Volume: 9; Issue: 16; Pages: 15
Keywords: image captioning; self-attention; deep learning; transformer
DOI: 10.3390/app9163260
Corresponding Author: Zhang, Weicun (weicunzhang@ustb.edu.cn)
Abstract: Image captioning attempts to generate a description of a given image, usually taking a Convolutional Neural Network as the encoder to extract visual features and a sequence model as the decoder to generate the description; among sequence models, the self-attention mechanism has recently achieved notable progress. However, this predominant encoder-decoder architecture still has problems to solve. On the encoder side, without semantic concepts, the extracted visual features do not make full use of the image information. On the decoder side, sequence self-attention relies only on word representations, lacking the guidance of visual information and being easily influenced by the language prior. In this paper, we propose a novel boosted transformer model with two attention modules for the above-mentioned problems, i.e., Concept-Guided Attention (CGA) and Vision-Guided Attention (VGA). Our model uses CGA in the encoder to obtain boosted visual features by integrating instance-level concepts into the visual features. In the decoder, we stack VGA, which uses visual information as a bridge to model internal relationships among the sequences and can serve as an auxiliary module to sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset demonstrate the better performance of our model compared with state-of-the-art approaches.
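The VGA module described in the abstract routes sequence modeling through visual features: word positions attend over image regions rather than over words alone. A minimal pure-Python sketch of this guided cross-attention idea follows; the dimensions, toy vectors, and function names are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def guided_attention(queries, keys, values):
    """Scaled dot-product cross-attention: word queries attend
    over visual keys/values, so each word position receives a
    convex mixture of visual features (the VGA idea, simplified)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Toy example: 2 word queries attending over 3 visual regions, dimension 2.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = guided_attention(words, regions, regions)
```

Because the attention weights sum to one, each output row is a convex combination of the region features, which is how visual information is mixed into every word position before (or alongside) sequence self-attention.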
Funding Projects: National Nature Science Foundation of China [61671054]; Beijing Natural Science Foundation [4182038]
WOS Research Areas: Chemistry; Materials Science; Physics
Language: English
WOS Record No.: WOS:000484444100054
Publisher: MDPI
Funding Organizations: National Nature Science Foundation of China; Beijing Natural Science Foundation
Source URL: http://ir.ia.ac.cn/handle/173211/27241
Collection: Institute of Automation, National Laboratory of Pattern Recognition, Image and Video Analysis Team
Author Affiliations:
1. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
2. Minist Educ, Key Lab Knowledge Automat Ind Proc, Beijing 100083, Peoples R China
3. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
4. Univ Sci & Technol Beijing, Beijing 100083, Peoples R China
Recommended Citation:
GB/T 7714: Li, Jiangyun, Yao, Peng, Guo, Longteng, et al. Boosted Transformer for Image Captioning[J]. APPLIED SCIENCES-BASEL, 2019, 9(16): 15.
APA: Li, Jiangyun, Yao, Peng, Guo, Longteng, & Zhang, Weicun. (2019). Boosted Transformer for Image Captioning. APPLIED SCIENCES-BASEL, 9(16), 15.
MLA: Li, Jiangyun, et al. "Boosted Transformer for Image Captioning". APPLIED SCIENCES-BASEL 9.16 (2019): 15.

Deposit Method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.