Chinese Academy of Sciences Institutional Repositories Grid
Image captioning: Semantic selection unit with stacked residual attention

Document type: Journal article

Authors: Song, Lifei (1,2); Li, Fei (3); Wang, Ying (1,2); Liu, Yu (4); Wang, Yuanhua (4); Xiang, Shiming (1,2)
Journal: IMAGE AND VISION COMPUTING
Publication date: 2024-04-01
Volume: 144; Pages: 12
Keywords: Image captioning; Semantic attributes; Semantic selection unit; Transformer; Stacked residual attention
ISSN: 0262-8856
DOI: 10.1016/j.imavis.2024.104965
Corresponding author: Wang, Ying (ying.wang@ia.ac.cn)
Abstract: Semantic information and attention mechanisms play important roles in image captioning. Semantic information can strengthen the relationship between images and language, while attention can steer the model toward the spatially relevant regions of the image. However, in most current works, semantic attributes are learned only from pairs of images and sentences, which fails to fully utilize additional semantic attributes and the structural information of sentences and thus limits the variety of the generated sentences. Meanwhile, current attention models usually lack the ability to learn positional information explicitly during attention generation and suffer from vanishing gradients during training. This paper proposes a Semantic Selection Unit (SSU) and a Stacked Residual Attention (SRA) to remedy these drawbacks. Specifically, the SSU is designed to selectively capture semantic information from expanding attributes or guidance sentences. With the help of the expanding vocabulary and the structural information in sentences, the SSU can improve the quality of the generated sentences. The SRA is constructed to address the missing positional information and the vanishing gradient problem during attention generation. Architecturally, the SSU and SRA work together in a joint framework with end-to-end learning for image captioning. Extensive experiments have been conducted on the public MS COCO dataset, achieving a CIDEr score of 139.7 on the test set.
WOS keywords: TRANSFORMER
Funding projects: National Key Research and Development Program of China [2018AAA0100400]; National Natural Science Foundation of China [62076242]
WOS research areas: Computer Science; Engineering; Optics
Language: English
WOS accession number: WOS:001202109600001
Publisher: ELSEVIER
Funding organizations: National Key Research and Development Program of China; National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/58150
Collection: Institute of Automation_National Laboratory of Pattern Recognition_Remote Sensing Image Processing Team
Corresponding author: Wang, Ying
Author affiliations:
1. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
3. China Tower Corp Ltd, Beijing 100029, Peoples R China
4. Beijing Inst Tracking & Telecommun Technol, Beijing 100094, Peoples R China
Recommended citation formats
GB/T 7714: Song, Lifei, Li, Fei, Wang, Ying, et al. Image captioning: Semantic selection unit with stacked residual attention[J]. IMAGE AND VISION COMPUTING, 2024, 144: 12.
APA: Song, Lifei, Li, Fei, Wang, Ying, Liu, Yu, Wang, Yuanhua, & Xiang, Shiming. (2024). Image captioning: Semantic selection unit with stacked residual attention. IMAGE AND VISION COMPUTING, 144, 12.
MLA: Song, Lifei, et al. "Image captioning: Semantic selection unit with stacked residual attention". IMAGE AND VISION COMPUTING 144 (2024): 12.

Ingestion method: OAI harvesting

Source: Institute of Automation
