Chinese Academy of Sciences Institutional Repositories Grid
Medical visual question answering with symmetric interaction attention and cross-modal gating

Document Type: Journal Article

Authors: Chen, Zhi (3); Zou, Beiji (3); Dai, Yulan (3); Zhu, Chengzhang (3); Kong, Guilan (2); Zhang, Wensheng (1)
Journal: BIOMEDICAL SIGNAL PROCESSING AND CONTROL
Publication Date: 2023-08-01
Volume: 85
Pages: 10
ISSN: 1746-8094
Keywords: Medical visual question answering; Self-attention; Information interaction; Cross-modal gating
DOI: 10.1016/j.bspc.2023.105049
Corresponding Author: Zhu, Chengzhang (anandawork@126.com)
Abstract: The purpose of medical visual question answering (Med-VQA) is to provide accurate answers to clinical questions about the visual content of medical images. However, previous attempts fail to take full advantage of the information interaction between medical images and clinical questions, which hinders further progress in Med-VQA. Addressing this issue requires focusing on critical information interaction within each modality and relevant information interaction between modalities. In this paper, we use the multiple meta-model quantifying model as the visual encoder and GloVe word embeddings followed by an LSTM as the textual encoder to form our feature extraction module. We then design a symmetric interaction attention module that builds dense and deep intra- and inter-modal information interaction between medical images and clinical questions for the Med-VQA task. Specifically, the symmetric interaction attention module consists of multiple symmetric interaction attention blocks, each containing two basic units: self-attention and interaction attention. Self-attention performs intra-modal information interaction, while interaction attention performs inter-modal information interaction. In addition, we develop a multi-modal fusion scheme that leverages cross-modal gating to fuse multi-modal information effectively and avoid redundant information after sufficient intra- and inter-modal interaction. Experimental results on the VQA-RAD and PathVQA datasets show that our method outperforms other state-of-the-art Med-VQA models, achieving accuracies of 74.7% and 48.7% and F1-scores of 73.5% and 46.0%, respectively.
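To make the described architecture concrete, below is a minimal sketch of one symmetric interaction attention block (self-attention for intra-modal interaction, cross-attention for inter-modal interaction) followed by a cross-modal gated fusion step, assuming PyTorch. This is an illustration of the general technique named in the abstract, not the authors' implementation; the class names, residual connections, and mean-pooling details are hypothetical choices.

```python
# A minimal sketch (not the authors' released code) of one symmetric
# interaction attention block plus cross-modal gated fusion, assuming PyTorch.
# All module names and architectural details here are hypothetical.
import torch
import torch.nn as nn


class SymmetricInteractionBlock(nn.Module):
    """Intra-modal self-attention followed by inter-modal interaction
    (cross-) attention, applied symmetrically to both modalities."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn_v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn_t = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_t = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, v: torch.Tensor, t: torch.Tensor):
        # Intra-modal information interaction (self-attention, residual).
        v = v + self.self_attn_v(v, v, v, need_weights=False)[0]
        t = t + self.self_attn_t(t, t, t, need_weights=False)[0]
        # Inter-modal information interaction: each modality queries the
        # other, computed symmetrically from the same pre-update tensors.
        v2 = v + self.cross_attn_v(v, t, t, need_weights=False)[0]
        t2 = t + self.cross_attn_t(t, v, v, need_weights=False)[0]
        return v2, t2


class CrossModalGatedFusion(nn.Module):
    """Sigmoid gates conditioned on both modalities decide how much of
    each passes into the fused representation, suppressing redundancy
    (one plausible reading of the cross-modal gating in the abstract)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate_v = nn.Linear(2 * dim, dim)
        self.gate_t = nn.Linear(2 * dim, dim)

    def forward(self, v: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Pool token/region sequences into one vector per example.
        v_pooled, t_pooled = v.mean(dim=1), t.mean(dim=1)
        joint = torch.cat([v_pooled, t_pooled], dim=-1)
        g_v = torch.sigmoid(self.gate_v(joint))
        g_t = torch.sigmoid(self.gate_t(joint))
        return g_v * v_pooled + g_t * t_pooled


if __name__ == "__main__":
    v = torch.randn(2, 49, 256)   # toy visual features (batch, regions, dim)
    t = torch.randn(2, 12, 256)   # toy question features (batch, tokens, dim)
    v, t = SymmetricInteractionBlock(dim=256)(v, t)
    fused = CrossModalGatedFusion(dim=256)(v, t)
    print(fused.shape)            # torch.Size([2, 256])
```

In the paper's design, several such blocks are stacked to obtain "dense and deep" interaction before the gated fusion produces the joint representation fed to the answer classifier.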
Funding Projects: National Key R&D Program of China [2018AAA0102100]; Key Research and Development Program of Hunan Province [2022SK2054]; Natural Science Foundation of Hunan Province, China [2022JJ30762]; 111 Project [B18059]
WOS Research Area: Engineering
Language: English
Publisher: ELSEVIER SCI LTD
WOS Record Number: WOS:001012380700001
Funding Agencies: National Key R&D Program of China; Key Research and Development Program of Hunan Province; Natural Science Foundation of Hunan Province, China; 111 Project
Source URL: http://ir.ia.ac.cn/handle/173211/53555
Collection: Research Center for Precision Sensing and Control / Artificial Intelligence and Machine Learning
Author Affiliations:
1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
2. Peking Univ, Natl Inst Hlth Data Sci, Beijing 100871, Peoples R China
3. Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
Recommended Citation:
GB/T 7714: Chen, Zhi, Zou, Beiji, Dai, Yulan, et al. Medical visual question answering with symmetric interaction attention and cross-modal gating[J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85: 10.
APA: Chen, Zhi, Zou, Beiji, Dai, Yulan, Zhu, Chengzhang, Kong, Guilan, & Zhang, Wensheng. (2023). Medical visual question answering with symmetric interaction attention and cross-modal gating. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 85, 10.
MLA: Chen, Zhi, et al. "Medical visual question answering with symmetric interaction attention and cross-modal gating." BIOMEDICAL SIGNAL PROCESSING AND CONTROL 85 (2023): 10.

Ingestion Method: OAI harvesting

Source: Institute of Automation
