Medical visual question answering with symmetric interaction attention and cross-modal gating
Document type: Journal article
Authors | Chen, Zhi (3); Zou, Beiji (3); Dai, Yulan (3); Zhu, Chengzhang (3); Kong, Guilan (2); Zhang, Wensheng (1) |
Journal | BIOMEDICAL SIGNAL PROCESSING AND CONTROL |
Publication date | 2023-08-01 |
Volume | 85; Pages | 10 |
ISSN | 1746-8094 |
Keywords | Medical visual question answering; Self-attention; Information interaction; Cross-modal gating |
DOI | 10.1016/j.bspc.2023.105049 |
Corresponding author | Zhu, Chengzhang (anandawork@126.com) |
Abstract | The purpose of medical visual question answering (Med-VQA) is to provide accurate answers to clinical questions related to the visual content of medical images. However, previous attempts neglect to take full advantage of the information interaction between medical images and clinical questions, which hinders further progress in Med-VQA. Addressing this issue requires focusing on critical information interaction within each modality and relevant information interaction between modalities. In this paper, we use the multiple meta-model quantifying model as the visual encoder and GloVe word embeddings followed by an LSTM as the textual encoder to form our feature extraction module. We then design a symmetric interaction attention module to construct dense and deep intra- and inter-modal information interaction on medical images and clinical questions for the Med-VQA task. Specifically, the symmetric interaction attention module consists of multiple symmetric interaction attention blocks that contain two basic units, i.e., self-attention and interaction attention. Technically, self-attention is introduced for intra-modal information interaction, while interaction attention is constructed for inter-modal information interaction. In addition, we develop a multi-modal fusion scheme that leverages cross-modal gating to effectively fuse multi-modal information and avoid redundant information after sufficient intra- and inter-modal information interaction. Experimental results on the VQA-RAD and PathVQA datasets show that our method outperforms other state-of-the-art Med-VQA models, achieving accuracies of 74.7% and 48.7% and F1-scores of 73.5% and 46.0%, respectively. |
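The cross-modal gating fusion mentioned in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of the general gating idea (a learned sigmoid gate that weighs visual against textual features), not the paper's exact formulation; the gate parameterization, feature dimension `d`, and all variable names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_gating(v, q, W, b):
    """Fuse a visual feature v and a textual feature q with a learned gate.

    gate  = sigmoid(W [v; q] + b)          # elementwise gate in (0, 1)
    fused = gate * v + (1 - gate) * q      # convex combination per dimension
    Shapes: v, q -> (d,); W -> (d, 2d); b -> (d,).
    """
    gate = sigmoid(W @ np.concatenate([v, q]) + b)
    return gate * v + (1.0 - gate) * q

rng = np.random.default_rng(0)
d = 8
v = rng.standard_normal(d)            # visual feature (e.g., from the image encoder)
q = rng.standard_normal(d)            # textual feature (e.g., from the LSTM encoder)
W = rng.standard_normal((d, 2 * d)) * 0.1  # illustrative gate weights
b = np.zeros(d)

fused = cross_modal_gating(v, q, W, b)
print(fused.shape)  # (8,)
```

Because the gate is a sigmoid, each fused dimension lies between the corresponding visual and textual values, which is one simple way such gating suppresses redundant information from either modality.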
Funding projects | National Key R&D Program of China [2018AAA0102100]; Key Research and Development Program of Hunan Province [2022SK2054]; Natural Science Foundation of Hunan Province, China [2022JJ30762]; 111 Project [B18059] |
WOS research area | Engineering |
Language | English |
Publisher | ELSEVIER SCI LTD |
WOS accession number | WOS:001012380700001 |
Funding agencies | National Key R&D Program of China; Key Research and Development Program of Hunan Province; Natural Science Foundation of Hunan Province, China; 111 Project |
Source URL | [http://ir.ia.ac.cn/handle/173211/53555] |
Collection | Research Center for Precision Sensing and Control - Artificial Intelligence and Machine Learning
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China; 2. Peking Univ, Natl Inst Hlth Data Sci, Beijing 100871, Peoples R China; 3. Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China |
Recommended citation (GB/T 7714) | Chen, Zhi, Zou, Beiji, Dai, Yulan, et al. Medical visual question answering with symmetric interaction attention and cross-modal gating[J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85: 10.
APA | Chen, Zhi, Zou, Beiji, Dai, Yulan, Zhu, Chengzhang, Kong, Guilan, & Zhang, Wensheng. (2023). Medical visual question answering with symmetric interaction attention and cross-modal gating. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 85, 10.
MLA | Chen, Zhi, et al. "Medical visual question answering with symmetric interaction attention and cross-modal gating". BIOMEDICAL SIGNAL PROCESSING AND CONTROL 85 (2023): 10.
Ingest method: OAI harvesting
Source: Institute of Automation