Medical visual question answering with symmetric interaction attention and cross-modal gating
Document type: Journal article
Authors | Chen, Zhi (3); Zou, Beiji (3); Dai, Yulan (3); Zhu, Chengzhang (3); Kong, Guilan (2); Zhang, Wensheng (1) |
Journal | BIOMEDICAL SIGNAL PROCESSING AND CONTROL |
Publication date | 2023-08-01 |
Volume | 85; Pages | 10 |
ISSN | 1746-8094 |
Keywords | Medical visual question answering; Self-attention; Information interaction; Cross-modal gating |
DOI | 10.1016/j.bspc.2023.105049 |
Corresponding author | Zhu, Chengzhang (anandawork@126.com) |
Abstract | The purpose of medical visual question answering (Med-VQA) is to provide accurate answers to clinical questions related to the visual content of medical images. However, previous attempts neglect to take full advantage of the information interaction between medical images and clinical questions, which hinders further progress in Med-VQA. Addressing this issue requires focusing on critical information interaction within each modality and relevant information interaction between modalities. In this paper, we use the multiple meta-model quantifying model as the visual encoder and GloVe word embeddings followed by an LSTM as the textual encoder to form our feature extraction module. We then design a symmetric interaction attention module to construct dense and deep intra- and inter-modal information interaction on medical images and clinical questions for the Med-VQA task. Specifically, the symmetric interaction attention module consists of multiple symmetric interaction attention blocks that contain two basic units, i.e., self-attention and interaction attention. Technically, self-attention is introduced for intra-modal information interaction, while interaction attention is constructed for inter-modal information interaction. In addition, we develop a multi-modal fusion scheme that leverages cross-modal gating to effectively fuse multi-modal information and avoid redundant information after sufficient intra- and inter-modal information interaction. Experimental results on the VQA-RAD and PathVQA datasets show that our method outperforms other state-of-the-art Med-VQA models, achieving accuracies of 74.7% and 48.7% and F1-scores of 73.5% and 46.0%, respectively. |
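The cross-modal gating fusion mentioned in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of the general gating idea (a learned sigmoid gate that weighs visual against textual features), not the paper's exact formulation; the gate parameterization, feature dimension `d`, and all variable names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_gating(v, q, W, b):
    """Fuse a visual feature v and a textual feature q with a learned gate.

    gate  = sigmoid(W [v; q] + b)          # elementwise gate in (0, 1)
    fused = gate * v + (1 - gate) * q      # convex combination per dimension
    Shapes: v, q -> (d,); W -> (d, 2d); b -> (d,).
    """
    gate = sigmoid(W @ np.concatenate([v, q]) + b)
    return gate * v + (1.0 - gate) * q

rng = np.random.default_rng(0)
d = 8
v = rng.standard_normal(d)            # visual feature (e.g., from the image encoder)
q = rng.standard_normal(d)            # textual feature (e.g., from the LSTM encoder)
W = rng.standard_normal((d, 2 * d)) * 0.1  # illustrative gate weights
b = np.zeros(d)

fused = cross_modal_gating(v, q, W, b)
print(fused.shape)  # (8,)
```

Because the gate is a sigmoid, each fused dimension lies between the corresponding visual and textual values, which is one simple way such gating suppresses redundant information from either modality.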
Funding projects | National Key R&D Program of China [2018AAA0102100]; Key Research and Development Program of Hunan Province [2022SK2054]; Natural Science Foundation of Hunan Province, China [2022JJ30762]; 111 Project [B18059] |
WOS research area | Engineering |
Language | English |
Publisher | ELSEVIER SCI LTD |
WOS accession number | WOS:001012380700001 |
Funding agencies | National Key R&D Program of China; Key Research and Development Program of Hunan Province; Natural Science Foundation of Hunan Province, China; 111 Project |
Source URL | [http://ir.ia.ac.cn/handle/173211/53555] |
Collection | Research Center for Precision Sensing and Control - Artificial Intelligence and Machine Learning
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China; 2. Peking Univ, Natl Inst Hlth Data Sci, Beijing 100871, Peoples R China; 3. Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China |
Recommended citation (GB/T 7714) | Chen, Zhi, Zou, Beiji, Dai, Yulan, et al. Medical visual question answering with symmetric interaction attention and cross-modal gating[J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85: 10.
APA | Chen, Zhi, Zou, Beiji, Dai, Yulan, Zhu, Chengzhang, Kong, Guilan, & Zhang, Wensheng. (2023). Medical visual question answering with symmetric interaction attention and cross-modal gating. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 85, 10.
MLA | Chen, Zhi, et al. "Medical visual question answering with symmetric interaction attention and cross-modal gating". BIOMEDICAL SIGNAL PROCESSING AND CONTROL 85 (2023): 10.
Ingest method: OAI harvesting
Source: Institute of Automation