Chinese Academy of Sciences Institutional Repositories Grid: CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation

Document type: Journal article


Authors	Wang, Wenxuan 1,2; He, Xingjian 1; Zhang, Yisi 3; Guo, Longteng 1; Shen, Jiachen 3; Li, Jiangyun 3; Liu, Jing 1,2
Journal	IEEE TRANSACTIONS ON MULTIMEDIA
Publication date	2024
Volume	26; Pages: 6906-6916
Keywords	Referring image segmentation; cross-modality guidance; masked self-distillation; vision and language
ISSN	1520-9210
DOI	10.1109/TMM.2024.3358085
Corresponding author	Li, Jiangyun (leejy@ustb.edu.cn)
Abstract (English)	Referring image segmentation (RIS) is a fundamental vision-language task that aims to segment a desired object from an image based on a given natural language expression. Because image and text data have essentially distinct properties, most existing methods either introduce complex designs for fine-grained vision-language alignment or lack the required dense alignment, resulting in scalability issues or mis-segmentation problems such as over- or under-segmentation. To achieve effective and efficient fine-grained feature alignment in the RIS task, we explore the potential of masked multimodal modeling coupled with self-distillation and propose a novel cross-modality masked self-distillation framework named CM-MaskSD, which inherits the transferred knowledge of image-text semantic alignment from the CLIP model to realize fine-grained patch-word feature alignment for better segmentation accuracy. Moreover, our CM-MaskSD framework can considerably boost model performance in a nearly parameter-free manner, since it shares weights between the main segmentation branch and the introduced masked self-distillation branches and introduces only negligible parameters for coordinating the multimodal features. Comprehensive experiments on three benchmark datasets (i.e., RefCOCO, RefCOCO+, G-Ref) for the RIS task demonstrate the superiority of our proposed framework over previous state-of-the-art methods.
WOS keywords	NETWORK
Funding project	National Key Research and Development Program of China
WOS research area	Computer Science; Telecommunications
Language	English
WOS accession number	WOS:001209811000040
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organization	National Key Research and Development Program of China
Source URL	[http://ir.ia.ac.cn/handle/173211/58366]
Collection	Institute of Automation_National Laboratory of Pattern Recognition_Image and Video Analysis Team
Corresponding author	Li, Jiangyun
Author affiliations	1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China; 3. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
Recommended citation (GB/T 7714)	Wang, Wenxuan, He, Xingjian, Zhang, Yisi, et al. CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 6906-6916.
APA	Wang, Wenxuan, He, Xingjian, Zhang, Yisi, Guo, Longteng, Shen, Jiachen, ... & Liu, Jing. (2024). CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation. IEEE TRANSACTIONS ON MULTIMEDIA, 26, 6906-6916.
MLA	Wang, Wenxuan, et al. "CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation." IEEE TRANSACTIONS ON MULTIMEDIA 26 (2024): 6906-6916.

Ingestion method: OAI harvesting

Source: Institute of Automation
