Chinese Academy of Sciences Institutional Repositories Grid
Context Disentangling and Prototype Inheriting for Robust Visual Grounding

Document Type: Journal Article

Authors: Tang, Wei (3); Li, Liang (2); Liu, Xuejing (1); Jin, Lu (3); Tang, Jinhui (3); Li, Zechao (3)
Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Publication Date: 2024-05-01
Volume: 46; Issue: 5; Pages: 3213-3229
Keywords: Visualization; Grounding; Prototypes; Transformers; Task analysis; Linguistics; Feature extraction; context disentangling; open-vocabulary scene; prototype discovering; robust grounding; visual grounding (VG)
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2023.3339628
Abstract: Visual grounding (VG) aims to locate a specific target in an image based on a given language query. Discriminative information from context is important for distinguishing the target from other objects, particularly for targets of the same category as other objects in the scene. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel object), which limits their generalization to the open-vocabulary scene. In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding that handles both scenes. Specifically, the context disentangling separates the referent and context features, achieving better discrimination between them. The prototype inheriting inherits the prototypes discovered from the disentangled visual features via a prototype bank, to fully exploit the seen data, especially in the open-vocabulary scene. The fused features, obtained by applying the Hadamard product to the disentangled linguistic and visual prototype features to avoid sharply adjusting the relative importance of the two feature types, are then attached to a special token and fed into a vision Transformer encoder for bounding box regression. Extensive experiments are conducted on both standard and open-vocabulary scenes. The performance comparisons indicate that our method outperforms the state-of-the-art methods in both scenarios.
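The fusion step described in the abstract — Hadamard (element-wise) product of the disentangled linguistic and visual features, followed by attaching a special token before the Transformer encoder — can be sketched as below. This is a minimal illustration under assumed names and toy dimensions, not the authors' implementation; in the paper the special token and features would be learnable tensors.

```python
def hadamard_fuse(linguistic_feats, visual_feats):
    """Fuse linguistic and visual feature vectors by element-wise
    (Hadamard) product. Each argument is a list of equal-length
    feature vectors (one per prototype)."""
    fused = []
    for l_vec, v_vec in zip(linguistic_feats, visual_feats):
        assert len(l_vec) == len(v_vec), "feature dims must match"
        fused.append([l * v for l, v in zip(l_vec, v_vec)])
    return fused

def attach_special_token(fused_feats, dim):
    """Prepend a placeholder special token (zeros here; learnable in
    practice) so the sequence can be fed to a vision Transformer
    encoder for bounding box regression."""
    special_token = [0.0] * dim
    return [special_token] + fused_feats

# Toy example: two prototypes of dimension 3.
ling = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
vis = [[2.0, 0.0, 1.0], [4.0, 4.0, 4.0]]
fused = hadamard_fuse(ling, vis)        # [[2.0, 0.0, 3.0], [2.0, 2.0, 2.0]]
seq = attach_special_token(fused, 3)    # special token + 2 fused vectors
```

The element-wise product keeps the fused vector in the same dimension as its inputs, which is one reason it avoids re-weighting one modality over the other the way a concatenation-plus-projection would.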
Funding: National Key Research and Development Program of China
WOS Research Areas: Computer Science; Engineering
Language: English
WOS Record: WOS:001196751500059
Publisher: IEEE COMPUTER SOC
Source URL: http://119.78.100.204/handle/2XEOYT63/38718
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Papers (English)
Corresponding Author: Li, Zechao
Affiliations:
1. SenseTime Res, Beijing 100084, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
3. Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Recommended Citation
GB/T 7714
Tang, Wei, Li, Liang, Liu, Xuejing, et al. Context Disentangling and Prototype Inheriting for Robust Visual Grounding[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46(5): 3213-3229.
APA: Tang, Wei, Li, Liang, Liu, Xuejing, Jin, Lu, Tang, Jinhui, & Li, Zechao. (2024). Context Disentangling and Prototype Inheriting for Robust Visual Grounding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 46(5), 3213-3229.
MLA: Tang, Wei, et al. "Context Disentangling and Prototype Inheriting for Robust Visual Grounding." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 46.5 (2024): 3213-3229.

Deposit Method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright and all rights are reserved.