Chinese Academy of Sciences Institutional Repositories Grid
Context Disentangling and Prototype Inheriting for Robust Visual Grounding

Document Type: Journal Article

Authors: Tang, Wei (3); Li, Liang (2); Liu, Xuejing (1); Jin, Lu (3); Tang, Jinhui (3); Li, Zechao (3)
Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Publication Date: 2024-05-01
Volume: 46; Issue: 5; Pages: 3213-3229
Keywords: Visualization; Grounding; Prototypes; Transformers; Task analysis; Linguistics; Feature extraction; context disentangling; open-vocabulary scene; prototype discovering; robust grounding; visual grounding (VG)
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2023.3339628
Abstract: Visual grounding (VG) aims to locate a specific target in an image based on a given language query. Discriminative information from context is important for distinguishing the target from other objects, particularly for targets of the same category as other objects in the scene. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel object), which limits their generalization to the open-vocabulary scene. In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding that handles both scenes. Specifically, the context disentangling separates the referent and context features, achieving better discrimination between them. The prototype inheriting inherits the prototypes discovered from the disentangled visual features via a prototype bank, to fully exploit the seen data, especially in the open-vocabulary scene. The fused features, obtained by applying the Hadamard product to the disentangled linguistic and visual prototype features to avoid sharply adjusting the relative importance of the two feature types, are then attached to a special token and fed into a vision Transformer encoder for bounding box regression. Extensive experiments are conducted on both standard and open-vocabulary scenes. The performance comparisons indicate that our method outperforms the state-of-the-art methods in both scenarios.
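The fusion step described in the abstract — Hadamard (element-wise) product of the disentangled linguistic and visual features, followed by attaching a special token before the Transformer encoder — can be sketched as below. This is a minimal illustration under assumed names and toy dimensions, not the authors' implementation; in the paper the special token and features would be learnable tensors.

```python
def hadamard_fuse(linguistic_feats, visual_feats):
    """Fuse linguistic and visual feature vectors by element-wise
    (Hadamard) product. Each argument is a list of equal-length
    feature vectors (one per prototype)."""
    fused = []
    for l_vec, v_vec in zip(linguistic_feats, visual_feats):
        assert len(l_vec) == len(v_vec), "feature dims must match"
        fused.append([l * v for l, v in zip(l_vec, v_vec)])
    return fused

def attach_special_token(fused_feats, dim):
    """Prepend a placeholder special token (zeros here; learnable in
    practice) so the sequence can be fed to a vision Transformer
    encoder for bounding box regression."""
    special_token = [0.0] * dim
    return [special_token] + fused_feats

# Toy example: two prototypes of dimension 3.
ling = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
vis = [[2.0, 0.0, 1.0], [4.0, 4.0, 4.0]]
fused = hadamard_fuse(ling, vis)        # [[2.0, 0.0, 3.0], [2.0, 2.0, 2.0]]
seq = attach_special_token(fused, 3)    # special token + 2 fused vectors
```

The element-wise product keeps the fused vector in the same dimension as its inputs, which is one reason it avoids re-weighting one modality over the other the way a concatenation-plus-projection would.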
Funding: National Key Research and Development Program of China
WOS Research Areas: Computer Science; Engineering
Language: English
WOS Record: WOS:001196751500059
Publisher: IEEE COMPUTER SOC
Source URL: http://119.78.100.204/handle/2XEOYT63/38718
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Papers (English)
Corresponding Author: Li, Zechao
Affiliations:
1. SenseTime Res, Beijing 100084, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
3. Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Recommended Citation
GB/T 7714
Tang, Wei, Li, Liang, Liu, Xuejing, et al. Context Disentangling and Prototype Inheriting for Robust Visual Grounding[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46(5): 3213-3229.
APA: Tang, Wei, Li, Liang, Liu, Xuejing, Jin, Lu, Tang, Jinhui, & Li, Zechao. (2024). Context Disentangling and Prototype Inheriting for Robust Visual Grounding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 46(5), 3213-3229.
MLA: Tang, Wei, et al. "Context Disentangling and Prototype Inheriting for Robust Visual Grounding." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 46.5 (2024): 3213-3229.

Deposit Method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright and all rights are reserved.