Chinese Academy of Sciences Institutional Repositories Grid
ROMOT: Referring-expression-comprehension open-set multi-object tracking

Document type: Journal article

Authors: Li, Wei (1,2); Li, Bowen (1,2); Wang, Jingqi (2,3); Meng, Weiliang (1,2); Zhang, Jiguang (1,2); Zhang, Xiaopeng (1,2)
Journal: VISUAL COMPUTER
Publication date: 2024-06-19
Pages: 13
Keywords: Open-set; Referring expression comprehension; Detection; Tracking
ISSN: 0178-2789
DOI: 10.1007/s00371-024-03544-7
Corresponding author: Meng, Weiliang (weiliang.meng@ia.ac.cn)
Abstract: Traditional multi-object tracking is limited to tracking a predefined set of categories, whereas open-vocabulary tracking expands its capabilities to track novel categories. In this paper, we propose ROMOT (referring-expression-comprehension open-set multi-object tracking), which not only tracks objects from novel categories not included in the training data, but also enables tracking based on referring expression comprehension (REC). REC describes targets solely by their attributes, such as "the person running at the front" or "the bird flying in the air rather than on the ground," making it particularly relevant for real-world multi-object tracking scenarios. Our ROMOT achieves this by harnessing the exceptional capabilities of a visual language model and employing multi-stage cross-modal attention to handle tracking for novel categories and REC tasks. Integrating RSM (reconstruction similarity metric) and OCM (observation-centric momentum) in our ROMOT eliminates the need for task-specific training, addressing the challenge of insufficient datasets. Our ROMOT enhances efficiency and adaptability in handling tracking requirements without relying on extensive tracking training data.
Funding projects: National Natural Science Foundation of China; Beijing Natural Science Foundation [L231013]; Beijing Natural Science Foundation [JQ23014]; [U21A20515]; [62376271]; [62071157]; [62171321]; [62162044]; [62365014]; [52175493]
WOS research area: Computer Science
Language: English
WOS accession number: WOS:001250256000001
Publisher: SPRINGER
Funding organizations: National Natural Science Foundation of China; Beijing Natural Science Foundation
Source URL: http://ir.ia.ac.cn/handle/173211/59049
Collection: State Key Laboratory of Pattern Recognition / 3D Visual Computing (模式识别国家重点实验室_三维可视计算)
Author affiliations:
1. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
2. Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing, Peoples R China
3. Cent China Normal Univ, Wuhan, Peoples R China
Recommended citation:
GB/T 7714: Li, Wei, Li, Bowen, Wang, Jingqi, et al. ROMOT: Referring-expression-comprehension open-set multi-object tracking[J]. VISUAL COMPUTER, 2024: 13.
APA: Li, Wei, Li, Bowen, Wang, Jingqi, Meng, Weiliang, Zhang, Jiguang, & Zhang, Xiaopeng. (2024). ROMOT: Referring-expression-comprehension open-set multi-object tracking. VISUAL COMPUTER, 13.
MLA: Li, Wei, et al. "ROMOT: Referring-expression-comprehension open-set multi-object tracking". VISUAL COMPUTER (2024): 13.

Ingest method: OAI harvesting

Source: Institute of Automation
