ROMOT: Referring-expression-comprehension open-set multi-object tracking
Document type: Journal article
Authors | Li, Wei1,2; Li, Bowen1,2; Wang, Jingqi2,3; Meng, Weiliang1,2 |
Journal | VISUAL COMPUTER |
Publication date | 2024-06-19 |
Pages | 13 |
Keywords | Open-set; Referring expression comprehension; Detection; Tracking |
ISSN | 0178-2789 |
DOI | 10.1007/s00371-024-03544-7 |
Corresponding author | Meng, Weiliang (weiliang.meng@ia.ac.cn) |
Abstract | Traditional multi-object tracking is limited to tracking a predefined set of categories, whereas open-vocabulary tracking expands its capabilities to track novel categories. In this paper, we propose ROMOT (referring-expression-comprehension open-set multi-object tracking), which not only tracks objects from novel categories not included in the training data, but also enables tracking based on referring expression comprehension (REC). REC describes targets solely by their attributes, such as "the person running at the front" or "the bird flying in the air rather than on the ground," making it particularly relevant for real-world multi-object tracking scenarios. Our ROMOT achieves this by harnessing the exceptional capabilities of a visual language model and employing multi-stage cross-modal attention to handle tracking for novel categories and REC tasks. Integrating RSM (reconstruction similarity metric) and OCM (observation-centric momentum) in our ROMOT eliminates the need for task-specific training, addressing the challenge of insufficient datasets. Our ROMOT enhances efficiency and adaptability in handling tracking requirements without relying on extensive tracking training data. |
Funding projects | National Natural Science Foundation of China ; Beijing Natural Science Foundation[L231013] ; Beijing Natural Science Foundation[JQ23014] ; [U21A20515] ; [62376271] ; [62071157] ; [62171321] ; [62162044] ; [62365014] ; [52175493] |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:001250256000001 |
Publisher | SPRINGER |
Funding organizations | National Natural Science Foundation of China ; Beijing Natural Science Foundation |
Source URL | [http://ir.ia.ac.cn/handle/173211/59049] |
Collection | National Laboratory of Pattern Recognition_3D Visual Computing |
Corresponding author | Meng, Weiliang |
Author affiliations | 1.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China 2.Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing, Peoples R China 3.Cent China Normal Univ, Wuhan, Peoples R China |
Recommended citation (GB/T 7714) | Li, Wei,Li, Bowen,Wang, Jingqi,et al. ROMOT: Referring-expression-comprehension open-set multi-object tracking[J]. VISUAL COMPUTER,2024:13. |
APA | Li, Wei,Li, Bowen,Wang, Jingqi,Meng, Weiliang,Zhang, Jiguang,&Zhang, Xiaopeng.(2024).ROMOT: Referring-expression-comprehension open-set multi-object tracking.VISUAL COMPUTER,13. |
MLA | Li, Wei,et al."ROMOT: Referring-expression-comprehension open-set multi-object tracking".VISUAL COMPUTER (2024):13. |
Ingestion method: OAI harvesting
Source: Institute of Automation