Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Document type: Conference paper
Authors | Gao, Junyu1,3; Chen, Mengyuan1,3; Xu, Changsheng1,2,3 |
Publication date | 2023 |
Conference date | 2022-06-18 |
Conference location | Vancouver, Canada |
Abstract | With only video-level event labels, this paper targets the task of weakly-supervised audio-visual event perception (WS-AVEP), which aims to temporally localize and categorize events belonging to each modality. Despite recent progress, most existing approaches either ignore the unsynchronized property of audio-visual tracks or discount the complementary modality for explicit enhancement. We argue that, for an event residing in one modality, the modality itself should provide ample presence evidence of this event, while the other complementary modality is encouraged to afford the absence evidence as a reference signal. To this end, we propose to collect Cross-Modal Presence-Absence Evidence (CMPAE) in a unified framework. Specifically, by leveraging uni-modal and cross-modal representations, a presence-absence evidence collector (PAEC) is designed under Subjective Logic theory. To learn the evidence in a reliable range, we propose a joint-modal mutual learning (JML) process, which calibrates the evidence of diverse audible, visible, and audi-visible events adaptively and dynamically. Extensive experiments show that our method surpasses state-of-the-art methods (e.g., absolute gains of $3.6\%$ and $6.1\%$ in terms of event-level visual and audio metrics). Code is available at github.com/MengyuanChen21/CVPR2023-CMPAE. |
Source URL | [http://ir.ia.ac.cn/handle/173211/51577] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Author affiliations | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS); 2. Peng Cheng Laboratory; 3. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA) |
Recommended citation (GB/T 7714) | Gao, Junyu, Chen, Mengyuan, Xu, Changsheng. Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception[C]. In: . Vancouver, Canada. 2022-06-18. |
Ingestion method: OAI harvesting
Source: Institute of Automation