|
Authors | Wangli Hao1,2 ; Zhaoxiang Zhang1,2,3 ; He Guan1
|
Publication date | 2018
|
Conference date | 2018.2.1
|
Conference venue | Hilton New Orleans Riverside, USA
|
Abstract (English) | Visual and audio modalities are two symbiotic modalities underlying videos, containing both common and complementary information. If they can be mined and fused sufficiently, the performance of related video tasks can be significantly enhanced. However, due to environmental interference or sensor faults, sometimes only one modality exists while the other is abandoned or missing. By recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, great benefits can be gained for various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks: audio-to-visual, visual-to-audio, audio-to-audio, and visual-to-visual subnetworks, which are organized in a cycle architecture. CMCGAN has several remarkable advantages. First, CMCGAN unifies visual-audio mutual generation into a common framework by a joint corresponding adversarial loss. Second, by introducing a latent vector with a Gaussian distribution, CMCGAN can effectively handle the dimension and structure asymmetry between the visual and audio modalities. Third, CMCGAN can be trained end-to-end for greater convenience. Building on CMCGAN, we develop a dynamic multimodal classification network to handle the modality-missing problem. Abundant experiments have been conducted and validate that CMCGAN obtains state-of-the-art cross-modal visual-audio generation results. Furthermore, the generated modality achieves effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of our proposed method.
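The cycle wiring sketched in the abstract can be illustrated with a minimal NumPy stand-in. Everything below is a hypothetical toy: the linear "generators", the feature dimensions, and the reconstruction loss are placeholders for illustration only, not the paper's encoder-decoder subnetworks or its joint corresponding adversarial loss. It only shows how the subnetworks compose into cycles (audio→visual→audio and visual→audio→visual) and how a Gaussian latent vector bridges the dimension asymmetry between modalities.

```python
import numpy as np

rng = np.random.default_rng(0)
AUDIO_DIM, VISUAL_DIM, LATENT_DIM = 128, 256, 64  # illustrative sizes

# Placeholder "generators": each maps (source features, Gaussian latent z)
# to the target modality. Linear maps stand in for CMCGAN's subnetworks
# purely to show the cycle composition.
W_a2v = rng.standard_normal((AUDIO_DIM + LATENT_DIM, VISUAL_DIM)) * 0.01
W_v2a = rng.standard_normal((VISUAL_DIM + LATENT_DIM, AUDIO_DIM)) * 0.01

def generate(x, W):
    # The Gaussian latent vector absorbs the dimension/structure
    # asymmetry between the audio and visual feature spaces.
    z = rng.standard_normal(LATENT_DIM)
    return np.concatenate([x, z]) @ W

def cycle_loss(x, x_rec):
    # Toy reconstruction term (mean squared error) over one cycle.
    return float(np.mean((x - x_rec) ** 2))

audio = rng.standard_normal(AUDIO_DIM)
visual = rng.standard_normal(VISUAL_DIM)

# Two of the cycle paths (audio-to-audio and visual-to-visual
# subnetworks are omitted for brevity):
fake_visual = generate(audio, W_a2v)    # audio -> visual
audio_rec = generate(fake_visual, W_v2a)  # -> back to audio
fake_audio = generate(visual, W_v2a)    # visual -> audio
visual_rec = generate(fake_audio, W_a2v)  # -> back to visual

loss = cycle_loss(audio, audio_rec) + cycle_loss(visual, visual_rec)
```

In the actual model these generated and reconstructed samples would additionally be scored by discriminators under the joint adversarial loss; the sketch stops at the cycle composition.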
|
Language | English
|
Source URL | [http://ir.ia.ac.cn/handle/173211/23880]  |
Collection | Institute of Automation, National Laboratory of Pattern Recognition; Institute of Automation, Center for Research on Intelligent Perception and Computing
|
Corresponding author | Zhaoxiang Zhang |
Author affiliations | 1. Center for Research on Intelligent Perception and Computing; 2. Institute of Automation, University of Chinese Academy of Sciences; 3. Center for Excellence in Brain Science and Intelligence Technology (CEBSIT); 4. CAS Center for Excellence in Brain Science and Intelligence
|
Recommended citation (GB/T 7714) |
Wangli Hao, Zhaoxiang Zhang, He Guan. CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation[C]. In: Hilton New Orleans Riverside, USA. 2018.2.1.
|