|
Authors | Wangli Hao1,2 ; Zhaoxiang Zhang1,2,3 ; He Guan1
|
Publication date | 2018
|
Conference date | 2018.2.1
|
Conference venue | Hilton New Orleans Riverside, USA
|
Abstract (English) | Visual and audio modalities are two symbiotic modalities underlying videos, containing both common and complementary information. If they can be mined and fused sufficiently, the performance of related video tasks can be significantly enhanced. However, due to environmental interference or sensor faults, sometimes only one modality exists while the other is abandoned or missing. By recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, great benefits can be gained for various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks: audio-to-visual, visual-to-audio, audio-to-audio, and visual-to-visual subnetworks, which are organized in a cycle architecture. CMCGAN has several remarkable advantages. First, CMCGAN unifies visual-audio mutual generation into a common framework by a joint corresponding adversarial loss. Second, by introducing a latent vector with a Gaussian distribution, CMCGAN can effectively handle the dimension and structure asymmetry between the visual and audio modalities. Third, CMCGAN can be trained end-to-end for greater convenience. Building on CMCGAN, we develop a dynamic multimodal classification network to handle the modality-missing problem. Abundant experiments have been conducted and validate that CMCGAN obtains state-of-the-art cross-modal visual-audio generation results. Furthermore, the generated modality achieves effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of our proposed method.
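The cycle wiring sketched in the abstract can be illustrated with a minimal NumPy stand-in. Everything below is a hypothetical toy: the linear "generators", the feature dimensions, and the reconstruction loss are placeholders for illustration only, not the paper's encoder-decoder subnetworks or its joint corresponding adversarial loss. It only shows how the subnetworks compose into cycles (audio→visual→audio and visual→audio→visual) and how a Gaussian latent vector bridges the dimension asymmetry between modalities.

```python
import numpy as np

rng = np.random.default_rng(0)
AUDIO_DIM, VISUAL_DIM, LATENT_DIM = 128, 256, 64  # illustrative sizes

# Placeholder "generators": each maps (source features, Gaussian latent z)
# to the target modality. Linear maps stand in for CMCGAN's subnetworks
# purely to show the cycle composition.
W_a2v = rng.standard_normal((AUDIO_DIM + LATENT_DIM, VISUAL_DIM)) * 0.01
W_v2a = rng.standard_normal((VISUAL_DIM + LATENT_DIM, AUDIO_DIM)) * 0.01

def generate(x, W):
    # The Gaussian latent vector absorbs the dimension/structure
    # asymmetry between the audio and visual feature spaces.
    z = rng.standard_normal(LATENT_DIM)
    return np.concatenate([x, z]) @ W

def cycle_loss(x, x_rec):
    # Toy reconstruction term (mean squared error) over one cycle.
    return float(np.mean((x - x_rec) ** 2))

audio = rng.standard_normal(AUDIO_DIM)
visual = rng.standard_normal(VISUAL_DIM)

# Two of the cycle paths (audio-to-audio and visual-to-visual
# subnetworks are omitted for brevity):
fake_visual = generate(audio, W_a2v)    # audio -> visual
audio_rec = generate(fake_visual, W_v2a)  # -> back to audio
fake_audio = generate(visual, W_v2a)    # visual -> audio
visual_rec = generate(fake_audio, W_a2v)  # -> back to visual

loss = cycle_loss(audio, audio_rec) + cycle_loss(visual, visual_rec)
```

In the actual model these generated and reconstructed samples would additionally be scored by discriminators under the joint adversarial loss; the sketch stops at the cycle composition.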
|
Language | English
|
Source URL | [http://ir.ia.ac.cn/handle/173211/23880]  |
Collection | Institute of Automation, National Laboratory of Pattern Recognition; Institute of Automation, Center for Research on Intelligent Perception and Computing
|
Corresponding author | Zhaoxiang Zhang |
Author affiliations | 1. Center for Research on Intelligent Perception and Computing; 2. Institute of Automation, University of Chinese Academy of Sciences; 3. Center for Excellence in Brain Science and Intelligence Technology (CEBSIT); 4. CAS Center for Excellence in Brain Science and Intelligence
|
Recommended citation (GB/T 7714) |
Wangli Hao, Zhaoxiang Zhang, He Guan. CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation[C]. In: Hilton New Orleans Riverside, USA. 2018.2.1.
|