Chinese Academy of Sciences Institutional Repositories Grid: Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition


Document type: Journal article


Authors	Yuan, Yuan; Tian, Chunlin; Lu, Xiaoqiang (1)
Journal	IEEE ACCESS
Publication date	2018
Volume	6; Pages: 5573-5583
Keywords	Audio-Visual Systems; Recurrent Neural Networks; Generative Adversarial Networks
ISSN	2169-3536
DOI	10.1109/ACCESS.2018.2796118
Institutional rank	1
Abstract (English)	Audio-visual speech recognition (AVSR) uses both the audio and video modalities to achieve robust automatic speech recognition. Deep neural networks (DNNs) have achieved promising performance in AVSR owing to their generalized, nonlinear mapping ability. However, these DNN models have two main disadvantages: 1) most models address the AVSR problem while neglecting the fact that frames are correlated; and 2) the features learned by these models are not credible. This is because the joint representation learned through fusion fails to consider category-specific information, and the discriminative information is sparse, while noise, reverberation, irrelevant image objects, and background are redundant. To alleviate these disadvantages, we propose the auxiliary loss multimodal GRU (alm-GRU) model, which consists of three parts: feature extraction, data augmentation, and fusion & recognition. Feature extraction and data augmentation together form a complete, effective solution for processing raw, complete videos and for training, and they are the precondition for the core part: fusion & recognition using alm-GRU. alm-GRU is an end-to-end network equipped with a novel loss that combines fusion and recognition while also taking modal and temporal information into account. Experiments on benchmark datasets show the superiority of our model and the necessity of the data augmentation and the generative component.
Language	English
WOS record no.	WOS:000426304300001
Source URL	http://ir.opt.ac.cn/handle/181661/30774
Collection	Xi'an Institute of Optics and Precision Mechanics / Center for Optical Imagery Analysis and Learning
Author affiliations	1. Chinese Acad Sci, Xian Inst Opt & Precis Mech, Ctr Opt Imagery Anal & Learning, Xian 710119, Shaanxi, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended citation (GB/T 7714)	Yuan, Yuan, Tian, Chunlin, Lu, Xiaoqiang. Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition[J]. IEEE ACCESS, 2018, 6: 5573-5583.
APA	Yuan, Yuan, Tian, Chunlin, & Lu, Xiaoqiang. (2018). Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition. IEEE ACCESS, 6, 5573-5583.
MLA	Yuan, Yuan, et al. "Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition." IEEE ACCESS 6 (2018): 5573-5583.
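The abstract's first point is that frame-by-frame models ignore the correlation between frames. A recurrent unit such as a GRU addresses this by carrying a hidden state from one frame to the next.

The following is a minimal pure-Python sketch of a standard GRU cell (the Cho et al., 2014 formulation) run over per-frame features. It assumes simple early fusion, in which each frame's audio and visual feature vectors are concatenated. The names `GRUCell` and `run_fused_sequence`, the random initialization, the toy data, and the fusion scheme are all hypothetical illustrations. The sketch does not reproduce alm-GRU's architecture, its auxiliary loss, or its GAN-based data augmentation, none of which this record specifies.

```python
import math
import random

# Minimal sketch: a standard GRU (Cho et al., 2014) over early-fused audio-visual
# frames. All names here are hypothetical; this is NOT the alm-GRU model.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(parts) for parts in zip(*vectors)]


def init_matrix(rows, cols, rng):
    # Small uniform random weights (an illustrative choice, not the paper's).
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        gates = ("z", "r", "n")  # update gate, reset gate, candidate state
        self.W = {g: init_matrix(hidden_dim, input_dim, rng) for g in gates}
        self.U = {g: init_matrix(hidden_dim, hidden_dim, rng) for g in gates}
        self.b = {g: [0.0] * hidden_dim for g in gates}

    def _pre(self, gate, x, h):
        # Pre-activation: W_g x + U_g h + b_g
        return vadd(matvec(self.W[gate], x), matvec(self.U[gate], h), self.b[gate])

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        # Candidate uses the reset-gated previous state: tanh(W x + U (r * h) + b)
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._pre("n", x, reset_h)]
        # Interpolate: h_t = z * h_{t-1} + (1 - z) * candidate
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def run_fused_sequence(audio_frames, video_frames, cell):
    # Early fusion (an assumption): concatenate each frame's audio and visual
    # features, then carry the hidden state across frames so that later
    # states depend on earlier frames.
    if len(audio_frames) != len(video_frames):
        raise ValueError("audio and video must have the same number of frames")
    h = [0.0] * cell.hidden_dim
    states = []
    for a, v in zip(audio_frames, video_frames):
        h = cell.step(list(a) + list(v), h)
        states.append(h)
    return states


if __name__ == "__main__":
    cell = GRUCell(input_dim=5, hidden_dim=4, seed=1)
    audio = [[0.1, 0.2, 0.3]] * 6  # 6 frames of 3-dim audio features (toy data)
    video = [[0.5, -0.5]] * 6      # 6 frames of 2-dim visual features (toy data)
    states = run_fused_sequence(audio, video, cell)
    print(len(states), len(states[-1]))  # 6 4
```

Each hidden state mixes the current fused frame with everything seen before it, which is the inter-frame dependency the abstract says frame-level DNNs neglect. A classifier and a training loss would then be applied to these states. The paper's specific auxiliary loss is not described in this record.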

Ingestion method: OAI harvesting

Source: Xi'an Institute of Optics and Precision Mechanics

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.