Chinese Academy of Sciences Institutional Repositories Grid
Research on Large-Scale Acoustic Model Training Based on Deep Neural Networks

Document Type: Degree Thesis (Dissertation)

Author: You Zhao
Degree: Doctor of Engineering
Defense Date: 2015-05-25
Degree-Granting Institution: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Advisor: Xu Bo
Keywords: Deep Neural Network; Acoustic Model; Deep Boltzmann Machine; Distributed Parallel Training; Mixed-Bandwidth Training
Alternative Title: Research on DNN-based large scale acoustic model training
Major: Pattern Recognition and Intelligent Systems
Abstract (Chinese, translated): With the widespread adoption of deep neural networks (DNNs) in large vocabulary continuous speech recognition, the performance of speech recognition systems has improved greatly over traditional Gaussian mixture model (GMM) based systems and now meets the requirements of practical applications. As data keeps accumulating on the internet, the amount of speech data has grown from a few dozen hours in the early days to tens of thousands of hours today, and it continues to grow. How to use speech data at this scale to train speech recognition systems quickly has therefore become an urgent problem. This thesis investigates DNN-based large-scale acoustic model training and the problems encountered in concrete speech recognition applications. The main results and contributions are:
1. For DNN pre-training, we propose applying a pre-training model based on the deep Boltzmann machine (DBM) to DNN training for continuous speech recognition. On the TIMIT phone recognition task, the DBM-based DNN achieves a 3.8% relative PER reduction on the core test set compared with the deep belief network (DBN) based DNN.
2. For DNN training on a single server with multiple GPUs, we propose applying the one-pass learning algorithm based on averaged stochastic gradient descent to DNN training, and we combine one-pass learning with an asynchronous parallel scheme so that the algorithm can run on multiple GPUs (a minimal sketch of the averaging step follows this abstract). Compared with the asynchronous stochastic gradient algorithm, the averaged-SGD one-pass learning algorithm trains 5.3 times faster.
3. For distributed DNN training, we propose a GPU-cluster training scheme based on the Stochastic Hessian Free algorithm, which removes the high inter-machine communication bandwidth required by asynchronous parallel algorithms and clearly improves training speed compared with them.
4. For mixed-bandwidth data training, we propose a DNN-adaptation-based approach that achieves better recognition performance than the feature zero-padding approach to mixed-bandwidth DNN training. Furthermore, using an SVD-based method to accelerate DNN training, we completed mixed-bandwidth training on 7,500 hours of speech data in only seven days on a GPU cluster with 24 GPUs.
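As a rough illustration of the averaging idea behind contribution 2, the following Python sketch shows Polyak-Ruppert style averaged SGD over a single pass of data. It is a minimal example under assumptions made here, not the thesis implementation: the names (`asgd_one_pass`, `grad_fn`), the toy least-squares problem, and the learning rate are all invented for illustration, and the actual one-pass learning algorithm and its asynchronous multi-GPU variant differ in detail.

```python
import numpy as np

def asgd_one_pass(grad_fn, w0, data, lr=0.01, avg_start=0):
    """Single pass over `data`; returns the averaged iterate (Polyak-Ruppert)."""
    w = w0.copy()
    w_avg = w0.copy()
    n_avg = 0
    for t, x in enumerate(data):
        w = w - lr * grad_fn(w, x)                  # plain SGD step
        if t >= avg_start:                          # start averaging after a warm-up
            n_avg += 1
            w_avg = w_avg + (w - w_avg) / n_avg     # running mean of the iterates
    return w_avg

# Toy usage: one pass of least-squares regression on random data (illustrative only).
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 10))
b = rng.normal(size=256)
grad = lambda w, i: 2.0 * A[i] * (A[i] @ w - b[i])  # per-sample gradient
w_hat = asgd_one_pass(grad, np.zeros(10), range(256), lr=0.05)
```

The point of the sketch is that the returned model is the running average of the SGD iterates rather than the last iterate, which is the property that makes a single pass over very large data sets viable.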
English Abstract: In the past few years, the deep neural network (DNN) has been widely used in large vocabulary continuous speech recognition (LVCSR). DNN-based acoustic models achieve significant improvements over traditional GMM-based models and have brought speech recognition systems up to the requirements of practical applications. With the growth of the internet, the amount of speech training data has increased explosively, from dozens of hours to tens of thousands of hours. It has therefore become an urgent problem to exploit speech data at this scale to train high-performance recognition systems efficiently. In this thesis, we study DNN-based large-scale acoustic model training and several specific application problems in speech recognition. The main work and contributions include:
1. For the DNN pre-training problem, we propose applying the Deep Boltzmann Machine (DBM) pre-training model to the DNN training procedure in LVCSR. On the TIMIT phone recognition task, the DBM-DNN achieves a 3.8% relative PER reduction on the core test set compared with the Deep Belief Network based DNN (DBN-DNN).
2. To train DNNs on multiple GPUs in a single server, we propose applying the one-pass learning algorithm based on averaged stochastic gradient descent (ASGD) to the DNN training procedure. Combined with an asynchronous parallel mode, one-pass learning can run on the multiple GPUs of a single server. The asynchronous ASGD algorithm speeds up DNN training by a factor of 5.3 compared with the asynchronous stochastic gradient algorithm.
3. For the distributed DNN training problem, we propose a GPU-cluster training scheme based on the Stochastic Hessian Free (SHF) algorithm, which removes the high inter-machine communication bandwidth demanded by the asynchronous parallel algorithm. The SHF algorithm also clearly speeds up DNN training on the GPU cluster compared with the asynchronous parallel algorithm.
4. For the mixed-bandwidth training problem, we propose a DNN adaptation approach for training DNNs on mixed-bandwidth speech data, which achieves better performance than the feature zero-padding based mixed-bandwidth training method. In addition, by exploiting singular value decomposition (SVD), we complete DNN training on 7,500 hours of mixed-bandwidth speech data in seven days on a GPU cluster with 24 GPUs (a minimal sketch of the SVD factorization follows this abstract).
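To make the SVD-based acceleration in contribution 4 concrete, here is a minimal sketch of low-rank factorization of a single DNN weight matrix. It assumes a simple y = xW layer and NumPy; the function name `svd_factorize`, the layer size, and the rank k=256 are assumptions for illustration, not the settings used in the thesis.

```python
import numpy as np

def svd_factorize(W, k):
    """Approximate W (m x n) as U_k @ V_k, with U_k: m x k and V_k: k x n."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k] * s[:k]   # fold the top-k singular values into the left factor
    V_k = Vt[:k, :]
    return U_k, V_k

rng = np.random.default_rng(0)
W = rng.normal(size=(2048, 2048))        # stand-in for one hidden-layer weight matrix
U_k, V_k = svd_factorize(W, k=256)

# Forward pass through the factorized layer: x @ U_k, then @ V_k.
x = rng.normal(size=(1, 2048))
y_full = x @ W
y_low_rank = (x @ U_k) @ V_k
print(np.linalg.norm(y_full - y_low_rank) / np.linalg.norm(y_full))  # rank-k error on this input
```

Replacing an m-by-n weight matrix with an m-by-k and a k-by-n factor cuts the per-layer multiply-adds from m*n to k*(m+n), which is where the speedup comes from when k is much smaller than m and n.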
Language: Chinese
Other Identifier: 201218014628080
Source URL: [http://ir.ia.ac.cn/handle/173211/6679]
Collection: Doctoral Dissertations (Graduates)
Recommended Citation
GB/T 7714
You Zhao. Research on Large-Scale Acoustic Model Training Based on Deep Neural Networks [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2015.

Deposit Method: OAI Harvesting

Source: Institute of Automation

