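Automatic Recognition System for Broadcast News Corpora
Document type: Dissertation
Author | Lü Ping (吕萍) |
Degree | Doctoral |
Defense date | 2003 |
Degree-granting institution | Institute of Acoustics, Chinese Academy of Sciences |
Place conferred | Institute of Acoustics, Chinese Academy of Sciences |
Keywords | speech recognition; broadcast news recognition system; audio matching; automatic segmentation; audio classification; speaker clustering; recognition post-processing |
Alternative title | Broadcast News Automatic Transcription System |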
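Abstract (translated from Chinese) | In recent years, the focus of speech recognition research has shifted from read speech recorded in the laboratory to real speech signals from everyday life. Broadcast news, as one of the main sources of real speech, has become a focus of speech recognition research. Addressing the complex and variable acoustic environment of broadcast news, this report builds a complete broadcast news recognition system, the ThinkIT-BNR system. Unlike a traditional large-vocabulary continuous speech recognition system, ThinkIT-BNR includes modules for audio matching, automatic audio segmentation, audio classification, speaker clustering, recognition post-processing, and a multi-stage recognition strategy. The report proposes and implements several algorithms: a distance-based, variable-length, non-homologous audio matching algorithm that quickly locates news program boundaries from cue music; several energy-based automatic segmentation algorithms motivated by prosodic rhythm, among which the variance-based algorithm is simple to apply and performs on par with manual segmentation; a Gaussian-mixture-model-based automatic audio classification algorithm, with classification accuracy above 98% for both male and female speech; a hierarchical speaker clustering algorithm tailored to broadcast news, which first obtains initial clusters with a greedy algorithm and then clusters them quickly with a fast nearest-neighbor algorithm; and confusion-network-based recognition post-processing, realized through time-similarity clustering and phoneme-similarity clustering. In addition, 70 hours of broadcast news were annotated. Tests on the Xinwen Lianbo news program show that the ThinkIT-BNR system's error rate is only 10.14%. |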
Abstract (English) | Past speech recognition research focused mainly on decoding high-quality speech in quiet environments. Recently, however, the focus has shifted to speech found in the "real world". One source of real-world speech is audio recordings of radio and television broadcast news (BN). This paper describes how the particular challenges of the broadcast news domain are addressed by the ThinkIT Broadcast News Recognition (BNR) system. Compared with previous automatic speech recognition systems, the ThinkIT BNR system consists of several modules: audio pattern matching, automatic segmentation, audio classification, speaker clustering, post-processing, and others. This paper develops and implements several algorithms. A new metric-based flexible audio pattern matching algorithm is proposed; given a cue audio clip, it can quickly detect and locate a news program in a long audio stream. Several energy-based automatic segmentation algorithms are proposed, motivated by prosodic characteristics. The goal of automatic segmentation is to detect changes in speaker identity, environmental condition, and channel. The variance-based segmentation algorithm is simple and effective, and its performance is almost equivalent to that of manual segmentation. A model-based audio classification approach is implemented: each audio segment is classified as male, female, music, or noise by Gaussian mixture models (GMMs) with 512 mixture components. Experiments show that the gender classification is very precise. A hierarchical clustering algorithm is proposed that exploits the characteristics of broadcast news. First, neighboring segments are clustered using a greedy criterion to obtain cluster seeds. These seeds are then clustered using a fast nearest-neighbor algorithm. Experiments show that the cluster purity is more than 93%. A confusion-network-based post-processing algorithm is implemented, which yields reduced word error rates. This method produces a new representation of the set of candidate hypotheses that specifies the sequence of word-level confusions in an aligned lattice format. Experiments on the XinWenLianBo database show that the word error rate of the ThinkIT BNR system is just 10.14%. |
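Language | Chinese |
Release date | 2011-05-07 |
Pages | 81 |
Source URL | [http://159.226.59.140/handle/311008/906] |
Collection | Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses |
Recommended citation (GB/T 7714) | 吕萍. 广播新闻语料自动识别系统[D]. 中国科学院声学研究所. 中国科学院声学研究所. 2003. |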
Ingest method: OAI harvesting
Source: Institute of Acoustics
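The abstract describes the hierarchical speaker clustering only in outline: a greedy pass over neighboring segments produces cluster seeds, and the seeds are then clustered with a fast nearest-neighbor algorithm. The sketch below illustrates that two-stage shape under stated assumptions and is not the thesis implementation. Each segment is reduced to a hypothetical fixed-length feature vector, the distance is Euclidean between centroids, the thresholds are arbitrary, and the second stage is a plain exhaustive nearest-pair merge rather than the fast variant the thesis refers to. The names `greedy_seeds` and `merge_nearest` are invented for illustration.

```python
import math


def mean_vec(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def greedy_seeds(segments, threshold):
    """Stage 1 (assumed form): scan segments in time order and attach each
    one to the current seed if it lies within `threshold` of that seed's
    centroid; otherwise open a new seed."""
    seeds = []  # each seed is a list of segment indices
    for i, seg in enumerate(segments):
        if seeds:
            centroid = mean_vec([segments[j] for j in seeds[-1]])
            if math.dist(seg, centroid) <= threshold:
                seeds[-1].append(i)
                continue
        seeds.append([i])
    return seeds


def merge_nearest(segments, seeds, threshold):
    """Stage 2 (assumed form): repeatedly merge the two seeds with the
    closest centroids until the closest pair is farther than `threshold`."""
    clusters = [list(s) for s in seeds]
    while len(clusters) > 1:
        cents = [mean_vec([segments[j] for j in c]) for c in clusters]
        d, a, b = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break
        # b > a, so removing cluster b does not shift cluster a
        clusters[a].extend(clusters.pop(b))
    return clusters


# Toy data: segments 0, 1 and 4 resemble one speaker; 2 and 3 another.
segments = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.05, 0.0]]
seeds = greedy_seeds(segments, threshold=1.0)
clusters = merge_nearest(segments, seeds, threshold=1.0)
print(seeds)
print(clusters)
```

On the toy data, the greedy pass can only group adjacent segments, so a speaker who returns later in the program (segment 4) ends up in a separate seed; the second stage is what rejoins such seeds. This appears to be the motivation for a two-level design in broadcast news, where anchors and reporters alternate.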