Chinese Academy of Sciences Institutional Repositories Grid
Spoken Language Modeling for Limited Domain (限定领域口语语言建模研究)

Document type: Thesis

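Author: Hu Sheng (胡晟)
Degree: Master of Engineering
Defense date: 2004-06-01
Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisors: Xu Bo (徐波); Zhang Shuwu (张树武)
Keywords: Automatic Speech Recognition; Language Model; FSN LM; n-gram LM; Key Information Language Model
Alternative title: Spoken Language Modeling for Limited Domain
Major: Pattern Recognition and Intelligent Systems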
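Abstract (translated from the Chinese): After several decades of development, speech recognition technology is moving toward practical use. Although the composition of a recognition system varies across applications, a complete speech recognition system generally comprises four parts: front-end preprocessing, the acoustic model, the language model, and the search engine. The language model can be omitted in isolated word recognition systems but is indispensable in most others. Moreover, the language model is very important when a speech recognition system is adapted to a specific application domain, because it exploits prior knowledge of that domain to guide the recognition process effectively.
In limited-domain applications, especially spoken language recognition, the corpus is often insufficient, or enough data cannot be collected in a short time, which makes training a language model very difficult. Improved smoothing algorithms can compensate for data sparsity only to a certain extent. After trying rule-based language models, represented by FSN, and class-based n-gram models, we proposed a modeling method based on key information, which achieved good results in applications. The work covered in this thesis mainly includes:
① For rule-based language modeling, we used FSN to describe the grammar, implemented an FSN-based speech recognition engine, and achieved good results with the FSN language model in a speech-based information inquiry application.
② We improved the training process of existing n-gram models in preprocessing, word segmentation, and smoothing, and proposed a modified Katz smoothing algorithm based on cutoff thresholds.
③ We made some attempts at class-based n-gram language models and English-Chinese bilingual language models.
④ We proposed a key-information-based language modeling method that alleviates the impact of sparse training data on language model training and can effectively recognize the key information in a user's utterance while filtering out useless information. On this basis, we implemented a real-time recognition system, which achieved good results in the Olympics-oriented speech information inquiry system project.
⑤ We designed a Chinese-character-based n-gram model for the "speech-based website navigation system".
In short, this thesis presents some of the author's work on language modeling in the context of limited-domain spoken language recognition.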
English abstract: As it matures, Automatic Speech Recognition (ASR) technology is increasingly being applied in daily life. Although the composition of an ASR system may differ from task to task, a typical system consists of a front-end module, an Acoustic Model (AM), a Language Model (LM), and a search module. The language model is indispensable in most ASR systems, with the exception of systems such as isolated word recognizers. Moreover, the language model plays an important role when ASR technology is applied to a specific task, because it provides prior knowledge about the task and guides the search process efficiently.
For some limited-domain tasks, especially in spoken language, the corpus is insufficient and hard to collect in a short time, which hinders the application of ASR. Improving the smoothing algorithm can solve the data sparsity problem only to some extent. Having tried the FSN LM and the class-based n-gram LM, we proposed a Key Information LM that alleviates the sparsity problem. The main work includes:
① Designed an FSN-based ASR system and built an FSN language model for the Speech-based Information Inquiry System for Olympic Games project (SIISOG).
② Improved the training of Chinese n-gram LMs in preprocessing, word segmentation, and smoothing. A modified Katz smoothing based on cutoffs is proposed to overcome the drawback that the classic Katz formula relies on the number of singletons even after low-frequency events have been cut off.
③ Tried class-based n-gram LMs for tasks with small training data, and designed an English-Chinese bilingual LM.
④ Proposed a Key Information Language Model (KILM) and designed a real-time ASR system based on KILM that captures the key information patterns in a user's utterance and ignores trivial information. We designed a traffic information inquiry system that performed well at the 7th Beijing International Hi-tech Expo. We also applied it to the SIISOG project successfully.
⑤ Designed a character-based n-gram LM for the speech-based website navigation system.
In short, the thesis summarizes my work on spoken language modeling for limited domains.
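Item ② refers to the classic Katz formula and its reliance on the number of singletons. As a rough illustration only, the sketch below computes the classic Katz (Good-Turing based) discount coefficients from a table of n-gram counts. It is not the modified algorithm proposed in the thesis, whose details the abstract does not give. The function name `katz_discounts`, the threshold `k`, and the toy counts are assumptions made for this example.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Classic Katz discount coefficients d_r for 1 <= r <= k.

    ngram_counts maps an n-gram to its raw count (hypothetical input format).
    n[r] below is the number of distinct n-grams seen exactly r times.
    """
    n = Counter(ngram_counts.values())
    if n[1] == 0:
        # The formula divides by n[1], the number of singletons. If the
        # count table was built after singletons were cut off, n[1] is 0
        # and the classic formula cannot be evaluated.
        raise ValueError("classic Katz discounting needs at least one singleton")
    common = (k + 1) * n[k + 1] / n[1]
    if common == 1.0:
        raise ValueError("degenerate count-of-counts: denominator is zero")
    discounts = {}
    for r in range(1, k + 1):
        if n[r] == 0:
            raise ValueError(f"no n-grams with count {r}; cannot estimate d_{r}")
        r_star = (r + 1) * n[r + 1] / n[r]  # Good-Turing adjusted count
        discounts[r] = (r_star / r - common) / (1.0 - common)
    return discounts


# Toy data: 8 singletons, 3 n-grams seen twice, 1 n-gram seen three times.
toy_counts = {f"s{i}": 1 for i in range(8)}
toy_counts.update({f"d{i}": 2 for i in range(3)})
toy_counts["t0"] = 3
toy_discounts = katz_discounts(toy_counts, k=2)
```

Every coefficient depends on `n[1]` through the `common` term, which is one way to see why a count table with singletons removed is a problem for the classic formula.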
Language: Chinese
Other identifier: 767
Source URL: http://ir.ia.ac.cn/handle/173211/6771
Collection: Graduates_Master's Theses
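Recommended citation
GB/T 7714
Hu Sheng. Spoken Language Modeling for Limited Domain (限定领域口语语言建模研究) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2004.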

Ingest method: OAI harvesting

Source: Institute of Automation

