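Chinese Academy of Sciences Institutional Repositories Grid System: Research on Key Technologies for Audio Information Retrieval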

Chinese Academy of Sciences Institutional Repositories Grid

Research on Key Technologies for Audio Information Retrieval

Document type: Dissertation


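Author	Wang Lei (王磊)
Degree type	Doctor of Engineering
Defense date	2009-06-05
Degree-granting institution	Graduate University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Xu Bo (徐波)
Keywords	query by singing/humming; audio template searching; audio classification; broadcast news story segmentation
Alternative title	Research on Audio Information Retrieval Technology
Major	Pattern Recognition and Intelligent Systems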
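Chinese abstract (translated)	How to manage massive audio databases and retrieve their contents effectively is a hot research topic in academia, and industry has also invested substantial manpower and resources in search of new application models and potential sources of profit. Research on audio information retrieval, which targets the content contained in audio data, is therefore of great importance for moving this field toward practical use. Addressing the research hot spots and technical difficulties of audio information retrieval, this thesis studies the following aspects in breadth and depth. (1) For query by singing/humming, the thesis proposes a complete implementation scheme for a humming retrieval system, covering melody database construction, melody feature extraction, and melody database search. First, by analyzing the MIDI music format and the characteristics of its tracks, it proposes a method for extracting the main melody track from multi-track MIDI files, together with a method for segmenting musical phrases in the melody score, so that melody databases suited to different needs can be built; second, it extracts pitch (fundamental frequency) sequence features and note sequence features, which are used by melody search algorithms at different levels; third, it proposes a fast matching algorithm based on hierarchical filtering and implements multi-level fusion of melody similarity scores, which both speeds up retrieval and improves retrieval accuracy. (2) For audio template searching, the thesis proposes a fast template search technique based on an audio vector space representation and a template search technique based on audio fingerprints. Based on statistics of the distribution of audio words obtained by vector quantization of audio frame features, it verifies that Zipf's law also holds for audio data, and from this derives a method for assigning weights to audio words (Audio-TFIDF). Building on these algorithms, audio template searching is applied to template search for advertisements and program trailers in radio and television broadcasts, and to a new-advertisement discovery system based on repetition detection. Experimental results show that audio template searching meets the accuracy and speed requirements for practical deployment. (3) For audio retrieval based on audio classification, to address the excessive computation and long training time of SVMs on large-scale data, the thesis introduces graphics processing unit (GPU) acceleration into audio classification training, which effectively reduces computational cost and greatly increases speed. On this basis, audio classification is applied to speech/non-speech classification in speech recognition pre-processing and to a music genre classification system. (4) For automatic story segmentation of broadcast news, the thesis proposes a scheme for building story indexes of news programs that combines audio and video information, and on that basis implements an automatic news story segmentation system based on image key-frame clustering, audio template searching, audio classification, and speaker clustering and segmentation.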
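The audio-word weighting idea in the abstract above can be illustrated with a small sketch. This record does not give the thesis's actual Audio-TFIDF formula, so the code below is only an assumption: it applies the standard text-retrieval TF-IDF weight, tf × log(N / df), to integer audio-word IDs (as would come out of vector quantization of frame features) and compares files by cosine similarity. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency per audio word.

    docs: list of audio files, each a list of integer audio-word IDs.
    ASSUMPTION: plain log(N / df), borrowed from text retrieval.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per file
    return {word: math.log(n_docs / count) for word, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector (dict) for one sequence of audio words."""
    tf = Counter(doc)
    total = len(doc)
    return {word: (count / total) * idf.get(word, 0.0) for word, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus: audio word 0 occurs in every file, so it carries no weight.
corpus = [[0, 0, 1, 2], [0, 3, 3, 4], [0, 1, 5, 5]]
idf = idf_weights(corpus)
query = tfidf_vector([3, 3, 0], idf)
scores = [cosine(query, tfidf_vector(doc, idf)) for doc in corpus]
best = max(range(len(corpus)), key=scores.__getitem__)
```

In this toy corpus, audio word 0 appears in every file and receives zero weight, so only the rarer word 3 contributes to the match and the second file ranks first.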
English abstract	How to retrieve vast amounts of audio information effectively and efficiently is not only a hot topic for researchers but also a focus for industry as it builds new applications and looks for new sources of profit. During the three years of my Ph.D. study, I investigated the key technologies for building audio information retrieval systems. The main research work focused on the following aspects. First, this thesis proposes a complete solution for building a query by singing/humming system, covering melody database construction, melody feature extraction, and melody matching. To build the melody database automatically, I propose an algorithm for extracting the main melody track from raw MIDI files and a melody phrase segmentation method. To extract robust features, two feature extraction methods are adopted: pitch sequence extraction and note sequence extraction. To speed up matching, a candidate set reduction step first filters out unlikely candidates using faster but less precise methods; a more accurate but slower strategy is then run on the surviving candidates to perform a finer match. At the decision level, the scores generated during the filtering and fine-matching stages are fused to obtain a more accurate result. The proposed system participated in the QBSH contest at MIREX2008 and won first place in both sub-tasks (Roger Jang’s Corpus and ThinkIT’s Corpus). Second, in the area of audio template searching, this thesis investigates two methods: fingerprinting-based template searching and audio vector space model-based template searching. It proposes a novel method for assigning a weight to each audio word according to its ability to distinguish different audio files. Based on this work, I implemented an advertisement identification system and an automatic new-advertisement detection system; the experimental results show that both systems could be put into practical use.
Third, the thesis adopts a GPU-based SVM training method for audio classification to speed up training; the results show that GPU-based training takes 90% less time than CPU-based training. Furthermore, I applied audio classification to three applications: a pre-processing module for speech recognition, music genre classification, and audio-based video retrieval. Finally, a system for automatic news story segmentation is implemented based on audio and video processing techniques....
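The two-stage matching with score fusion described in the abstract can be sketched roughly as follows. The record does not say which coarse and fine matchers the thesis uses, so every choice here is an assumption made for illustration: the coarse stage compares mean-removed pitch sequences resampled to a fixed length, the fine stage is plain dynamic time warping (DTW), and fusion is a weighted sum with a hypothetical weight `alpha`. All function names are likewise hypothetical.

```python
def normalize(seq):
    """Remove the mean pitch so a transposed query still matches (assumption)."""
    mean = sum(seq) / len(seq)
    return [p - mean for p in seq]


def resample(seq, k):
    """Pick k roughly evenly spaced samples from a non-empty sequence."""
    return [seq[int(i * len(seq) / k)] for i in range(k)]


def coarse_distance(a, b, k=8):
    """Fast, less precise stage: mean absolute difference of k samples."""
    ra, rb = resample(a, k), resample(b, k)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / k


def dtw_distance(a, b):
    """Slower, more precise stage: DTW cost normalized by total length."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1] / (len(a) + len(b))


def search(query, database, keep=2, alpha=0.3):
    """Filter with the coarse score, re-score survivors, fuse both scores.

    Returns (name, fused_distance) pairs sorted best first.
    """
    q = normalize(query)
    coarse = {name: coarse_distance(q, normalize(mel)) for name, mel in database.items()}
    survivors = sorted(coarse, key=coarse.get)[:keep]
    fused = []
    for name in survivors:
        fine = dtw_distance(q, normalize(database[name]))
        fused.append((name, alpha * coarse[name] + (1 - alpha) * fine))
    return sorted(fused, key=lambda item: item[1])


# Toy melodies as MIDI note numbers; the query is "up" transposed by two semitones.
melodies = {
    "up": [60, 62, 64, 65, 67, 69, 71, 72],
    "down": [72, 71, 69, 67, 65, 64, 62, 60],
    "flat": [60] * 8,
}
ranked = search([62, 64, 66, 67, 69, 71, 73, 74], melodies)
```

Only the `keep` best candidates from the cheap stage reach the DTW stage, which is where the speed-up comes from; reusing the coarse score in the final ranking mirrors the decision-level fusion the abstract mentions.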
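Language	Chinese
Other identifier	200618014628036
Source URL	[http://ir.ia.ac.cn/handle/173211/6220]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Wang Lei (王磊). Research on Key Technologies for Audio Information Retrieval (音频信息检索关键技术研究) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2009.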

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.