Chinese Academy of Sciences Institutional Repositories Grid
The Design and Implementation of HNC Corpus (HNC语料库的设计与实现)

Document type: Dissertation

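Author: 谭露 (Tan Lu)
Degree: Doctoral
Defense date: 2005
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place of conferral: Institute of Acoustics, Chinese Academy of Sciences
Keywords: HNC theory; corpus; corpus construction; corpus annotation; corpus retrieval and statistics
Alternative title: The Design and Implementation of HNC Corpus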
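Chinese abstract (translated): Corpora are a foundational resource for linguistic research and natural language processing (NLP). As computer technology has advanced rapidly, computers have become increasingly capable of storing and processing corpus data, and corpora play an increasingly important role in linguistics, NLP, and related fields. HNC theory, as one school of Chinese information processing, requires an HNC corpus that develops in step with it. Over the past several decades, corpus construction and related research have been very active in China and abroad and have achieved a great deal. However, most of these corpora are built on part-of-speech tagging and serve syntactic analysis, so they are difficult to use directly as corpora for HNC research. Building an HNC corpus that serves HNC theory and the related NLP research has therefore become an urgent task. Guided by the overall framework of HNC theory and building on the existing HNC corpus material, this dissertation carries out the construction of the HNC corpus in full, covering three aspects: construction of the overall corpus framework, construction of the HNC Chinese corpus, and design and implementation of the corpus application platform. The main advances and contributions are as follows: (1) A unified and functionally fairly complete overall framework for the HNC corpus was built, comprising the HNC six-library corpus and the corpus application platform. (2) An HNC Chinese raw corpus and an HNC Chinese annotated corpus were established. (3) An HNC corpus application platform integrating corpus processing, management, retrieval, and statistics was designed and implemented; it contains an HNC corpus processing and management subsystem and an HNC corpus retrieval and statistics subsystem. (4) An HNC corpus annotation tool was designed and developed; it lets annotators annotate text conveniently, checks annotation results in both form and content, and provides instant annotation help. (5) A corpus assistant tool, HNC点点通 (HNC Instant Assistant), was designed and developed with a friendly interface, powerful functions, and convenient operation. Through information queries and on-screen word capture, users can obtain the HNC information they need promptly, conveniently, accurately, and comprehensively; during annotation, querying the tool returns the sentence categories of a word form and related example sentences. (6) Advanced computer techniques were introduced into corpus software development, such as interface programming, on-screen word capture, and regular expressions. In summary, this dissertation completes the design and implementation of the HNC corpus, with a focus on the Chinese corpus. The work marks the beginning of HNC corpus software construction and lays a foundation for the full completion of the HNC corpus.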
English abstract: A corpus is an important resource for linguistic study and natural language processing (NLP). With the rapid development of computer technology, computers are becoming increasingly capable of storing and processing language resources, and corpora play an increasingly important role in linguistic study, NLP, and related fields. HNC theory is a novel theory of NLP, and it requires the corresponding development of a corpus. Corpus research is a very active field that has developed rapidly and produced many achievements. However, existing corpora cannot be used directly for HNC, because they were developed on the basis of part-of-speech (POS) tagging for syntactic analysis. It is therefore both necessary and urgent to build a dedicated HNC corpus. The main contributions of this dissertation are as follows: (1) We have constructed a unified, functionally fairly complete framework for the HNC corpus, which includes the HNC corpus and the corpus application platform. (2) We have built an HNC Chinese raw corpus and an HNC Chinese tagged corpus. (3) We have designed and implemented an HNC corpus application software platform with functions for corpus tagging, management, searching, and statistics. The platform comprises two subsystems: the HNC tagging and management subsystem and the HNC searching and statistics subsystem. (4) We have designed and implemented an HNC corpus tagging tool. The tool makes corpus tagging easier through a convenient toolbar, and it also supports error checking and instant tagging help. (5) We have designed and implemented HNC Instant Assistant, a corpus assistant tool with a friendly interface, powerful functions, and good usability. Users can obtain the required HNC information in two ways: by searching with keyboard input, or by capturing words directly from the screen. (6) We have applied a number of advanced computer technologies in HNC corpus software development, including interface programming, instant tagging help, and regular expressions.
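Language: Chinese
Publication date: 2011-05-07
Pages: 74
Source URL: [http://159.226.59.140/handle/311008/978]
Collection: Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses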
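Recommended citation
GB/T 7714
谭露 (Tan Lu). HNC语料库的设计与实现 [The Design and Implementation of HNC Corpus][D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 2005.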

Ingest method: OAI harvesting

Source: Institute of Acoustics


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.