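Improvement of the Latent Semantic Indexing Algorithm and Its Application in the Chemistry Subject Information Portal (隐含语义检索算法的改进及其在化学学科信息门户中的应用)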
Document type: Thesis
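Author | Wang Zhigang (王志刚) |
Degree | Master's |
Defense date | 2003 |
Degree-granting institution | Institute of Process Engineering, Chinese Academy of Sciences (中国科学院过程工程研究所) |
Place of conferral | Institute of Process Engineering, Chinese Academy of Sciences (中国科学院过程工程研究所) |
Supervisor | Guo Li (郭力) |
Keywords | Latent Semantic Indexing (LSI); Chemistry Subject Information Portal; feature term extraction; mirror document detection system |
Alternative title | Improving on Latent Semantic Indexing for Chemistry Portal |
Degree discipline | Applied Chemistry |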
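Chinese abstract (translated) | This thesis studies Latent Semantic Indexing (LSI) and improves the existing LSI system in the Chemistry Subject Information Portal of the National Science Digital Library (国家科学数字图书馆). It discusses in detail the principles of LSI and the process of building an LSI system. By improving the document preprocessing method and studying term-weighting algorithms, the new system performs markedly better than the original one. Adopting a faster SVD algorithm and parallelization significantly speeds up data preprocessing, which makes the system more practical. Finally, building on the LSI system, the thesis constructs a feature term extraction system and a mirror document detection system as supplements to it. The thesis examines the key techniques involved in implementing LSI and selects an implementation suited to the characteristics of the Chemistry Subject Information Portal, yielding a complete and highly practical retrieval system. As an auxiliary to the portal's existing retrieval methods, LSI makes it easier for users to obtain the documents they need. The thesis also studies parameter selection in LSI. |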
English abstract | With the rapid growth of the Internet, it is increasingly important to give people easy ways to retrieve textual documents. Traditional methods such as keyword searching and full-text searching rely on lexical matching and can be inaccurate when matching a user's query. Concept-based retrieval, which retrieves information on the basis of a conceptual topic or the meaning of a document, may therefore be a better approach. Latent Semantic Indexing (LSI), a kind of concept-based retrieval, is a completely automatic intelligent retrieval method that allows queries in natural language. In LSI, a term-document matrix is constructed and a truncated Singular Value Decomposition (SVD), the core algorithm of LSI, is applied to it. The author reviews the principles of LSI and proposes an improved implementation of an LSI system for the Chemistry Portal of the Chinese National Scientific Library. Compared with the original system, performance is significantly improved by optimizing the data training steps and using a new term-weighting algorithm. A faster SVD algorithm and parallel computing greatly speed up the SVD computation, which makes the system more practical. To make the LSI system more complete, a keyword discrimination system and a replicated document detection system are also discussed. This thesis investigates several key steps of LSI, and an application program package has been built for the Chemistry Portal. The new LSI system makes the Chemistry Portal more integrated and practical. As a complement to the lexical retrieval methods of the Chemistry Portal, LSI makes it more convenient for users to get information. |
Language | Chinese |
Date made public | 2013-09-16 |
Pages | 89 |
Source URL | http://ir.ipe.ac.cn/handle/122111/1354 |
Collection | Institute of Process Engineering_Institute (batch import) |
Recommended citation (GB/T 7714) | 王志刚. 隐含语义检索算法的改进及其在化学学科信息门户中的应用[D]. 中国科学院过程工程研究所. 中国科学院过程工程研究所. 2003. |
Ingest method: OAI harvesting
Source: Institute of Process Engineering