中国科学院机构知识库网格
Chinese Academy of Sciences Institutional Repositories Grid
Research and Application of a Desktop Search Method Based on a Latent Semantic Graph (基于隐语义图谱的桌面搜索方法研究及应用)

Document type: Thesis

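Author: 皇甫杨 (Huangfu Yang)
Degree type: Master's
Defense date: 2015-05-26
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place conferred: Beijing
Supervisor: 王青 (Wang Qing)
Keywords: latent semantic modeling; graph model; information retrieval; desktop search
Major: Computer Software and Theory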
Chinese abstract

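Desktop search (also called personal information retrieval) is the search process defined over a local personal information space. Its goal is to help individual users effectively find the local resources (that is, files) they need. In recent years, as society has become increasingly information-driven and the era of big data has arrived, the data that individual users generate and store on their local computers has grown explosively. Storage and management of personal data has entered the terabyte era, and personal computer users increasingly demand fast, accurate search over large volumes of local data. As a result, desktop search has recently become an active area of attention and research in both industry and academia. Industry already offers several widely known desktop search solutions, such as Google Desktop Search and Windows Desktop Search. These traditional solutions implement keyword-based retrieval and do not consider the latent semantic relations among local resources. Users must therefore remember and type the search keywords accurately, and the results of such searches are in fact insufficient.

In information retrieval, introducing rich, meaningful associations and signals can effectively improve the quality of search results. In a local environment, resources appear at first glance to be mutually independent and unrelated. However, how resources on a personal computer are created, browsed, and stored varies from person to person and is closely tied to the user's habits, personal experience, and memory. These habits, experiences, and memories of managing resources implicitly give rise to latent semantic associations among resources. Mining and exploiting these latent associations opens up many possibilities for desktop search research. Through observation we found a common pattern in how users work on personal computers: "they operate on certain resources to complete a task related to a specific topic, and they organize these resources into particular directories according to some relationship among them." This finding suggests that "topic information", "historical user behavior information", and "directory structure" are very helpful for locating local resources.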

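This thesis proposes a desktop search method based on a unified multi-dimensional latent semantic graph (LSG, Latent Semantic Graph). The method mines and quantifies pairwise associations between resources from three sources: the content of local resources, the user's historical behavior data, and the directory structure in which resources are stored. It integrates the three kinds of relations into a unified latent semantic graph, the LSG, to represent the associations among local resources systematically. On top of the LSG, we implement a personalized ranking algorithm and a recommendation algorithm, both based on the latent semantic relations among resources, to improve on traditional keyword-based search and to recommend additional, indirectly related results that improve the user's search experience. When a query arrives, the method first uses the vector space model to extract a set of relevant results from the index. The LSG-based ranking algorithm then re-ranks the result set, while the LSG-based recommendation algorithm recommends the 5 most relevant local resources for each result in the set. To better study the effectiveness of the LSG-based search method, this thesis designs and implements an LSG-based desktop search prototype system and compares it experimentally with mainstream desktop search engines and with systems that implement relatively advanced methods from academia. The results show that the proposed method performs well.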
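The query flow described above (merge three pairwise relations into one graph, re-rank keyword hits, then recommend up to 5 related resources per hit) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives no formulas, so the linear mixing weights, the `alpha` blending factor, the helper names (`pair`, `build_lsg`, `rerank`, `recommend`), and all scores and file names below are hypothetical. The keyword scores stand in for the output of a vector space model.

```python
from itertools import chain


def pair(a, b):
    """Unordered key for the relation between two distinct resources."""
    return frozenset((a, b))


def build_lsg(topic, task, location, weights=(0.5, 0.3, 0.2)):
    """Merge three pairwise-score maps into one weighted, undirected graph.

    Assumption: a fixed linear mix; the abstract does not state the formula.
    """
    graph = {}
    for key in set(chain(topic, task, location)):
        a, b = sorted(key)
        score = (weights[0] * topic.get(key, 0.0)
                 + weights[1] * task.get(key, 0.0)
                 + weights[2] * location.get(key, 0.0))
        graph.setdefault(a, {})[b] = score
        graph.setdefault(b, {})[a] = score
    return graph


def rerank(keyword_scores, graph, alpha=0.5):
    """Re-rank keyword hits, boosting hits that are linked to other hits."""
    boosted = {}
    for res, base in keyword_scores.items():
        neighbours = graph.get(res, {})
        support = sum(w * keyword_scores[other]
                      for other, w in neighbours.items()
                      if other in keyword_scores)
        boosted[res] = (1 - alpha) * base + alpha * support
    # Highest combined score first; ties broken by name for determinism.
    return sorted(boosted, key=lambda r: (-boosted[r], r))


def recommend(resource, graph, k=5):
    """Return up to k resources most strongly linked to `resource`."""
    neighbours = graph.get(resource, {})
    return sorted(neighbours, key=lambda r: (-neighbours[r], r))[:k]


# Made-up relation scores: content (topic), behavior (task), directory (location).
topic = {pair("a.doc", "b.ppt"): 0.8, pair("a.doc", "c.txt"): 0.2}
task = {pair("a.doc", "b.ppt"): 0.5}
location = {pair("b.ppt", "d.pdf"): 1.0}
lsg = build_lsg(topic, task, location)

# Made-up keyword scores standing in for vector space model output.
hits = {"a.doc": 0.6, "b.ppt": 0.5, "c.txt": 0.65}
for hit in rerank(hits, lsg):
    print(hit, "->", recommend(hit, lsg))
```

With these made-up numbers, the keyword-only order is `c.txt`, `a.doc`, `b.ppt`, while the re-ranked order is `a.doc`, `b.ppt`, `c.txt`, because `a.doc` and `b.ppt` reinforce each other through a strong edge. `d.pdf` never matches the query but still surfaces as a recommendation for `b.ppt`.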
English abstract

Desktop search refers to the process of searching within one's personal information space, with the aim of helping users find local resources effectively. With the development of informatization, the personal data generated and stored on PCs is growing rapidly, and personal data management has entered the terabyte era. How to pinpoint local resources in this ocean of local data quickly and accurately has become a hot research topic in both industry and the research community. Major Internet service providers have released prominent desktop search applications in recent years, such as Google Desktop and Windows Desktop Search. These traditional solutions are keyword-based and do not consider implicit semantic relations of any kind, which leads to insufficient search results. In information retrieval, introducing rich, meaningful association signals can help improve search. Intuitively, personal resources seem independent of each other. In fact, most local items have been explicitly viewed, created, or saved by the user. As such, these items are personal to the individual and are intertwined with personal experience and memories, which indicates that implicit associations among local resources exist extensively. These associations can be used to improve traditional keyword-based search. We observe that users usually operate a PC in a common pattern: they operate on some resources to finish a specific task related to a certain topic, and they organize these resources in certain directories. This observation suggests that topic, user behavior, and directory structure are useful information for locating resources.

In this thesis, we propose a personal information retrieval approach based on a unified multi-dimensional latent semantic graph. The approach exploits these three kinds of information to improve traditional desktop search. We denote the three kinds of implicit information as the {Task, Topic, Location} relations, respectively. The heart of our approach is the Latent Semantic Graph (LSG), which measures the three relations with associated scores. Based on the LSG, we develop a personalized ranking scheme to improve traditional keyword-based desktop search and design a semantic recommendation algorithm to expand the query results. We implement a prototype system based on the LSG and conduct user experiments. The experiments show that the proposed approach outperforms traditional keyword-based desktop search, indicating that the approach is effective.

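Subject: Computer Software; Software Theory
Date available: 2015-06-24
Source URL: http://ir.iscas.ac.cn/handle/311060/17111
Collection: Institute of Software_Internet Software Technology Laboratory_Theses
Recommended citation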
GB/T 7714
皇甫杨. 基于隐语义图谱的桌面搜索方法研究及应用[D]. 北京. 中国科学院研究生院. 2015.

Ingest method: OAI harvesting

Source: Institute of Software

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.