中国科学院机构知识库网格系统: query expansion based on a semantic graph model

中国科学院机构知识库网格

Chinese Academy of Sciences Institutional Repositories Grid

query expansion based on a semantic graph model

文献类型：会议论文


作者	Jiang Xue
出版日期	2011
会议地点	Beijing, China
关键词	query expansion semantic network word graph
页码	1315-1316
中文摘要	Query expansion is a classical topic in the field of information retrieval, which is proposed to bridge the gap between searchers' information intents and their queries. Previous researches usually expand queries based on document collections, or some external resources such as WordNet and Wikipedia [1, 2, 3, 4, 5]. However, it seems that independently using one of these resources has some defects, document collections lack semantic information of words, while WordNet and Wikipedia may not include domain-specific knowledge in certain document collection. Our work aims to combine these two kinds of resources to establish an expansion model which represents not only domain-specific information but also semantic information. In our preliminary experiments, we construct a two-layer word graph and use Random-Walk algorithm to calculate the weights of each term in pseudo-relevance feedback documents, then select the highest weighted term to expand original query. The first layer of the word graph contains terms in related documents, while the second layer contains semantic senses corresponding to these terms. These terms and semantic senses are treated as vertices of the graph and connected with each other by all possible relationships, such as mutual information and semantic similarities. We utilized mutual information, semantic similarity and uniform distribution as the weight of term-term relation, sense-sense relation and word-sense relation respectively. Though these experiments show that our expansion outperform original queries, we are troubled with some difficult problems. Given the framework of semantic graph model, we need more effort to find out an optimal graph to represent the relationships between terms and their semantic senses. We utilized a two-layer graph model in our preliminary research, where terms from different documents are treated equally. Maybe we can introduce the document as a third layer in future work, where we can differ the same terms in different documents according to document relevance and context. Then we need appropriately represent initial weights of this words, senses and relationships. Various measures for weights of terms and term relations have been proved effective in other information retrieval tasks, such as TFIDF, mutual information MI, but there is little research on weights for semantic senses and their relations. For polysemous words, we add all of their semantic senses to the graph and assume that these senses are uniformly distributed. Actually, it is not precise for a word in a special document and query. As we know, a polysemous word may have only one or two senses in a document, and they are not uniformly distributed. Give a word, what we should do is to determine its word senses in a relevant document and estimate the distribution of these senses. Word sense disambiguation may help us in this problem. Then, there are many methods to compute word similarity according to WordNet, which we use to represent the weights of relationships between word senses. Varelas et al implemented some popular methods to compute semantic similarity by mapping terms to an ontology and examining their relationships in that ontology [4]. We also need to know which algorithm for semantic similarity is most suitable for our model. Additional, WordNet is suitable to calculate word similarity but not suitable to measure word relevance. The inner hyperlinks of Wikipedia could help us to calculate word relevance. We wish to find an effective way to combine the similarity measure from WordNet and relevance measure from Wikipedia, which may completely reflect word relationships.
英文摘要	Query expansion is a classical topic in the field of information retrieval, which is proposed to bridge the gap between searchers' information intents and their queries. Previous researches usually expand queries based on document collections, or some external resources such as WordNet and Wikipedia [1, 2, 3, 4, 5]. However, it seems that independently using one of these resources has some defects, document collections lack semantic information of words, while WordNet and Wikipedia may not include domain-specific knowledge in certain document collection. Our work aims to combine these two kinds of resources to establish an expansion model which represents not only domain-specific information but also semantic information. In our preliminary experiments, we construct a two-layer word graph and use Random-Walk algorithm to calculate the weights of each term in pseudo-relevance feedback documents, then select the highest weighted term to expand original query. The first layer of the word graph contains terms in related documents, while the second layer contains semantic senses corresponding to these terms. These terms and semantic senses are treated as vertices of the graph and connected with each other by all possible relationships, such as mutual information and semantic similarities. We utilized mutual information, semantic similarity and uniform distribution as the weight of term-term relation, sense-sense relation and word-sense relation respectively. Though these experiments show that our expansion outperform original queries, we are troubled with some difficult problems. Given the framework of semantic graph model, we need more effort to find out an optimal graph to represent the relationships between terms and their semantic senses. We utilized a two-layer graph model in our preliminary research, where terms from different documents are treated equally. Maybe we can introduce the document as a third layer in future work, where we can differ the same terms in different documents according to document relevance and context. Then we need appropriately represent initial weights of this words, senses and relationships. Various measures for weights of terms and term relations have been proved effective in other information retrieval tasks, such as TFIDF, mutual information MI, but there is little research on weights for semantic senses and their relations. For polysemous words, we add all of their semantic senses to the graph and assume that these senses are uniformly distributed. Actually, it is not precise for a word in a special document and query. As we know, a polysemous word may have only one or two senses in a document, and they are not uniformly distributed. Give a word, what we should do is to determine its word senses in a relevant document and estimate the distribution of these senses. Word sense disambiguation may help us in this problem. Then, there are many methods to compute word similarity according to WordNet, which we use to represent the weights of relationships between word senses. Varelas et al implemented some popular methods to compute semantic similarity by mapping terms to an ontology and examining their relationships in that ontology [4]. We also need to know which algorithm for semantic similarity is most suitable for our model. Additional, WordNet is suitable to calculate word similarity but not suitable to measure word relevance. The inner hyperlinks of Wikipedia could help us to calculate word relevance. We wish to find an effective way to combine the similarity measure from WordNet and relevance measure from Wikipedia, which may completely reflect word relationships.
收录类别	ACM
会议录	Proceedings of the 34th international ACM SIGIR conference on Research and development in Information
ISBN号	978-1-4503-0757-4
源URL	[http://ir.iscas.ac.cn/handle/311060/16192]
专题	软件研究所_软件所图书馆_会议论文
推荐引用方式 GB/T 7714	Jiang Xue. query expansion based on a semantic graph model[C]. 见:. Beijing, China.

入库方式： OAI收割

来源：软件研究所

浏览0

下载0

收藏0

其他版本

除非特别说明，本系统中所有内容都受版权保护，并保留所有权利。