Chinese Academy of Sciences Institutional Repositories Grid
Opinion Mining on Figure Comments in Chinese (中文人物评论意见挖掘)

Document type: Dissertation

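Author: Li Juan (李娟)
Degree: Doctorate
Defense date: 2009-05-27
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place of conferral: Institute of Acoustics
Keywords: opinion mining; opinion extraction; orientation analysis; sentence category analysis; template-based
Other title: Opinion Mining on Figures Comments in Chinese
Major: Signal and Information Processing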
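Abstract (Chinese, translated): Opinion mining is a new topic that has developed in natural language understanding in recent years and is a current research hotspot. It studies how to automatically extract evaluative information (opinions or views) from subjective text. Opinion mining matters in e-commerce, public-opinion monitoring and other areas of social life, and has considerable research value.

This dissertation studies opinion mining on Chinese comments about people at two levels: words and sentences. The goal is to automatically extract the opinion information contained in sentences that comment on people. For words, dictionaries and statistical methods are combined to recognize polarity words and determine their polarity. For sentences, a template-based method and a method based on sentence category analysis are each used, and a corresponding system is implemented for each. The main contributions are:

1. Orientation recognition for Chinese words using a combination of a polarity dictionary, a synonym dictionary and bigrams. The polarity dictionary determines the sentiment orientation of single-orientation words; the synonym dictionary combined with bigrams determines the orientation of multi-orientation words. The method determines the sentiment orientation of Chinese words effectively, with a precision above 81%.
2. A template-based sentence opinion mining system. The template-based approach is applied to sentence-level opinion mining: opinion templates are extracted and generated from a training corpus and then used to extract opinion elements. Precision reaches 75.3%.
3. A sentence opinion mining system based on sentence category analysis. Orientation regularities of sentence categories are summarized into a rule base; these sentence-category orientation rules give an initial location of the opinion elements, which templates then extract. Precision reaches 86.57%.
4. Resources for opinion mining on comments about people, including a polarity dictionary, a synonym dictionary and a list of person attribute words. The polarity dictionary (6,572 entries) was built by merging and proofreading several existing commendatory/derogatory word dictionaries (7,167 entries) and the HowNet sentiment analysis word set (6,846 entries), then selecting the words suited to comments about people. The synonym dictionary and the attribute word list were collected and compiled by the author.

Through this work, the dissertation implements an opinion extraction system for Chinese sentences that evaluate people and builds the related resources. On this basis, various applications for mining opinions about people can be developed, such as online public-opinion monitoring and systems that track public sentiment toward candidates in political elections, providing macro-level positive/negative assessments of a person. Discourse-level opinion mining can also be studied further on this foundation.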
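The word-level method in item 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the toy English lexicons, the names `POLARITY_DICT`, `SYNONYM_DICT`, `SYNONYM_POLARITY`, `train_bigrams` and `word_orientation`, and the rule of picking the synonym whose bigram counts with the neighbouring words are highest are all hypothetical stand-ins for one way a synonym dictionary and bigrams might be combined.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy lexicons; the dissertation's dictionaries are Chinese and far larger.
# Single-orientation words: polarity is fixed, so a lookup suffices.
POLARITY_DICT: Dict[str, int] = {"kind": 1, "honest": 1, "cunning": -1, "selfish": -1}
# Multi-orientation words mapped to synonyms whose own polarity is unambiguous.
SYNONYM_DICT: Dict[str, List[str]] = {"proud": ["honored", "arrogant"]}
SYNONYM_POLARITY: Dict[str, int] = {"honored": 1, "arrogant": -1}


def train_bigrams(corpus: List[List[str]]) -> Counter:
    """Count adjacent token pairs, padding each sentence with <s> and </s>."""
    counts: Counter = Counter()
    for sentence in corpus:
        padded = ["<s>"] + sentence + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def word_orientation(tokens: List[str], i: int, bigrams: Counter) -> Optional[int]:
    """Return +1 (positive), -1 (negative) or None (unknown) for tokens[i]."""
    word = tokens[i]
    if word in POLARITY_DICT:
        return POLARITY_DICT[word]
    if word in SYNONYM_DICT:
        prev = tokens[i - 1] if i > 0 else "<s>"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"

        def fit(syn: str) -> int:
            # How often the synonym occurs in the same left/right context.
            return bigrams[(prev, syn)] + bigrams[(syn, nxt)]

        best = max(SYNONYM_DICT[word], key=fit)  # ties go to the first-listed synonym
        if fit(best) == 0:
            return None  # no contextual evidence for any sense
        return SYNONYM_POLARITY[best]
    return None  # not a polarity word in this toy lexicon


corpus = [
    ["he", "is", "too", "arrogant"],
    ["she", "is", "honored", "to", "serve"],
    ["too", "arrogant", "to", "listen"],
]
bigrams = train_bigrams(corpus)
print(word_orientation(["he", "is", "too", "proud"], 3, bigrams))  # -1
print(word_orientation(["she", "is", "proud", "to", "serve"], 2, bigrams))  # 1
```

Because the synonym is chosen by raw bigram counts, a multi-orientation word with no contextual evidence is left unlabeled rather than guessed; a real system would likely add smoothing or a fallback.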
Abstract (English): Opinion mining is a new topic in natural language processing and has become a research hotspot in recent years. Its goal is to automatically extract evaluative information (called opinions) from subjective text. Opinion mining can have a great influence on e-commerce, public opinion surveys and other areas of social life, which makes it a valuable research field.

This dissertation carries out research at two levels: word and sentence. The aim is to automatically extract opinion information from sentences commenting on figures. At the word level, we combine dictionaries with a statistical method to recognize polarity words and determine their orientation. At the sentence level, we adopt two methods to mine opinions: a template-based method and a method based on sentence category analysis. The main work of this dissertation is as follows:

1. A system for determining the orientation of Chinese words based on a polarity dictionary, a synonym dictionary and bigrams. The method uses the polarity dictionary to determine the orientation of single-orientation words, and the synonym dictionary combined with bigrams to determine the orientation of multiple-orientation words. The system achieves a precision of more than 81%.
2. A sentence opinion mining system using a template-based method. The method extracts opinion templates from the training corpus and then uses these templates to extract opinion elements. It achieves a precision of 75.3%.
3. A sentence opinion mining system based on sentence category analysis. We summarize orientation rules for sentence categories, which first locate the semantic chunks containing the opinion elements; templates then locate the opinion elements themselves. The method achieves a precision of 86.57%.
4. Resources for opinion mining suited to evaluating figures, including a polarity dictionary, a synonym dictionary and people-feature collections. We established the polarity dictionary (6572 items) by merging several existing dictionaries (7167 items) and the HowNet polar word collection (6846 items), then selecting the entries that apply to people. We also compiled the synonym dictionary and the people-feature collections.

The results of this dissertation can be applied to online public opinion tracking systems by providing the macroscopic orientation of comments on particular figures. They also lay a foundation for other opinion extraction applications, and further research on article-level opinion mining can build on this work.
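The template-based extraction in item 2 can be illustrated with a small sketch. The abstract does not describe the dissertation's template format or generation procedure, so everything here is an assumption: the slot names `TARGET`, `ATTR` and `OPINION`, the toy English lexicons, and the functions `make_template`, `match` and `extract` are hypothetical. A template is simply an annotated training sentence with its labeled tokens replaced by slot names, and a slot matches one token that passes a lexicon check.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical lexicons standing in for an attribute word list and a polarity dictionary.
ATTRIBUTES = {"character", "style", "speech"}
POLARITY: Dict[str, int] = {"humble": 1, "decisive": 1, "arrogant": -1, "vague": -1}
SLOTS = ("TARGET", "ATTR", "OPINION")


def make_template(tokens: List[str], labels: Dict[int, str]) -> Tuple[str, ...]:
    """Generalize an annotated training sentence: labeled positions become slot names."""
    return tuple(labels.get(i, tok) for i, tok in enumerate(tokens))


def slot_accepts(slot: str, token: str) -> bool:
    """Lexicon check for a slot filler; TARGET accepts any single token here."""
    if slot == "ATTR":
        return token in ATTRIBUTES
    if slot == "OPINION":
        return token in POLARITY
    return True


def match(template: Sequence[str], tokens: List[str]) -> Optional[Dict[str, str]]:
    """Slide the template over the sentence; return the fillers of the first full match."""
    n = len(template)
    for start in range(len(tokens) - n + 1):
        fillers: Dict[str, str] = {}
        for pattern, token in zip(template, tokens[start:start + n]):
            if pattern in SLOTS:
                if not slot_accepts(pattern, token):
                    break
                fillers[pattern] = token
            elif pattern != token:
                break  # literal words in the template must match exactly
        else:
            return fillers  # inner loop finished without a mismatch
    return None


def extract(templates: List[Tuple[str, ...]],
            tokens: List[str]) -> Optional[Tuple[Dict[str, str], int]]:
    """Try templates in order; return (opinion elements, polarity) for the first match."""
    for template in templates:
        fillers = match(template, tokens)
        if fillers is not None:
            return fillers, POLARITY.get(fillers.get("OPINION", ""), 0)
    return None


templates = [make_template(["Smith", "'s", "character", "is", "humble"],
                           {0: "TARGET", 2: "ATTR", 4: "OPINION"})]
print(extract(templates, ["I", "think", "Jones", "'s", "speech", "is", "vague", "."]))
```

For example, the template learned from "Smith 's character is humble" matches "Jones 's speech is vague" with `TARGET=Jones`, `ATTR=speech`, `OPINION=vague` and polarity -1, while "Jones 's hat is vague" is rejected because "hat" is not in the attribute list.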
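Language: Chinese
Release date: 2011-05-07
Pages: 67
Source URL: http://159.226.59.140/handle/311008/550
Collection: Institute of Acoustics / Master's and Doctoral Theses of the Institute of Acoustics / Master's and Doctoral Theses, 1981-2009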
Recommended citation (GB/T 7714):
Li Juan. Opinion Mining on Figure Comments in Chinese[D]. Institute of Acoustics, Chinese Academy of Sciences, 2009.

Ingest method: OAI harvesting

Source: Institute of Acoustics


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.