基于情感模型的观点挖掘方法研究 (Study on an Emotion Model-Based Opinion Mining Method)
Document type: Thesis (degree dissertation)
Author | 皇甫璐雯 |
Degree | Master of Engineering |
Defense date | 2013-12-01 |
Degree-granting institution | University of Chinese Academy of Sciences |
Degree-conferring location | Institute of Automation, Chinese Academy of Sciences |
Supervisor | 毛文吉 |
Keywords | emotion type extraction; OCC model; emotion dimension; Bootstrapping |
Alternative title | Study on an Emotion Model-Based Opinion Mining Method |
Degree program | Pattern Recognition and Intelligent Systems |
Chinese abstract | With the rapid development of social media, a large volume of subjective remarks carrying personal emotions has emerged on the Web. Mining the opinions and emotions implicit in these remarks is crucial for applications in public safety, business intelligence, and public opinion monitoring. Opinion mining (or sentiment analysis) has thus become a core research topic in the analysis and mining of online social media. Current research in this area mainly comprises opinion orientation extraction and emotion type extraction. However, traditional orientation extraction methods focus on the polarity of opinions and ignore their rich emotion types, while existing emotion type extraction methods can output rich emotion types but require large amounts of annotated data. Moreover, prior work has almost entirely overlooked the important role of cognitive models of emotion in opinion mining and emotion recognition. To better extract rich emotion types from online comments, this thesis proposes an emotion-model-based method for automatically extracting emotion types from text. The main contributions of this thesis are as follows. Based on the OCC model, a mature cognitive structure model of emotion from cognitive psychology, the thesis designs and implements an OCC-model-based opinion mining method. The method first employs a statistical approach that uses a general semantic dictionary, syntactic dependency relations, and a small amount of annotated data to automatically construct an emotion-dimension dictionary. It then refines this dictionary by resolving semantic and sentiment-orientation inconsistencies and filtering out non-emotional words, yielding a high-quality emotion-dimension dictionary. Finally, using the resulting dictionary and the correspondence in the OCC model between emotion-dimension values and emotion types, it generates six major emotion types: Joy, Distress, Hope, Fear, Pride, and Shame. The thesis then improves this method by designing and implementing an OCC-model-based opinion mining algorithm that integrates the Bootstrapping technique. Following the Bootstrapping idea, the algorithm promotes mutual learning between candidate emotion-dimension words and dependency-relation templates, gradually improving the quality of the emotion-dimension dictionary. To obtain a high-quality template set, candidate dependency-relation templates are evaluated against the emotion-dimension word set with two metrics, relevance and distinguishability; to obtain a high-quality emotion-dimension word set, candidate words are evaluated against the template set with two metrics, reliability and polarity. Building on the above, experiments on real online news-comment data preliminarily validate the effectiveness of the proposed methods, and an OCC-model-based opinion mining system is implemented. Compared with related work, the system effectively reduces manual annotation and offers better interpretability of the output emotion types. Because the work is grounded in a classical cognitive structure model of emotion, it not only links text sentiment analysis to a deeper cognitive structure but also provides a finer-grained explanation of the output emotion types, founded on a cognitive-psychology model. In summary, the thesis has clear advantages in flexibility of use, interpretability, and effectiveness, and has good research significance and application value. |
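The abstract describes mapping OCC emotion-dimension values to six emotion types. A minimal sketch of such a rule table follows; the dimension names (`target`, `prospect`, `valence`) and the rule encoding are illustrative assumptions based on the standard OCC structure, not the thesis's actual implementation.

```python
# Hypothetical encoding of the OCC dimension-to-type correspondence.
# In the OCC model, Joy/Distress are valenced reactions to actual event
# consequences, Hope/Fear to prospective event consequences, and
# Pride/Shame to one's own actions.
OCC_RULES = {
    ("event", "actual", "positive"):      "Joy",
    ("event", "actual", "negative"):      "Distress",
    ("event", "prospective", "positive"): "Hope",
    ("event", "prospective", "negative"): "Fear",
    ("action", "actual", "positive"):     "Pride",
    ("action", "actual", "negative"):     "Shame",
}

def map_dimensions_to_emotion(target, prospect, valence):
    """Look up the OCC emotion type for a tuple of dimension values;
    returns None when the combination is outside the six covered types."""
    return OCC_RULES.get((target, prospect, valence))
```

For example, a comment whose extracted dimension words indicate a prospective negative event would be assigned `Fear` under this table.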
English abstract | With the rapid development of social media, massive subjective remarks containing individual emotions accumulate on the Web. Mining the implicit opinions and emotions from these textual remarks is crucial to numerous practical applications, such as public safety systems, business intelligence services, and large-scale social monitoring and management. As a result, opinion mining and sentiment analysis have become a central research topic in network-oriented social media analysis and mining. Current research in this area falls mainly into two branches: orientation extraction and emotion extraction. However, traditional approaches to orientation extraction have focused mainly on mining the polarities of opinions rather than their rich emotion types, while traditional approaches to emotion extraction require large amounts of annotated data even though they can output emotion types. Moreover, emotion theories, which identify the underlying cognitive structure and emotional dimensions that are key to generating emotions, have been almost entirely ignored in previous work. Therefore, to facilitate the automatic extraction of emotions from textual data, this thesis proposes an emotion-model-based opinion mining method to automatically extract emotion types from text. The main contributions of our work are as follows. Informed by the mature cognitive structure of emotions model (OCC), this work designs and implements an OCC-model-based opinion mining method for extracting emotion types from text. We first employ a statistical method to construct the emotion-dimension dictionary from candidate sets collected with a general semantic dictionary, several syntactic templates we designed, and a small amount of annotated data.
We then refine the constructed emotion-dimension dictionary by filtering out emotional words with conflicting semantics or orientations, as well as non-emotional words. Once the emotion-dimension dictionary is prepared, we use the OCC emotion model as a set of mapping rules between emotional dimensions and emotion types to generate the corresponding six basic emotion types in text, that is, Joy, Distress, Hope, Fear, Pride, and Shame. This thesis further improves the OCC-model-based opinion mining method by designing and implementing an OCC-model-based opinion mining algorithm that integrates Bootstrapping ... |
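The Bootstrapping step described above alternates between scoring dependency-relation templates against the current emotion-dimension word set and scoring candidate words against the accepted templates. The sketch below illustrates that mutual-learning loop only; the scoring functions are simplified stand-ins for the thesis's relevance/distinguishability and reliability/polarity metrics, and all names (`bootstrap`, `matches`, `top_k`) are hypothetical.

```python
# Illustrative Bootstrapping loop: words and templates promote each other.
# `matches[(template, word)]` is assumed to hold the co-occurrence count of a
# dependency template and a candidate word in the corpus.
def bootstrap(seed_words, candidate_templates, candidate_words,
              matches, n_rounds=5, top_k=10):
    words = set(seed_words)      # accepted emotion-dimension words
    templates = set()            # accepted dependency-relation templates
    for _ in range(n_rounds):
        # 1. Score unaccepted templates: reward co-occurrence with known
        #    words (relevance) weighted by the fraction of the template's
        #    extractions that are known words (a crude distinguishability).
        t_scores = {}
        for t in candidate_templates - templates:
            hits = sum(matches.get((t, w), 0) for w in words)
            total = sum(c for (tt, _), c in matches.items() if tt == t)
            if total:
                t_scores[t] = hits * (hits / total)
        templates |= {t for t, s in sorted(t_scores.items(),
                                           key=lambda x: -x[1])[:top_k]
                      if s > 0}
        # 2. Score unaccepted words: reward extraction by many accepted
        #    templates (a crude reliability proxy).
        w_scores = {}
        for w in candidate_words - words:
            w_scores[w] = sum(matches.get((t, w), 0) for t in templates)
        words |= {w for w, s in sorted(w_scores.items(),
                                       key=lambda x: -x[1])[:top_k]
                  if s > 0}
    return words, templates
```

Starting from a few seed words, each round accepts the best-scoring templates, which in turn admit new words, so the dictionary grows without large-scale manual annotation.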
Language | Chinese |
Other identifier | 201028014628040 |
Source URL | http://ir.ia.ac.cn/handle/173211/7738 |
Collection | Graduates_Master's theses |
Recommended citation (GB/T 7714) | 皇甫璐雯. 基于情感模型的观点挖掘方法研究[D]. 中国科学院自动化研究所. 中国科学院大学. 2013. |
Deposit method: OAI harvesting
Source: Institute of Automation