Research on Semantic Information Processing of Chinese Text Sentiment (中文文本情感语义信息处理研究)
Document type: Dissertation
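Author | 陈琳 |
Degree | Doctor of Engineering |
Defense date | 2010-06-03 |
Degree-granting institution | Graduate University of the Chinese Academy of Sciences (中国科学院研究生院) |
Place of conferral | Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) |
Advisors | 杨一平 ; 曾隽芳 |
Keywords | Seven Basic Sentiments (七情); Sentiment Cell; Concept Knowledge Tree knowledge representation; Fuzzy Mathematics; Semantic Information Processing |
Alternative title | Research on Semantic Information Processing of Chinese Text Sentiment |
Degree discipline | Computer Application Technology |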
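Abstract (translated from Chinese) | Semantic information processing of Chinese text sentiment is the research topic of semantically analyzing and computing the sentiment elements in Chinese text in order to determine the category and degree of the sentiment a text expresses. Using sentiment information processing techniques to find specific sentiment information in massive text collections has gradually become an active and highly challenging topic in natural language understanding. It is widely applied in technical fields such as text retrieval and text filtering, and its practical applications in commercial product quality reviews, public opinion analysis, and information monitoring are receiving growing attention. To address the thin sentiment categories and the lack of semantic analysis in current text sentiment processing, this thesis draws on theories and methods from semantic information processing, computational linguistics, fuzzy mathematics, and machine learning. Starting from the formal description of sentiment and the mapping from text sentiment features to sentiment semantic elements, and building on the concept knowledge tree knowledge representation system, it explores new techniques and methods for Chinese text sentiment processing from the perspective of sentiment mechanisms, establishes a semantic model for Chinese text sentiment processing, and performs sentiment category determination and quantitative degree computation. The main work and contributions are: [1] Construction of a formal semantic representation and computational model of sentiment. The seven sentiments (七情) of happiness (喜), anger (怒), sadness (哀), fear (惧), love (爱), hate (恶), and desire (欲) are taken as the basic sentiment system, and text sentiment is formally described by a seven-dimensional sentiment vector. The relationships among the seven sentiments are analyzed, an influence matrix of the seven sentiments is computed from real corpora, and on that basis a sentiment vector orthogonalization method is proposed so that vector operations can support quantitative sentiment computation. [2] Construction of a Chinese text sentiment knowledge system and, based on semantic feature analysis, a processing model built on the sentiment cell (SC, Sentiment Cell). After examining a large body of real corpora, the semantic features of Chinese text sentiment are divided, from a semantic information processing perspective, into three classes: "quality" features that determine the sentiment category, "quantity" features that affect the degree of the sentiment described, and the objects the sentiment is about. Their influence on sentiment processing is analyzed, and a knowledge system is built comprising a knowledge tree of 8709 sentiment "quality" concepts and their relations, sentiment "quantity" concepts such as negation adverbs, degree adverbs, and relational conjunctions, and a semantic classification tree of sentiment objects. On this basis the relationships among the features are analyzed and the elements are semantically composed into the sentiment cell (SC) used for text sentiment analysis; sentiment analysis of Chinese text then reduces to processing basic SCs and combinations of SCs. [3] Design and implementation of intra-SC and SC-combination algorithms for text sentiment computation. The relationships among elements within an SC are analyzed, a mutual-information-based method for computing the "quality" sentiment vector is proposed, fuzzy factors are introduced to quantitatively model the effect of "quantity" concepts on sentiment, and operation rules are defined to complete intra-cell sentiment computation; experiments verify the algorithm's effectiveness. Eight classes of conjunction factors and the semantic relatedness of cell objects, both of which affect SC combination, are analyzed, corresponding SC combination methods are given, and an integrated SC combination algorithm is designed on that basis; experiments verify its effectiveness. |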
English abstract | Semantic information processing of Chinese text sentiment means automatically analyzing the sentiment factors in a text and classifying the text as, for example, happy or sad. With the explosion of information on the Internet, obtaining particular sentiment information from large amounts of text data has become an active but challenging area of natural language understanding. It can be widely used in text retrieval, text filtering, and other technical fields, and it is drawing growing attention in applications such as product quality appraisal and public opinion analysis and monitoring. Current research on sentiment analysis uses overly simple, limited sentiment categories and lacks semantic analysis. Drawing on theories and methods from semantic information processing, fuzzy mathematics, and machine learning, and using the concept knowledge tree as the knowledge representation model, this thesis studies the semantic analysis, formal representation, computable modeling, mapping, and calculation of text sentiment in order to develop new technologies and methods for text sentiment processing. The main work and innovative contributions include: [1] Construction of a semantic formalization and computable model of sentiment. A formal, computable sentiment model is put forward. "Happiness", "anger", "sadness", "fear", "love", "hate", and "desire" are chosen as the seven basic sentiment categories; every sentiment can be expressed in terms of these basic sentiments and formally represented by a seven-dimensional vector. We analyse and calculate the relational matrix of these seven basic sentiment categories from a corpus, and then orthogonally transform the vector to support the vector operations of sentiment computing. 
[2] Construction of a knowledge system for Chinese text sentiment and a proposed processing model based on the sentiment cell (SC). Based on corpora, we divide text sentiment features into three classes from the viewpoint of semantic information processing: the core features, the degree-related features, and the sentiment objects. Based on a detailed analysis of their influence on sentiment representation, we build a knowledge system that includes a knowledge tree of 8709 core concepts and their relationships, degree-related concepts (negative adverbs, degree adverbs, and conjunctions), and a semantic categorization knowledge tree of objects. Then we analyse the relationships among these features in text, combine relevant semantic ... |
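Language | Chinese |
Other identifier | 200618014629095 |
Source URL | [http://ir.ia.ac.cn/handle/173211/6289] |
Collection | Graduates_Doctoral Dissertations |
Recommended citation (GB/T 7714) | 陈琳. 中文文本情感语义信息处理研究[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2010. |
Ingest method: OAI harvesting
Source: Institute of Automation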
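
The abstract describes intra-cell computation only in outline: a "quality" sentiment vector over the seven basic sentiments, adjusted by fuzzy factors for "quantity" concepts such as degree adverbs and negation. The record does not give the thesis's actual factors or operation rules, so the following is only a minimal sketch under stated assumptions. It substitutes Zadeh's classic linguistic hedges (concentration, dilation, complement) for the unspecified fuzzy factors, and the names `SentimentCell`, `HEDGES`, and `cell_vector` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The seven basic sentiments named in the abstract, in a fixed order.
SENTIMENTS = ("happiness", "anger", "sadness", "fear", "love", "hate", "desire")

# Hypothetical hedge table (NOT the thesis's fuzzy factors): Zadeh-style
# operators acting on a membership degree in [0, 1].
HEDGES: Dict[str, Callable[[float], float]] = {
    "very": lambda mu: mu ** 2,        # concentration: sharpens the degree
    "somewhat": lambda mu: mu ** 0.5,  # dilation: softens the degree
    "not": lambda mu: 1.0 - mu,        # complement: negation
}


@dataclass
class SentimentCell:
    """Hypothetical SC: 'quality' memberships, 'quantity' modifiers, an object."""
    quality: Dict[str, float]
    modifiers: List[str] = field(default_factory=list)  # applied innermost first
    target: str = ""


def cell_vector(cell: SentimentCell) -> List[float]:
    """Return the seven-dimensional sentiment vector for one cell."""
    vec = [0.0] * len(SENTIMENTS)
    for name, mu in cell.quality.items():
        if not 0.0 <= mu <= 1.0:
            raise ValueError(f"membership for {name!r} must lie in [0, 1]")
        # Hedges touch only the sentiments the cell actually names.
        for hedge in cell.modifiers:
            mu = HEDGES[hedge](mu)
        vec[SENTIMENTS.index(name)] = mu
    return vec


if __name__ == "__main__":
    cell = SentimentCell(quality={"happiness": 0.8}, modifiers=["very"], target="film")
    print(dict(zip(SENTIMENTS, cell_vector(cell))))
```

Restricting the hedges to the sentiments a cell names keeps negation from raising unrelated dimensions. Whether the thesis treats negation this way is not stated in the record.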