中国科学院机构知识库网格
Chinese Academy of Sciences Institutional Repositories Grid
Rough Set理论及其在机器学习中的应用研究

文献类型:学位论文

作者苗夺谦
学位类别工学博士
答辩日期1997-06-01
授予单位中国科学院自动化研究所
授予地点中国科学院自动化研究所
导师戴汝为 ; 王珏
关键词Rough Set理论 知识表示 知识粒度 知识约简 计算复杂 算法完备性 聚类分析 机器学习 决策树方法 rough set theory knowledge representation granularity of knowledge reduction of knowledge computational complexity completeness
学位专业模式识别与智能系统
中文摘要Pawlak提出的Rough Set(RS)理论是处理知识,特别是不精确、不相容知识的一种新 的数学工具。该理论对知识给出了形式化定义,使得对知识能够进行有效的分析和操作. RS理论还提供了一套从数据中自动获取知识的工具,即知识约简。目前,RS理论正在 被广泛应用于人工智能、模式识别等很多领域。 本文的主要工作分为两部分:一部分是关于RS的理论研究;一部分是关于RS在 AI中的应用研究。 ·RS的理论研究主要包括: (1).本文首先将RS意义下的知识看作是关于论域的σ-代数上的随机变量,定义了知 识信息熵的概念;然后,证明了信息熵与互信息都是随着知识粗糙性的增加而单调下降 的。从而为知识粗糙性提供了一种信息解释。 (2).知识表示是人工智能研究的基本问题之一,它与计算复杂性有着密切的关系。为 了得到优化的知识约简算法,本文对RS的主要概念和运算从信息的角度进行了新的表 示,即信息表示。然后,证明了在一定条件下,信息表示与RS的代数表示是等价的. 信息表示是本文给出的带策略的知识约简算法的理论基础。 (3).针对RS理论中,求取知识的最小约简是一个NP-hard问题.本文以信息表示为 基础,对信息系统和决策表分别给出了带策略的知识约简算法;并指出其计算复杂性是 多项式的。通过实验说明,在绝大多数情况下,它们可以得到最小约简。 (4).对目前已有的几种知识约简算法进行了分析,分析的重点是这些算法对最小约简 的完备性。通过理论上构造的一个反例说明,现有算法对最小约简都是不完备的. (5).针对RS理论只能处理离散属性的局限性,本文以决策表的相容度作为反馈信 息,提出了一种领域独立的基于动态层次聚类的自动划分连续属性的方法。本方法拓广 了RS理论的应用范围。通过实例将这一方法与现有方法进行了对比分析,得到了令人 鼓舞的结果。 ·RS理论的应用研究: (6).决策树学习方法是机器学习中的重要方法之一。本文将RS理论中属性相对核的 概念用于解决检验中变量的选择问题;通过定义等价关系的相对泛化,解决了多变量检 验的构造问题。通过实例,将本方法与单变量及其它多变量方法进行了对比分析。结果 表明将RS理论用于多变量决策树构造是有效的。
英文摘要Rough Set(RS) theory, introduced by Z.Pawlak, is a new mathematical tool to deal with knowledge, particularly when knowledge is imprecise or inconsistent. The RS theory gives a formal definition of knowledge so that the knowledge can be analyzed and manipulated effectively. This theory also provides a suit of tools, i.e. reduction of knowledge, to acquire knowledge from data automatically. Recently, the RS theory is widely being used in many areas, such as artificial intelligence, pattern recognition etc. The main work reported in this thesis can be divided into two parts: one about the theoretical study of RS; the other about the study of application of RS in AI. · The theoretical study of RS includes: (1) In this thesis, the knowledge in RS context can be understand as random variable defined on the o-algebra about universe, and the information entropy of knowledge is defined. Then, it is proved that the information entropy and mutual information of knowledge are monotone decreasing with the increment of roughness of knowledge. These conclusions provide a kind of information explanation for roughness of knowledge. (2) Knowledge representation is one of basic problems in M studies. It has close ties with computing complexity. In order to find optimal algorithm for reduction of knowledge, a new kind of representation of main concepts and operations in RS, called information representation, is proposed in this thesis from the view of information. Then, the equivalence between information and algebraic representation is proved under certain conditions. This representation is the theoretical foundation of the algorithms with strategy presented in this thesis. (3) As for finding minimal reduct is a NP-hard problem in RS. Based on information representation, the algorithms with strategy for reduction of knowledge are proposed for information system and decision table, respectively. Their complexities are polynomial. By examples, we show that the minimal reduct can be find by these algorithms in most cases.. (4) Several existing algorithms of reduction are analyzed in this thesis. The emphasis of our analysis puts on the completeness of these algorithms with respect to minimal reduct. Through a counterexample constructed by ourselves theoretically, we illustrate that all of existing algorithms are incomplete. (5) To overcome the limitation that RS only dealing with discrete attribute, an area- independent automatic approach for discretization of continuous attribute based on dynamic cluster algorithm is presented in this thesis. This approach widens the scope of application of RS. The comparisons between this approach and existing approaches are made, and the results are very encouraging. · The study of application of RS in AI (6) Decision tree is one of important learning approaches in machine learning. In this thesis, the problem of attribute selection in multivariate tests is solved
语种中文
其他标识符408
源URL[http://ir.ia.ac.cn/handle/173211/5672]  
专题毕业生_博士学位论文
推荐引用方式
GB/T 7714
苗夺谦. Rough Set理论及其在机器学习中的应用研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所. 1997.

入库方式: OAI收割

来源:自动化研究所

浏览0
下载0
收藏0
其他版本

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。