中国科学院机构知识库网格系统: Rough Set理论及其在机器学习中的应用研究

中国科学院机构知识库网格

Chinese Academy of Sciences Institutional Repositories Grid

Rough Set理论及其在机器学习中的应用研究

文献类型：学位论文


作者	苗夺谦
学位类别	工学博士
答辩日期	1997-06-01
授予单位	中国科学院自动化研究所
授予地点	中国科学院自动化研究所
导师	戴汝为 ; 王珏
关键词	Rough Set理论知识表示知识粒度知识约简计算复杂算法完备性聚类分析机器学习决策树方法 rough set theory knowledge representation granularity of knowledge reduction of knowledge computational complexity completeness
学位专业	模式识别与智能系统
中文摘要	Pawlak提出的Rough Set(RS)理论是处理知识，特别是不精确、不相容知识的一种新的数学工具。该理论对知识给出了形式化定义，使得对知识能够进行有效的分析和操作． RS理论还提供了一套从数据中自动获取知识的工具，即知识约简。目前，RS理论正在被广泛应用于人工智能、模式识别等很多领域。本文的主要工作分为两部分：一部分是关于RS的理论研究；一部分是关于RS在 AI中的应用研究。 ·RS的理论研究主要包括： (1)．本文首先将RS意义下的知识看作是关于论域的σ-代数上的随机变量，定义了知识信息熵的概念；然后，证明了信息熵与互信息都是随着知识粗糙性的增加而单调下降的。从而为知识粗糙性提供了一种信息解释。 (2)．知识表示是人工智能研究的基本问题之一，它与计算复杂性有着密切的关系。为了得到优化的知识约简算法，本文对RS的主要概念和运算从信息的角度进行了新的表示，即信息表示。然后，证明了在一定条件下，信息表示与RS的代数表示是等价的．信息表示是本文给出的带策略的知识约简算法的理论基础。 (3)．针对RS理论中，求取知识的最小约简是一个NP-hard问题．本文以信息表示为基础，对信息系统和决策表分别给出了带策略的知识约简算法；并指出其计算复杂性是多项式的。通过实验说明，在绝大多数情况下，它们可以得到最小约简。 (4)．对目前已有的几种知识约简算法进行了分析，分析的重点是这些算法对最小约简的完备性。通过理论上构造的一个反例说明，现有算法对最小约简都是不完备的． (5)．针对RS理论只能处理离散属性的局限性，本文以决策表的相容度作为反馈信息，提出了一种领域独立的基于动态层次聚类的自动划分连续属性的方法。本方法拓广了RS理论的应用范围。通过实例将这一方法与现有方法进行了对比分析，得到了令人鼓舞的结果。 ·RS理论的应用研究： (6)．决策树学习方法是机器学习中的重要方法之一。本文将RS理论中属性相对核的概念用于解决检验中变量的选择问题；通过定义等价关系的相对泛化，解决了多变量检验的构造问题。通过实例，将本方法与单变量及其它多变量方法进行了对比分析。结果表明将RS理论用于多变量决策树构造是有效的。
英文摘要	Rough Set(RS) theory, introduced by Z.Pawlak, is a new mathematical tool to deal with knowledge, particularly when knowledge is imprecise or inconsistent. The RS theory gives a formal definition of knowledge so that the knowledge can be analyzed and manipulated effectively. This theory also provides a suit of tools, i.e. reduction of knowledge, to acquire knowledge from data automatically. Recently, the RS theory is widely being used in many areas, such as artificial intelligence, pattern recognition etc. The main work reported in this thesis can be divided into two parts: one about the theoretical study of RS; the other about the study of application of RS in AI. · The theoretical study of RS includes: (1) In this thesis, the knowledge in RS context can be understand as random variable defined on the o-algebra about universe, and the information entropy of knowledge is defined. Then, it is proved that the information entropy and mutual information of knowledge are monotone decreasing with the increment of roughness of knowledge. These conclusions provide a kind of information explanation for roughness of knowledge. (2) Knowledge representation is one of basic problems in M studies. It has close ties with computing complexity. In order to find optimal algorithm for reduction of knowledge, a new kind of representation of main concepts and operations in RS, called information representation, is proposed in this thesis from the view of information. Then, the equivalence between information and algebraic representation is proved under certain conditions. This representation is the theoretical foundation of the algorithms with strategy presented in this thesis. (3) As for finding minimal reduct is a NP-hard problem in RS. Based on information representation, the algorithms with strategy for reduction of knowledge are proposed for information system and decision table, respectively. Their complexities are polynomial. By examples, we show that the minimal reduct can be find by these algorithms in most cases.. (4) Several existing algorithms of reduction are analyzed in this thesis. The emphasis of our analysis puts on the completeness of these algorithms with respect to minimal reduct. Through a counterexample constructed by ourselves theoretically, we illustrate that all of existing algorithms are incomplete. (5) To overcome the limitation that RS only dealing with discrete attribute, an area- independent automatic approach for discretization of continuous attribute based on dynamic cluster algorithm is presented in this thesis. This approach widens the scope of application of RS. The comparisons between this approach and existing approaches are made, and the results are very encouraging. · The study of application of RS in AI (6) Decision tree is one of important learning approaches in machine learning. In this thesis, the problem of attribute selection in multivariate tests is solved
语种	中文
其他标识符	408
源URL	[http://ir.ia.ac.cn/handle/173211/5672]
专题	毕业生_博士学位论文
推荐引用方式 GB/T 7714	苗夺谦. Rough Set理论及其在机器学习中的应用研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所. 1997.

入库方式： OAI收割

来源：自动化研究所

浏览0

下载0

收藏0

其他版本

除非特别说明，本系统中所有内容都受版权保护，并保留所有权利。