Chinese Academy of Sciences Institutional Repositories Grid
Research on Density Clustering Algorithm in High-dimensional Data Analysis (高维数据分析中的密度聚类算法的研究)

Document type: Thesis

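Authors: Zhang Tao (张涛); Wang Xuewei (王学微)
Degree type: Master's
Defense date: 2018-05-17
Degree-granting institution: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of conferral: Shenyang
Advisor: Liu Chang (刘昶)
Keywords: clustering; high-dimensional data; density; core point; adaptive
Other title: Research on Density Clustering Algorithm in High-dimensional Data Analysis
Degree discipline: Control Engineering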
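Abstract (translated from Chinese): To address the inability of current clustering algorithms to handle fuzzy boundary points effectively, this thesis proposes RDBSCAN (Real-density-Based Spatial Clustering of Applications with Noise), a clustering algorithm based on real core points. It introduces the concept of a real core point: first, the core points found during density clustering are further classified, the pseudo core points that degrade the clustering result are removed, and the remaining real core points are clustered according to the density-reachability principle. It then proposes a density merging decision theorem: the real density of points within the same cluster is much greater than that of points in different clusters. This theorem guides the evaluation of the real density of real core points, so that points within a cluster are more similar to one another. Clustering experiments on synthetic datasets and UCI datasets show that RDBSCAN reduces the interference of fuzzy boundary points, yields several novel cluster partitions, and clusters more accurately on datasets with irregular density. To address the problem that current clustering algorithms have too many mutually interfering parameters, the thesis proposes ACBD (automatic clustering based on density), a parameter-free density clustering algorithm. First, a parameter-free density computation suited to high-dimensional data is designed: based on the scale and characteristics of the dataset, the average distance from each point to its nearest point is computed, and this value is used as the parameter for computing the density of each point, which describes the density structure of high-dimensional data well. Second, a new adaptive neighborhood definition is given that determines the neighborhood radius automatically from the data. Finally, a neighborhood search clustering method is proposed: several density centers are selected from the decision graph, and, starting from each density center in turn, core points within the neighborhood are searched until the neighborhood contains no more core points. Clustering experiments on synthetic and UCI datasets show that ACBD requires no manual parameter setting or tuning and achieves high clustering accuracy. It also achieves high clustering accuracy on high-dimensional data such as handwritten digit recognition and face recognition, making it an effective, simple, and practical clustering algorithm.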
English abstract: To address the inability of current clustering algorithms to handle fuzzy boundary points effectively, an RDBSCAN (Real-density-Based Spatial Clustering of Applications with Noise) clustering algorithm based on real core points is proposed. The concept of a real core point is introduced. First, the core points found during density clustering are further classified, the pseudo core points that degrade the clustering result are eliminated, and the remaining real core points are clustered according to the density-reachability principle. Then a density merging decision theorem is proposed: the real density of points within the same cluster is much greater than that of points in different clusters. The real density of the real core points is evaluated under this guidance, so that points within a cluster are more similar to one another. Clustering experiments on artificial datasets and UCI datasets show that RDBSCAN reduces the interference of fuzzy boundary points and that several novel cluster partitions emerge. Clustering is more accurate on datasets with irregular density. To solve the problem of too many mutually interfering parameters in current clustering algorithms, a parameter-free density-based clustering algorithm, ACBD (automatic clustering based on density), is proposed. First, a parameter-free density calculation method suited to high-dimensional data is designed: the average distance from each point to its nearest point is calculated according to the scale and characteristics of the dataset, and this value is used as the parameter for calculating the density of each point. Second, a new adaptive neighborhood definition determines the neighborhood radius automatically from the data. Finally, a neighborhood search clustering method is proposed: several density centers are selected from the decision graph, and, starting from each density center in turn, the core points in the neighborhood are searched until no core point is left in the neighborhood. Clustering experiments on artificial datasets and UCI datasets show that ACBD requires no manual setting or testing of parameters and achieves high clustering accuracy. It also achieves high clustering accuracy on high-dimensional data such as handwritten digit recognition and face recognition. Overall, ACBD can be regarded as an effective, simple, and practical clustering algorithm.
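The data-derived scale described in the abstract (the average distance from each point to its nearest point) can be illustrated with a minimal sketch. The abstract does not state how ACBD turns this scale into a density, so the Gaussian-kernel density below is an assumption made only for illustration. The function names `mean_nearest_neighbor_distance` and `gaussian_density` are hypothetical and do not come from the thesis. The sketch assumes at least two distinct points.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average, over all points, of the distance to the closest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def gaussian_density(points, scale):
    """Assumed density (not confirmed by the abstract): for each point,
    the sum of Gaussian kernels of width `scale` over all other points."""
    densities = []
    for i, p in enumerate(points):
        rho = sum(
            math.exp(-((math.dist(p, q) / scale) ** 2))
            for j, q in enumerate(points)
            if j != i
        )
        densities.append(rho)
    return densities


# Four points on a unit square plus one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scale = mean_nearest_neighbor_distance(points)  # derived from the data, not hand-tuned
densities = gaussian_density(points, scale)
```

In this toy input the outlier receives the lowest density, while the scale itself is computed from the data rather than supplied by the user, which is the property the abstract emphasizes.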
Language: Chinese
Property rights ranking: 1
Pages: 73
Source URL: [http://ir.sia.cn/handle/173321/21768]
Collection: Shenyang Institute of Automation_Other
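Recommended citation
GB/T 7714
Zhang Tao, Wang Xuewei. Research on Density Clustering Algorithm in High-dimensional Data Analysis [D]. Shenyang. Shenyang Institute of Automation, Chinese Academy of Sciences. 2018.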

Ingest method: OAI harvesting

Source: Shenyang Institute of Automation

