中国科学院机构知识库网格系统: 基于图学习算法的几个问题研究

中国科学院机构知识库网格

Chinese Academy of Sciences Institutional Repositories Grid

基于图学习算法的几个问题研究

文献类型：学位论文


作者	杨余久
学位类别	工学博士
答辩日期	2008-05-29
授予单位	中国科学院研究生院
授予地点	中国科学院自动化研究所
导师	胡包钢
关键词	图论半监督学习拉普拉斯算子虚拟节点二部图 Graph Theory Semi-supervised learning Laplacian Dummy Nodes Bipartite Graph
其他题名	Study on Some Issues of Graph-based Learning Approaches
学位专业	模式识别与智能系统
中文摘要	对先验信息的表达和利用是提高机器学习方法性能的重要途径，而数据的空间结构是先验信息的重要表现形式之一。近年来，利用图来刻画数据间内在结构的方法受到研究人员大量的关注。一般而言，基于图学习算法要得益于其对局部结构的表达能力以及其与经典方法（如核方法）的紧密联系。当前，该类方法已广泛应用于不同任务中（如分类、聚类和降维）；其中，基于图的半监督学习方法具有能自然地将无监督学习和监督学习各自优点结合到一起的特点而成为半监督学习领域的研究热点之一。尽管文献中已给出大量基于图半监督学习算法，但其基本问题——图的构造问题，仍然没有得到很好的研究和足够的重视，许多实验已表明图构造问题对学习性能有明显的影响。在本文中，我们首先考虑图的构造问题；包括加权图的权重学习和多视角图的融合。其次，考虑对图建立合适的模型来进行直推学习，给出了一个新型的随机游走模型；同时，将图描述为正则化因子项，应用于非负矩阵分解算法中；最后，考虑二部图问题，提出了一个投影策略来降低现有问题规模，从而使得排序学习在大规模数据集上的应用得以实现，结合推荐系统的应用给出了二部图约简应用的具体过程。本文的主要贡献包括以下几点： 1). 图上参数学习和多视角图的融合进化策略：为找到加权图中合适的权重系数，文中应用往返时间来取代简单的欧式空间的距离度量，使得构造的图在学习算法中表现更为鲁棒；针对存在多视角图的关系问题，文中给出一个融合策略来充分利用这些先验信息；实验结果表明了该策略对学习性能有明显改善。 2) 直接在图上构建直推学习模型：文中建议通过引入虚拟节点的模式来克服以往随机游走模型的限制，提高了随机游走模型的通用性，并提出了用迭代的自学习机制来提高性能；另一方面，图约束关系可以描述成一个正则化框架下的因子项，文中提出将其应用于非负矩阵分解算法中，该模型在文档聚类任务中表现出较好的性能。 3) 对于具有二部图结构的大规模数据问题，文中讨论了通过投影的策略来缩小计算规模，考虑基于项目评分的推荐模型，给出了新的协同算法。实验验证了所给方案在标准数据集测试中有良好的表现。总的说来，本文讨论了通过构造图进行学习算法的相关模型、理论和改进，同已有方法的对比实验证实了所建议几个方案的优越性。
英文摘要	Using graph to describe underlying intrinsic data structure has been intensively studied in recent years. Generally speaking, graph-based approaches, which are benefited from their convenient local representation and connection with other models like kernel machines, have been applied to various tasks, such as classification, clustering and dimensionality reduction, and so forth. In particular, Semi-supervised learning has integrated the merits of unsupervised learning into supervised learning framework to increase the discriminative capability of the resulting learning machine. Despite there exist a number of graph-based semi-supervised learning algorithms, the fundamental problem of graph construction,which significantly influences performance in real-world experiments, is still an open problem. In this thesis, the graph construction problems, including a weighted graph learning and a multi-view graph fusion, were first considered. Secondly, a modified random walker model for inferring unlabeled data based on the learned graph was provided. As a factorization term under regularization theory, graph can be applied to Nonnegative Matrix Factorization algorithm. Finally, a compressing strategy, which makes a large scale data processing practical, was proposed for bipartite graph problem. The main contributions of this thesis are to highlight as following: 1) Parameters learning on graph and fusion strategy for multi-view graph. To find an appropriate weight for each edge of graph, we replaced Euclidean distance measure in Gaussian kernel function by commute time defined in graph theory, the resulting graph has a more robust performance than the original weight. For the multi-view graph, we provided a unified framework for integrating domain prior knowledge and geometrical structure information. Experiments shown that the performance of these strategies has significantly improved. 2) Implement transductive learning on graph directly. We presented a novel random walk model with dummy nodes to overcome the limits of the classical random walker algorithm. The proposed approach has expanded the application of random walk model, one can use boosting idea to update the original graph. On the other hand, graph can be represented as a penalty term under regularization theory, we considered non-negative matrix factorization using graph constraints and applied the presented method to document clustering. The empirical results shown better performance than one without graph constraints. 3) As for bipartite graph model, we investigated a shrinking strategy via projection approach, and conducted an item-based recommend system using projected graph. The proposed algorithm also alleviates the burden of computation for a large scale case significantly. Some experiments on the Benchmark dataset have shown significantly improved the performance of recommender system compared with baseline method.
语种	中文
其他标识符	200218014603230
源URL	[http://ir.ia.ac.cn/handle/173211/6093]
专题	毕业生_博士学位论文
推荐引用方式 GB/T 7714	杨余久. 基于图学习算法的几个问题研究[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2008.

入库方式： OAI收割

来源：自动化研究所

浏览0

下载0

收藏0

其他版本

除非特别说明，本系统中所有内容都受版权保护，并保留所有权利。