Chinese Academy of Sciences Institutional Repositories Grid: Research on Machine-Learning-Based Biomedical Data Processing Methods

Research on Machine-Learning-Based Biomedical Data Processing Methods

Document type: Thesis


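Author	杨秀锋 (Yang Xiufeng)
Degree	Master's
Defense date	2014-05-28
Degree-granting institution	Shenyang Institute of Automation, Chinese Academy of Sciences
Advisor	彭慧 (Peng Hui)
Keywords	machine learning; biomedical data analysis; kernel function; dimensionality reduction; supervised learning
Alternative title	Machine Learning Approaches to Biomedical Data Analysis
Degree discipline	Control Engineering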
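Chinese abstract (translated)	The development of biomedical informatics depends heavily on progress in related fields. With the rapid development of information technology, researchers have turned their attention to applying advanced information technology to biomedical information research. Machine learning has become a research focus in mathematics, informatics, and computer science, and has been applied successfully in many research fields. The main work of this thesis is to develop suitable machine learning techniques and apply them to the analysis of biomedical data. Dimensionality reduction is an important part of machine learning, and manifold learning in particular has attracted great attention. Based on the assumptions of local linearity and global non-linearity, manifold learning algorithms can preserve the intrinsic structure of non-linear data. For classification tasks, however, traditional manifold learning algorithms have several shortcomings: they are unsupervised, and they suffer from the sample size problem, the out-of-sample problem, and sensitivity to noise. Since Vapnik proposed the support vector machine (SVM), based on statistical learning theory and the kernel trick, in 1995, kernel methods have become a research focus in machine learning. SVMs have also been widely applied to image processing, biomedical data analysis, and text classification. This thesis focuses on designing suitable kernel functions for SVMs, on addressing the unsupervised nature of the isometric feature mapping (ISOMAP) manifold learning algorithm, and on applying machine learning techniques to biomedical data analysis tasks. 1. Classifying and visualizing gene expression data requires dealing with high dimensionality. The traditional ISOMAP algorithm cannot be applied to data with multiple class clusters, and it does not produce a mapping matrix from the high-dimensional space to the low-dimensional space. The thesis replaces multidimensional scaling (MDS) with neighborhood components analysis (NCA), introduces feature vectors as the input matrix, and proposes an ISOMAP variant aimed at gene expression data classification (NC-ISOMAP). The method yields a low-dimensional projection matrix during dimensionality reduction; after reduction, data from different classes are more separated and data within a class are more compact. Experimental results show that NC-ISOMAP outperforms ISOMAP on visualization and classification of gene expression data. 2. Kernel functions are an important non-linear mapping method and a key reason SVMs can be applied so widely. Research on kernel functions can greatly improve the performance of traditional SVMs. Building on such research, the thesis proposes a multiple-kernel function to improve the generalization and learning ability of SVMs; simulation experiments on biomedical datasets show that the proposed hybrid kernel performs better than traditional kernel functions.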
Call number	TP18/Y27/2014
English abstract	The development of computer science and information technology has greatly facilitated the development of biomedical science. As information science develops rapidly, researchers are increasingly considering how to apply information technology and mathematical science to biomedical research. Machine learning techniques have become a focus of information technology and mathematical science, and have been applied successfully in other related research areas. This dissertation focuses on developing suitable machine learning algorithms and applying them to biomedical data analysis. The thesis mainly discusses two parts: manifold learning algorithms for dimensionality reduction, and Support Vector Machines (SVMs) for pattern classification, together with their application to medical data analysis. Dimensionality reduction is an important aspect of machine learning, and manifold learning has been an active topic in recent years. Based on the property of local linearity and global non-linearity, manifold learning has been applied to many research fields, such as face recognition and bioinformatics. Manifold learning is a non-linear dimensionality reduction approach that can explore and preserve the inherent structure of non-linearly distributed data. However, for classification tasks, the original manifold learning methods show several shortcomings: they are unsupervised, and they suffer from the sample size problem, the out-of-sample problem, and sensitivity to noise. Since Vapnik proposed the SVM, based on statistical learning theory and the kernel trick, in 1995, kernel methods have developed rapidly. They have become an active area of academic research and have been widely used in image processing, biomedical information analysis, and text classification.
This thesis mainly focuses on designing suitable kernel functions for SVMs and on addressing the unsupervised-learning problem of manifold learning in medical data analysis. 1. To improve the classification accuracy of gene expression data and address its high dimensionality, the thesis proposes an improved ISOMAP for gene expression data visualization and classification, in which Neighborhood Component Analysis (NCA) replaces the multidimensional scaling (MDS) step of the traditional ISOMAP algorithm. During dimensionality reduction, NC-ISOMAP obtains a low-dimensional projection matrix under which the reduced data become more compact within each class and more separated between classes. Experimental results on several biomedical datasets show that the proposed algorithm performs better in dimensionality reduction and achieves higher classification accuracy than traditional ISOMAP, which indicates that the proposed method is effective. 2. The kernel function, one of the most important means of non-linear mapping, is an essential part of SVMs and a reason for their wide application. An independent discipline, kernel methods, has formed around kernel functions. Research on kernel functions not only improves the use of SVMs but also supports artificial intelligence and machine learning more broadly. Building on this research, a multiple-kernel function is proposed for learning problems that involve multiple, heterogeneous data sources. Different kernel functions, or different parameters of those kernels, are chosen according to their properties to improve the learning ability and generalization of the kernel, and the validity of the new kernel is proved.
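Language	Chinese
Ownership rank	1
Pages	61
Classification number	TP18
Source URL	[http://ir.sia.ac.cn/handle/173321/14787]
Collection	Shenyang Institute of Automation_Digital Factory Laboratory (沈阳自动化研究所_数字工厂研究室)
Recommended citation (GB/T 7714)	杨秀锋. 基于机器学习的生物医学数据处理方法研究[D]. 中国科学院沈阳自动化研究所. 2014.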
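
The abstract describes a multiple-kernel (hybrid) function for SVMs but does not say which kernels are combined or how they are weighted. As a rough illustration only, the sketch below assumes a convex combination of a Gaussian (RBF) kernel and a polynomial kernel. The function names, the `weight`, `gamma`, `degree`, and `coef0` parameters, and the toy samples are all hypothetical and are not taken from the thesis. A weighted sum of valid kernels with non-negative weights is itself a valid kernel, which is the kind of property the abstract refers to when it mentions proving the legitimacy of the new kernel.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree, with coef0 >= 0."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def mixed_kernel(x, y, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of an RBF and a polynomial kernel (an assumed form).

    With 0 <= weight <= 1 both terms carry non-negative coefficients, so the
    sum of the two positive semi-definite kernels is again a valid kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * poly_kernel(x, y, degree, coef0))


def gram_matrix(samples, kernel=mixed_kernel):
    """Gram matrix K[i][j] = kernel(samples[i], samples[j])."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]


def quadratic_form(K, v):
    """v^T K v, which is non-negative for every v when K is positive semi-definite."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


# Made-up toy samples with three features each (not data from the thesis).
samples = [[0.0, 1.0, 2.0], [1.0, 0.5, -1.0], [2.0, -1.0, 0.0], [0.5, 0.5, 0.5]]
K = gram_matrix(samples)
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. In practice the weight and the kernel parameters would be tuned by cross-validation rather than fixed as above.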

Ingest method: OAI harvesting

Source: Shenyang Institute of Automation
