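Chinese Academy of Sciences Institutional Repositories Grid: Research on Key Technologies of Object Categorization Based on the Bag-of-Visual-Words (BoW) Model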

Chinese Academy of Sciences Institutional Repositories Grid

Research on Key Technologies of Object Categorization Based on the Bag-of-Visual-Words (BoW) Model

Document type: Dissertation


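Author	Zhang Chunjie (张淳杰)
Degree type	Doctor of Engineering
Defense date	2011-05-21
Degree-granting institution	Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral	Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Supervisors	Ma Songde (马颂德); Lu Hanqing (卢汉清)
Keywords	Object Categorization; Bag of Visual Words; Local Feature; Sparse Constraints; Spatial Information
Alternative title	Research on Key Technologies of Object Categorization Based on the Bag-of-Visual-Words (BoW) Model
Major	Pattern Recognition and Intelligent Systems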
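Abstract (translated from Chinese)	As a fundamental problem in computer vision, object categorization has attracted growing interest from researchers. Research on object categorization can effectively advance image understanding, and the technology can also be applied widely in other fields, such as intelligent surveillance and image retrieval. In recent years, researchers have studied object categorization extensively and proposed many effective models, of which the bag-of-visual-words (BoW) model based on local features is the most popular. However, this model ignores spatial information, semantic information, and the correlations among visual words, which limits its classification performance. Much current research therefore focuses on how to exploit this information effectively to achieve better object categorization. This dissertation first surveys the various object categorization methods, analyzes the strengths and weaknesses of the local-feature-based BoW model, and then studies and improves on the BoW model in depth. The main results and contributions are as follows. First, a component-based image feature representation is proposed to overcome the loss of spatial information and of correlations among visual words in the traditional BoW model. On this basis, two methods for object categorization are proposed: (1) an SVM classifier first predicts the category of each component, and the component predictions are then linearly combined to categorize the object; (2) a boosted sparsity-constrained bilinear model is proposed for object categorization. This method predicts object categories with a bilinear model, imposes sparsity constraints to select discriminative visual words and components, and combines multiple sparsity-constrained bilinear models through boosting, thereby making the model more robust and improving categorization performance. Second, a method based on non-negative sparse coding and low-rank and sparse matrix decomposition is proposed for object categorization. It integrates non-negative sparse coding with max pooling, which reduces the information loss when local features are encoded; in addition, low-rank and sparse matrix decomposition yields more discriminative basis vectors for the sparse reconstruction of images, which leads to good categorization results. Third, a method based on spatial pyramid coding and visual word reweighting is proposed. It partitions local features with a spatial pyramid during both visual dictionary generation and coding, clustering and encoding each partition separately, which to some extent overcomes the loss of spatial information when the traditional BoW model generates its visual dictionary; meanwhile, visual word reweighting assigns larger weights to discriminative visual words, which helps improve categorization performance. Fourth, two concept-sensitive object categorization methods are proposed: (1) to address the lack of semantics when the traditional BoW model generates its visual dictionary, a concept-sensitive dictionary generation method is proposed, which considers not only the visual similarity of local features but also their semantic consistency, producing a visual dictionary with clear semantics; (2) a web image mining method based on concept-sensitive Markov stationary features is proposed, which integrates the spatial relationships of visual words with the histogram representation of images by solving for Markov stationary features, yielding a more discriminative feature representation and more effective image mining.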
Abstract (English)	In recent years, object categorization has become a hot topic in computer vision, since it can advance image understanding and can also be applied in many areas, such as intelligent surveillance and image retrieval. Researchers have done a lot of work on object categorization, and many models have been proposed, of which the most popular is the bag-of-visual-words (BoW) model. However, the basic formulation of the BoW model ignores spatial information, semantic information, and the correlations among visual words, which limits its performance. How to leverage this useful, non-negligible information to boost the performance of the BoW model has become the main task of most recent work, and it is also the main focus of this dissertation. We first review the existing BoW-based methods and summarize their advantages and disadvantages. Then we propose our improvements to the BoW model. The main contributions of this dissertation are as follows. First, in order to leverage both the spatial information of visual words and their correlations, we propose a component-based image representation, in which each component corresponds to an image region. Based on this representation, we propose two methods for image classification: an SVM-based solution and a bi-linear one. The former uses an SVM classifier to predict the category of each component and applies a linear model to predict the category of each image. The latter formulates object recognition as a bi-linear model with sparsity constraints, which captures two progressive linear relationships between a given concept and the two levels of visual elements in an image (i.e., visual words and components), yielding a sparsity-constrained bi-linear model (SBLM). In the SBLM, sparsity is used to choose the most discriminative visual words and components, and the bi-linear model is used for category prediction. In addition, a boosting scheme is employed to combine different bi-linear models for robust and efficient object categorization. Second, a method based on non-negative sparse coding and low-rank and sparse decomposition is proposed for object categorization. To reduce information loss, we propose a new non-negative sparse coding method, combined with max pooling and spatial pyramid matching, to extract information from local features and represent images. Besides, we propose to leverage the low-rank and spar...
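Language	Chinese
Other identifier	200818014628072
Source URL	[http://ir.ia.ac.cn/handle/173211/6327]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Zhang Chunjie. Research on Key Technologies of Object Categorization Based on the Bag-of-Visual-Words (BoW) Model [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences. 2011.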
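
The abstract starts from the orderless bag-of-visual-words histogram and notes that it discards spatial information. As a rough illustration of that baseline, and of how a spatial pyramid partition restores some layout information, the sketch below may help. It is not the dissertation's method: it uses hard nearest-word assignment and plain counting (no sparse coding, max pooling, or reweighting), and the function names, the toy codebook, and the assumption that positions are normalised to the unit square are all hypothetical.

```python
from math import dist


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest (Euclidean) to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bow_histogram(descriptors, codebook):
    """Orderless BoW: count how many descriptors map to each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist


def cell_index(coord, cells):
    """Grid cell of a coordinate in [0, 1]; 1.0 is clamped into the last cell."""
    return min(int(coord * cells), cells - 1)


def pyramid_histogram(descriptors, positions, codebook, levels=2):
    """Concatenate per-cell BoW histograms over 1x1, 2x2, ... grids.

    positions holds one (x, y) pair per descriptor, assumed normalised
    to the unit square (an assumption of this sketch).
    """
    feature = []
    for level in range(levels):
        cells = 2 ** level
        for cy in range(cells):
            for cx in range(cells):
                in_cell = [
                    d for d, (x, y) in zip(descriptors, positions)
                    if cell_index(x, cells) == cx and cell_index(y, cells) == cy
                ]
                feature.extend(bow_histogram(in_cell, codebook))
    return feature


# Toy data: two visual words, three local descriptors with image positions.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1)]
positions = [(0.1, 0.1), (0.8, 0.2), (0.7, 0.9)]

flat = bow_histogram(descriptors, codebook)
pyramid = pyramid_histogram(descriptors, positions, codebook, levels=2)
```

With this toy input the flat histogram only records that two descriptors matched the first word and one matched the second, while the pyramid feature additionally records which image quadrant each match came from. The resulting vector would typically be passed to a classifier such as an SVM.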

Ingest method: OAI harvesting

Source: Institute of Automation
