Chinese Academy of Sciences Institutional Repositories Grid: Research on Image Content Representation and Classification Methods

Chinese Academy of Sciences Institutional Repositories Grid

Research on Image Content Representation and Classification Methods

Document type: Dissertation


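Author	张琳波 (Zhang Linbo)
Degree	Doctor of Engineering
Defense date	2011-05-25
Degree-granting institution	Graduate University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisors	王春恒 (Wang Chunheng); 肖柏华 (Xiao Baihua)
Keywords	image content representation; image classification; bag-of-words; bag-of-phrases; codebook construction; image content understanding
Alternative title	Content-based Image Representation and Classification
Discipline	Pattern Recognition and Intelligent Systems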
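Abstract (translated from Chinese)	With the spread of image acquisition devices such as digital cameras, webcams, and high-speed scanners, digital images and videos have become important carriers for recording information in daily life. At the same time, the rapid growth of the Internet has put countless digital images and videos in front of users. How to organize and manage these vast image and video resources efficiently so that they serve a range of applications has become an important research problem, and content-based image and video classification emerged in response. Because a video consists of a long sequence of image frames, image content classification is the foundation of video content classification, and its techniques can be applied directly to the individual frames of a video. This thesis addresses image content classification. Building on the bag-of-words image representation, it studies codebook construction, classification system design, and multi-feature fusion. The main contributions are as follows. First, to exploit complementary image features, an image content classification system based on multi-feature fusion is proposed, designed, and implemented. The system combines two detectors with five descriptors to form ten kinds of local features; these are converted into multi-channel bag-of-words histogram vectors using the bag-of-words model and spatial pyramid partitioning; finally, the multi-channel histogram vectors are fused through kernel functions to improve classification accuracy. The system was entered in the international visual object classification competition, The PASCAL Visual Object Classes Challenge (VOC) 2009, and achieved good results. Second, to address the relationship among codebook size, vector dimensionality, and the number of training images in multi-class classification, a method is proposed that incorporates class information into codebook construction and classification system design and determines the final image class through a classifier voting strategy. The method resolves the tension between codebook diversity and the high vector dimensionality caused by an overly large codebook. In addition, when each classifier is trained, the number of negative samples is kept within three times the number of positive samples, which keeps the positive samples from being swamped by negatives. Experiments show that the method achieves better classification performance than using a single global codebook. Third, for asymmetric classification problems in which positive and negative sample counts differ greatly, a method is proposed that trains a cascade of codebooks and classifiers in a boosting fashion. Each node uses a different codebook, which ensures that codewords generated from positive samples make up a certain share of every codebook while still capturing the highly varied local features of negative samples. Two system parameters adjust the number of nodes and the output of each node's classifier, so the method can meet different classification requirements. Fourth, to address the loss of spatial arrangement information among local features in the basic bag-of-words image representation, a new method for incorporating position information is proposed, starting from an analysis of the relationship between bag-of-words image representation in computer vision and bag-of-words document representation in text classification.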
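The first contribution described in the abstract above (local features quantized into bag-of-words histograms, then fused across channels through kernels) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not name the kernel or the fusion weights, so the histogram intersection kernel, the uniform weights, and all function names below are assumptions chosen for illustration. Spatial pyramid partitioning is omitted for brevity.

```python
from math import dist


def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and return
    an L1-normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def intersection_kernel(h1, h2):
    """Histogram intersection kernel (an assumed choice of kernel)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def fused_kernel(channels_a, channels_b, weights=None):
    """Combine per-channel kernel values for two images by a weighted sum.
    Uniform weights are an assumption, not the thesis's fusion rule."""
    n = len(channels_a)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * intersection_kernel(a, b)
               for w, a, b in zip(weights, channels_a, channels_b))


# Toy example: a 2-word codebook over 2-D descriptors (hypothetical data).
codebook = [(0.0, 0.0), (10.0, 10.0)]
h = bow_histogram([(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)], codebook)
```

In a ten-channel system like the one the abstract describes, `channels_a` and `channels_b` would each hold ten histograms, one per detector and descriptor pair, and the fused value would fill one entry of the kernel matrix passed to a kernel classifier.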
English abstract	With the spread of image acquisition devices such as digital cameras, video recorders, and scanners, digital images and videos have become important information carriers in daily life. In addition, the rapid development of the Internet makes it easy to obtain thousands of web images and videos. As a result, developing a system that organizes this huge body of digital data for practical use is becoming increasingly important. A key component of such a system is content-based image/video classification. Because each video is composed of a large number of image frames, techniques developed for image content classification can be applied directly to video classification. This thesis focuses on content-based image classification. Its contributions range from codebook construction and classification system design to multiple-feature combination, all based on bag-of-words representation models. The contributions are as follows. First, because no existing feature achieves the best performance in every case, a multiple-feature combination system is designed and implemented. Two local feature detectors and five local feature descriptors are applied to images to obtain ten kinds of local features; multi-channel bag-of-words histograms are then computed using spatial pyramid bag-of-words image representation models; finally, these histograms are fused through a multi-kernel combination strategy to achieve better classification performance. Results from the PASCAL Visual Object Classes Challenge (VOC) 2009 show that the system is effective. Second, based on the relationship among codebook size, vector dimension, and the number of training images per class, class-specific codebooks and classifiers are trained using the class labels of the training images. The predicted result for each test image is produced by a voting strategy. The conflict between codebook diversity and overfitting in the training process is resolved in this thesis. In addition, the negative samples are limited to less than three times the number of positive samples, which avoids the situation where positive samples are submerged among the negative samples. The final results show that the proposed strategy achieves better results than an approach using one universal codebook. Third, for the asymmetric learning goal, a strategy that trains a cascade of codebooks and classifiers in boosting style is proposed...
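The second contribution (capping negatives at roughly three times the positives for each class-specific classifier, then voting) can be sketched as follows. The record gives neither the sampling procedure nor the exact voting rule, so random subsampling, the "highest score wins" vote, and the names `cap_negatives` and `vote` are all assumptions made for illustration.

```python
import random


def cap_negatives(positives, negatives, max_ratio=3, seed=0):
    """Return at most max_ratio * len(positives) negatives.
    Random subsampling is an assumed selection rule."""
    limit = max_ratio * len(positives)
    if len(negatives) <= limit:
        return list(negatives)
    return random.Random(seed).sample(list(negatives), limit)


def vote(class_scores):
    """Pick the class whose classifier gives the highest score.
    This simple rule stands in for the unspecified voting strategy."""
    return max(class_scores, key=class_scores.get)


# Hypothetical data: 4 positive and 100 negative training images.
kept = cap_negatives(list(range(4)), list(range(100)))
label = vote({"cat": 0.2, "dog": 0.7, "car": 0.1})
```

Keeping the cap as a parameter makes it easy to test how sensitive each per-class classifier is to the imbalance ratio.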
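Language	Chinese
Other identifier	200818014628074
Source URL	[http://ir.ia.ac.cn/handle/173211/6334]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	张琳波 (Zhang Linbo). Research on Image Content Representation and Classification Methods [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2011.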

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.