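Chinese Academy of Sciences Institutional Repositories Grid: Research on Illumination and Shape Invariance Based on Visual Mechanisms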

Research on Illumination and Shape Invariance Based on Visual Mechanisms (基于视觉机理的光照及形状不变性研究)

Document type: Dissertation


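Author	Gu Huxiang (谷鹄翔)
Degree	Doctor of Engineering
Defense date	2016-06
Degree-granting institution	Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral	Beijing
Advisor	Pan Chunhong (潘春洪)
Keywords	feature representation, visual system, cognitive mechanism, invariance, non-geometric transformation, high dynamic range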
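Abstract (translated from Chinese)	Feature representation is one of the core research topics in computer vision and machine learning. The emergence of feature learning, and of deep learning models in particular, has broken the conventional pipeline for solving vision problems (features + classifier) by organically combining feature representation with the final inference, prediction, or recognition, which has greatly advanced feature representation techniques. Although feature learning (especially deep learning models such as DBN [48] and CNN [67]) has achieved major breakthroughs in both academia and industry, many problems remain: (a) neither hand-crafted feature extraction nor feature learning models can model memory, even though memory profoundly influences biological cognition; (b) both hand-crafted and learned feature models target only specific variations, and no unified model handles all variations; (c) deep learning models require large numbers of training samples and lack support from mathematical theory or physiological experiments. In contrast, the human visual system handles these problems well. With the help of existing memory, humans can quickly identify a target object after observing only a few samples, regardless of the object's pose (shape invariance, etc.) or background environment (illumination invariance, etc.). In recent years, researchers have used new techniques such as microelectrode recording to analyze properties such as the firing rate of individual cells, which has greatly advanced the study of the functional properties of the retina, the visual cortex, and other regions. These findings offer a new approach to the open problems in computer vision and machine learning. Brightness perception and shape cognition are the most basic tasks of the human visual system, while illumination change and shape change remain pressing open problems in computer vision. Drawing on the mechanisms by which the human visual system processes brightness and shape information, this thesis studies illumination invariance and shape invariance and obtains a series of results, chiefly the following.
1. A locally nonlinear range compression algorithm for high dynamic range images. Inspired by the way the human visual system processes radiant intensity (human vision is more sensitive to changes in dark regions), a locally nonlinear tone mapping model is proposed that compresses high dynamic range images into ordinary images. This addresses the illumination variation problem that arises in ordinary cameras from the nonlinear transformation of irradiance by the camera response function and from the quantization of analog signals into digital signals. The proposed model rests on a physiological basis (the Weber-Fechner law [64]: perceived radiant intensity is logarithmically related to the input radiant intensity) and is consistent with how the human visual system processes light intensity. To solve the model, guidance images are estimated from the image according to the physical meaning of the model parameters, the guidance images are then used to constrain the solution, and a closed-form solution is finally obtained. The algorithm resolves the luminance distortion of the earlier linear model and also overcomes shortcomings of mainstream algorithms such as low overall contrast, loss of detail in bright and dark regions, and halos. The algorithm has only linear complexity, and its results outperform mainstream models on both subjective and objective evaluation metrics.
2. A fast compression framework for high dynamic range images and video based on a visual model. By simulating the receptive-field processing properties of photoreceptor cells and horizontal cells in the visual system, high dynamic range compression is progressively simplified into a mapping matrix estimation problem, and several estimation methods are proposed according to the physical meaning of the mapping matrix. The algorithm is comparable to current mainstream algorithms on objective image evaluation metrics while being two to three orders of magnitude faster. Further, using the response curves of retinal cells to unit luminance under different backgrounds, a fast compression framework for high dynamic range images and video is proposed. Compared with current mainstream local algorithms, the framework reaches real-time processing speed and offers clear contrast, no ghosting, and robustness to abrupt brightness changes in video.
3. A deep network architecture for generalized variations such as affine variation, out-of-plane variation, and background variation. Poggio et al. [103] explored the influence of a memory module on information processing in the ventral pathway of the visual system and proposed M-theory, a unified model for handling generalized variations such as affine, out-of-plane, and background variation, which offers a new direction for addressing the problems of existing learning models noted above. The thesis deepens the theory's current shallow network, resolving the drawback that the memory bank grows multiplicatively in size as the number of variation types increases. Experimental results show that M-theory can be built on top of existing hand-crafted and learned features to further broaden the range of invariances they handle and improve their performance.
4. Extension of M-theory to non-geometric variations. M-theory has so far been applied only to geometric transformations such as affine and out-of-plane variation; the thesis extends it to non-geometric transformations. The thesis first proves theoretically that when a non-geometric variation can be expressed as a linear symmetric convolution (for example, Gaussian blur), M-theory yields good invariance to that variation. It then verifies this conclusion in the recognition of blurred and degraded face images. In the unsupervised setting, the algorithm can be trained using random images (for example, noise dot-pattern images) as memory bank samples and still achieve good recognition performance. Under severe blur and other degradation in particular, recognition accuracy is far higher than that of current mainstream algorithms.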
English abstract	Data representation has become a fundamental task in the computer vision and machine learning communities. Feature learning, especially via deep learning, has abandoned the traditional pipeline (feature extractor plus classifier) for solving computer vision problems by building a hierarchical architecture in which representation and the subsequent induction, prediction, and recognition can be trained together. Although feature learning, especially with structures such as DBN and CNN, has made huge breakthroughs in both academia and industry, several problems remain to be solved. First, little research has taken advantage of memory, which is very important to the human visual system. Second, there is no unified framework that deals with all transformations. Last, huge amounts of data are needed for training, and the reason why deep models work remains unexplained. In contrast, the human visual system handles these problems easily and quickly. With the help of memory, we can recognize a target that may undergo all kinds of transformations from only a few samples. Recently, researchers studying the human visual system have used new techniques such as microelectrode recording to analyze the firing rate of a single cell, which has tremendously advanced research on the functional properties of the retina and the visual cortex. These achievements give researchers in the computer vision community a new direction for solving the above problems. Illumination adaptation and shape recognition are two essential tasks of the human visual system, and illumination change and shape variation are two main obstacles in the computer vision community. Hence, we took advantage of findings on the retina and cortex and made several improvements in illumination and shape invariance. Our main contributions are as follows: We proposed a locally nonlinear tone mapping algorithm.
Inspired by a principle of the human visual system (it is more sensitive to changes in dark regions), we proposed a new locally nonlinear tone mapping algorithm that compresses a high dynamic range image into a low dynamic range image, so that the illumination problem caused by the nonlinear mapping of the camera response function and by the quantization loss from analog signals to digital codes can be solved. Our model is based on physiological experiments (the Weber-Fechner law states that subjective sensation is proportional to the logarithm of the stimulus intensity) and is consistent with the mechanism of the retina. When solving our model, we first estimate two guidance images according to the physical interpretation of the model parameters. We then adopt these two images as constraints to obtain the final closed-form solution. Our model solves the distortion problem of the local linear model and produces results free of halos, a common drawback of filter-based approaches. In addition, our model has linear complexity. Both subjective and objective evaluations demonstrate the effectiveness of our model. We proposed a retina-based real-time tone mapping framework for HDR images and video. Inspired by the working mechanism of photoreceptors and horizontal cells, we gradually simplify tone mapping into the estimation of a mapping matrix and then provide rules for estimating this matrix according to its physical interpretation. Our framework achieves scores similar to those of state-of-the-art methods in subjective and objective evaluations while running about one hundred to one thousand times faster. Furthermore, drawing on the adaptation mechanism of the human visual system, we proposed a real-time tone mapping framework for HDR video. Compared with mainstream approaches, our framework achieves high-contrast results free of ghosting and flickering at real-time speed.
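The Weber-Fechner relation cited above can be illustrated with a minimal sketch. This is a plain global logarithmic operator, not the locally nonlinear model or the retina-based framework described in the abstract, whose equations are not given here. The function name `log_tone_map`, the `eps` offset, and the synthetic luminance values are assumptions made for illustration.

```python
import numpy as np

def log_tone_map(hdr_luminance, eps=1e-6):
    """Compress HDR luminance to [0, 1] with a global logarithmic curve.

    Illustrative only: a global operator motivated by the Weber-Fechner law,
    not the thesis's locally nonlinear model.
    """
    lum = np.asarray(hdr_luminance, dtype=np.float64)
    if np.any(lum < 0):
        raise ValueError("luminance must be non-negative")
    log_lum = np.log(lum + eps)          # eps avoids log(0) for black pixels
    lo, hi = log_lum.min(), log_lum.max()
    if hi == lo:                         # flat image: nothing to compress
        return np.zeros_like(lum)
    return (log_lum - lo) / (hi - lo)    # rescale to the displayable range

# Synthetic luminance samples spanning five orders of magnitude.
hdr = np.array([0.01, 0.1, 1.0, 10.0, 100.0, 1000.0])
ldr = log_tone_map(hdr)
print(np.round(ldr, 3))
```

Each tenfold increase in input luminance maps to roughly the same output step, so a small absolute change in a dark region receives as much output range as a much larger absolute change in a bright region.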
We built a hierarchical network for general transformations such as affine transformation, out-of-plane rotation, and background variation. Prof. Tomaso Poggio and his team explored how memory affects information processing in the human ventral stream and proposed a unified framework, M-theory, for obtaining invariance to general transformations, which provides a new direction for solving the above problems. However, the shallow network of M-theory causes the memory dataset to grow exponentially as the number of invariance categories increases. Accordingly, we built a hierarchical network to overcome this drawback. Experiments indicate that M-theory can be built on top of other hand-crafted descriptors or learned features to obtain better performance. We extended the application of M-theory to non-geometric transformations. Current applications of M-theory are limited to geometric transformations such as affine transformation and out-of-plane rotation; we extend it to non-geometric transformations. We proved theoretically that robust invariant features can be obtained via M-theory when the transformation can be expressed as a convolution with a linear symmetric filter, such as Gaussian blurring. We then demonstrated the correctness of our theory in face identification under blurred and degraded conditions. High performance can be achieved even when only random images, such as noise dot images, are used as memory. Compared with state-of-the-art methods, our approach achieves much higher accuracy, especially under severe degradation.
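The two invariance ideas in the abstract can be sketched numerically under simplifying assumptions. The code below uses 1-D signals, circular shifts as the transformation group, and random vectors as stored templates. It is not the thesis's network or proof. The helper names `signature` and `circular_convolve` are hypothetical, and the identity checked in the second half is only one property that plausibly underlies the symmetric-filter claim.

```python
import numpy as np

def signature(x, templates):
    """Pool the dot products of x with every circular shift of each template."""
    feats = []
    for t in templates:
        # Rows are all circular shifts of the template (its orbit).
        orbit = np.stack([np.roll(t, k) for k in range(len(t))])
        dots = orbit @ x
        # Order-independent pooling makes the result shift-invariant.
        feats.extend([dots.mean(), dots.max(), (dots ** 2).mean()])
    return np.array(feats)

def circular_convolve(signal, kernel):
    """Circular convolution computed through the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

rng = np.random.default_rng(0)
n = 32
x = rng.standard_normal(n)
templates = rng.standard_normal((4, n))   # random "memory" templates

# 1) Shifting the input only permutes the dot products, so pooling is invariant.
sig_original = signature(x, templates)
sig_shifted = signature(np.roll(x, 7), templates)

# 2) For a symmetric kernel h, <h * x, t> equals <x, h * t>:
#    blurring the input is equivalent to blurring the stored template.
offsets = np.minimum(np.arange(n), n - np.arange(n))  # circular distance to 0
h = np.exp(-0.5 * (offsets / 2.0) ** 2)
h /= h.sum()
lhs = np.dot(circular_convolve(x, h), templates[0])
rhs = np.dot(x, circular_convolve(templates[0], h))

print(np.allclose(sig_original, sig_shifted), np.isclose(lhs, rhs))
```

The templates here are pure noise, which loosely mirrors the abstract's remark that random images can serve as memory; how this translates into recognition accuracy is not something the sketch measures.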
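Subject	First research direction
Source URL	[http://ir.ia.ac.cn/handle/173211/12017]
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Gu Huxiang. Research on Illumination and Shape Invariance Based on Visual Mechanisms[D]. Beijing. Graduate University of Chinese Academy of Sciences. 2016.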

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.