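Chinese Academy of Sciences Institutional Repositories Grid: Research on Visual Action Recognition Based on Spatial-Temporal Structure Representation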

Chinese Academy of Sciences Institutional Repositories Grid

Research on Visual Action Recognition Based on Spatial-Temporal Structure Representation

Document type: Dissertation


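Author	Shan Yanhu (单言虎)
Degree type	Doctor of Engineering
Defense date	2014-11-27
Degree-granting institution	University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Huang Kaiqi (黄凯奇)
Keywords	Action recognition; spatial-temporal slice; structural pattern; slow feature; skeleton stream
Alternative title	Spatial-Temporal Structure Representation for Human Action Recognition
Degree discipline	Pattern Recognition and Intelligent Systems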
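Abstract (translated from Chinese)	Human action recognition is one of the ultimate problems to be solved in computer vision. Compared with object detection and classification, it is a higher-level goal built on top of them and requires a deeper understanding of the human visual system. Beyond its theoretical value, action recognition has broad application prospects, including human-computer interaction, intelligent video surveillance, smart homes, and video retrieval. This thesis studies the effectiveness and robustness of spatial-temporal structure representation in action sequences, as well as the intrinsic relationship between spatial and temporal structure, and proposes visual action recognition methods based on spatial-temporal structure representation. Specifically, the main work and contributions are as follows: 1. An effective action representation is key to recognition performance. By examining the temporal and spatial information in video sequences, we propose a new action feature representation based on adaptive spatial-temporal slices. First, the proposed minimum average entropy criterion adaptively selects the best slicing direction so that the moving foreground is concentrated in a small number of slices, which resolves the uncertainty caused by dispersed information; next, these slices are concatenated and converted into two one-dimensional signals; finally, Mel-frequency cepstral coefficients of the one-dimensional signals are extracted as the action features. Experiments on several datasets show that the adaptive spatial-temporal slice representation is very effective for recognizing different types of actions, and its efficiency gives it considerable potential for practical application. 2. In real-world action recognition, complex environments make it impossible to model the video sequence directly. Local features can overcome noise to some extent but lack the ability to express high-level action information. To further improve the robustness of action recognition systems, we propose an action recognition method based on probabilistic structural pattern inference. First, a hierarchical random graph is used to automatically learn hierarchical structure from local feature points and to estimate the connection probability between different feature points; then, an AND/OR-based inference method derives latent high-order patterns, with associated probabilities, from the hierarchical structure, where the probabilities effectively describe the uncertainty of the high-order patterns. Based on the learned high-order patterns, Markov chain Monte Carlo (MCMC) is used to find the instance in an action sequence that best matches each pattern, and this instance is used to represent the action. Experiments on two of the most challenging realistic action datasets to date show that the learned high-order patterns effectively strengthen the representational power of local features and are highly robust for recognizing actions in real scenes. 3. To make better use of temporal and spatial information in addressing the instability and large intra-class variation found in action sequences, we analyze the intrinsic relationship between space and time and propose an action recognition method based on slow-feature learning on skeleton streams. For human skeleton sequences estimated by a depth sensor (skeleton streams), the spatial structure among joints is first used to convert the coordinate stream, composed of joint coordinates, into multi-order joint streams, which effectively improves the stability of the joint streams; next, slow feature analysis is used to learn the visual patterns of each joint, and the learned high-level visual patterns are encoded into the spatial representation of each skeleton frame. Constraining the spatial structure with temporal information effectively reduces intra-class variation among skeleton features. Experiments show that properly exploiting the intrinsic relationship between spatial and temporal structure in action sequences improves the stability and distinctiveness of the action representation.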
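The third contribution relies on slow feature analysis. As a rough illustration only, the sketch below implements generic linear slow feature analysis in NumPy: it whitens a multichannel time series and keeps the directions whose frame-to-frame differences have the smallest variance. The function name `linear_sfa`, its parameters, and the idea of feeding it one joint's stream are assumptions made for this example; the record does not describe the thesis's actual formulation or its multi-order joint streams in enough detail to reproduce them.

```python
import numpy as np

def linear_sfa(x, n_components=1):
    """Generic linear slow feature analysis (illustrative sketch, not the thesis code).

    x: array of shape (T, d), one row per frame (hypothetical input layout).
    Returns (y, w): the n_components slowest unit-variance outputs and the
    projection matrix applied to the centred input.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)                      # centre each channel

    # Whiten so that every projected direction has unit variance.
    cov = x.T @ x / (len(x) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10                        # drop degenerate directions
    whitener = evecs[:, keep] / np.sqrt(evals[keep])
    z = x @ whitener

    # Slowness: minimise the variance of the temporal differences.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / len(dz)
    _, devecs = np.linalg.eigh(dcov)            # eigenvalues in ascending order
    w = whitener @ devecs[:, :n_components]     # slowest directions come first
    return x @ w, w
```

Applied to a mixture of a slowly varying and a rapidly varying signal, the first output should track the slow component up to sign, which is the property that makes slow features attractive for suppressing frame-level jitter.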
English abstract	Human action recognition is one of the most important issues in the field of computer vision. Compared with object recognition in still images, human action recognition is a higher-level goal that requires a deeper understanding of the human visual system. Human action recognition has also attracted great attention in recent years because of its wide application prospects, e.g., intelligent video surveillance, human-computer interaction, smart homes, and video retrieval. We propose a Spatial-Temporal Structure Representation method for human action recognition to address the problems of effectiveness, robustness, and the spatial-temporal relationship in spatial-temporal structure representation. The main contributions of this thesis include: 1. We investigate the spatial and temporal information in video sequences and propose a novel, efficient adaptive spatial-temporal slice representation for action recognition. Firstly, a Minimum Average Entropy (MinAE) principle is proposed to select the optimal slicing angle for each action sequence adaptively. This concentrates the foreground pixels in as few slices as possible, reducing the uncertainty caused by information being dispersed across different slices. Then, the obtained slice sequence is transformed into a pair of 1D signals that describe the distribution of foreground pixels along the time axis. Finally, Mel Frequency Cepstrum Coefficient (MFCC) features are calculated to describe the spectral characteristics of the 1D signals over time. Extensive experiments on different types of action datasets demonstrate the effectiveness of the proposed adaptive slice-based representation. Its high efficiency makes real-time applications possible. 2. In order to improve the robustness of realistic human action representation, we propose a Probabilistic Structural Pattern Inference method for human action recognition. Firstly, we apply a Hierarchical Random Graph (HRG) to automatically learn hierarchical structures from local features and to estimate the relationship between two words that have no direct connection. Then, an AND/OR inference approach is presented to infer the probabilities of the potential high-order patterns in the hierarchical structures, which describe the uncertainty of the patterns. Based on the learned high-order patterns, an MCMC-based method is used to localize the best-fitting instance of a high-order pattern in an action sequence for action recognition. Extensive experiments on tw...
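The abstract names a Minimum Average Entropy principle for choosing the slicing angle but does not define it. The following is a minimal sketch under stated assumptions: each frame is a binary foreground mask, parallel slices at a candidate angle are approximated by binning pixels according to their distance along the slice normal, the entropy of the resulting foreground histogram is computed per frame, and the angle with the lowest entropy averaged over frames is selected. The function names, the `n_slices` parameter, and the per-frame averaging are all hypothetical choices for illustration and may differ from the thesis.

```python
import numpy as np

def slice_entropy(mask, theta, n_slices=16):
    """Entropy (bits) of foreground pixels spread over parallel slices at angle theta.

    mask: 2D boolean foreground mask for one frame (assumed input format).
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return 0.0                              # no foreground, no uncertainty
    h, w = mask.shape
    # Distance of each pixel from the image centre, measured along the slice normal.
    nx, ny = -np.sin(theta), np.cos(theta)
    d = (xs - (w - 1) / 2) * nx + (ys - (h - 1) / 2) * ny
    half_diag = 0.5 * np.hypot(h, w)
    hist, _ = np.histogram(d, bins=n_slices, range=(-half_diag, half_diag))
    p = hist[hist > 0] / hist.sum()
    return max(0.0, float(-(p * np.log2(p)).sum()))

def min_average_entropy_angle(volume, angles, n_slices=16):
    """Return the candidate angle with the lowest entropy averaged over frames."""
    avg = [float(np.mean([slice_entropy(frame, a, n_slices) for frame in volume]))
           for a in angles]
    return angles[int(np.argmin(avg))], avg
```

With this toy criterion, a thin vertical structure in the foreground yields near-zero entropy for vertical slices and higher entropy for horizontal or diagonal ones, which matches the stated goal of concentrating foreground pixels in as few slices as possible.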
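Language	Chinese
Other identifier	201118014628032
Source URL	[http://ir.ia.ac.cn/handle/173211/6657]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Shan Yanhu (单言虎). Research on Visual Action Recognition Based on Spatial-Temporal Structure Representation (基于时空结构表达的视觉行为识别方法研究)[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2014.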

Deposit method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.