Chinese Academy of Sciences Institutional Repositories Grid

Video Motion Analysis and Event Recognition

Document type: Dissertation


Author	Li Li (李莉)
Degree type	Doctor of Engineering
Defense date	2010-06-03
Degree-granting institution	Graduate University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Supervisor	Hu Weiming (胡卫明)
Keywords	video structure analysis; shot detection; key frame extraction; video event recognition; shot classification; feature fusion; dominant set clustering algorithm; attention mechanism; keypoints; spatio-temporal interest points
Alternative title	Video motion analysis and event recognition
Degree major	Computer Application Technology
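Chinese abstract (translated)	With the rapid development of multimedia and Internet technology, and the spread of third-generation mobile communication technology (3G), multimedia data, with video as its main representative, is growing at an astonishing rate. How to let people quickly locate, conveniently access, and effectively manage the useful information contained in video is a problem in urgent need of a solution. In essence, the problem is how to use computer technology to analyze and represent video content effectively, so that contextual information and relevant domain knowledge can be established, the various cues can be fused for inference, and on that basis a link can be built between features and semantics. Static features and dynamic features are the two main attributes of video. The former reflect the appearance of the video frames, mainly the people, objects, buildings, and so on in the video; dynamic features are an important attribute that distinguishes video from still images, mainly the motion of objects and the interactions between people in the video, and they are the most important source of information in video. How to effectively represent and fuse these two attributes, and apply them to the analysis and understanding of video content, is the main research direction of this thesis. Following this direction, the work concentrates on three areas: (1) content-based video structure analysis; (2) sports video content analysis; (3) event recognition. In these three areas, the thesis makes the following contributions: 1. We propose a video shot detection method based on optical flow features. It detects shots mainly through the instability, across shot changes, of optical flow computed with the gradient constraint method. This is an unsupervised method that needs no training data; with only a few parameters it handles many difficult cases in shot boundary detection, such as flashes and violent object motion, and it handles both cuts and gradual transitions. 2. We propose a key frame detection method for shots based on nonparametric motion features and information entropy. Because the HSV color space matches human subjective perception well, we likewise convert the motion vector field into a cone-shaped MVS space, then smooth and cluster the MVS space with the nonparametric Mean Shift clustering algorithm to obtain a mid-level description of the motion features. From this mid-level description we compute the information entropy of every frame and select the frame with the maximum entropy as the key frame. 3. We propose semantic event recognition for basketball video based on optical flow features. Using optical flow, we propose motion descriptors including a motion accumulation vector descriptor based on directional activity, a motion feature histogram, and the entropy of that histogram; we combine these with texture features as the feature representation and incorporate temporal context information, which improves recognition accuracy. 4. We propose a dominant color extraction method based on dominant set clustering and apply it to shot classification in soccer video. Because the dominant sets produced by the dominant set clustering algorithm correspond directly to dominant color features, the algorithm needs no thresholds to be set and is very robust to complicating factors such as the soccer pitch, lighting, and weather. 5. We propose an algorithm, based on the human attention mechanism, for fusing the static and dynamic information of video, and apply it to video event recognition. We propose two frameworks. The first detects the interest points in every frame and extracts static (SIFT) and dynamic (optical flow) information at each interest point; then, drawing on the human attention mechanism, it builds an attention model of the motion information and uses this model to guide the event recognition task so as to obtain the features most relevant to recognition; finally it matches video frames with the Earth Mover's Distance to recognize the video. The second framework represents a video as a set of spatio-temporal interest point features and then uses the human attention mechanism to fuse the static and dynamic features of the spatio-temporal interest points; in this framework, not only dynamic information but also static features guide the recognition task. Furthermore, we propose two attention voting mechanisms, one of which is probability-based voting...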
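Item 4 of the Chinese abstract relies on dominant set clustering. The abstract does not say how the thesis computes dominant sets or how it measures color similarity, so the following is only a minimal sketch of one standard approach, replicator dynamics on a pairwise similarity matrix. The function name `dominant_set`, its parameters, and the toy similarity values are all assumptions made for illustration.

```python
def dominant_set(similarity, iterations=200, support_eps=1e-6):
    """Return the indices of one dominant set found by replicator dynamics.

    `similarity` is assumed to be a symmetric, non-negative matrix
    (list of lists) with a zero diagonal. This is an illustrative sketch,
    not the procedure used in the thesis.
    """
    n = len(similarity)
    x = [1.0 / n] * n  # start from the barycenter of the simplex
    for _ in range(iterations):
        # ax[i] is the average similarity of element i to the current cluster
        ax = [sum(similarity[i][j] * x[j] for j in range(n)) for i in range(n)]
        cohesion = sum(x[i] * ax[i] for i in range(n))
        if cohesion == 0.0:
            break  # no similarity mass left to redistribute
        # elements that are more similar than average gain weight
        x = [x[i] * ax[i] / cohesion for i in range(n)]
    # the dominant set is the support of the converged weight vector
    return [i for i in range(n) if x[i] > support_eps]


# Hypothetical similarities: samples 0-2 are mutually similar colors,
# sample 3 is an outlier.
toy_similarity = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
members = dominant_set(toy_similarity)
```

In this toy case the weight of the outlier decays toward zero, so `members` holds the three mutually similar samples. No cluster count is passed in; `support_eps` is only a numerical cutoff for reading the support off the converged weights, in line with the abstract's remark that the clustering itself needs no thresholds.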
English abstract	With the rapid growth of multimedia and network technology, and especially the spread of International Mobile Telecommunications-2000 (IMT-2000), better known as 3G or 3rd Generation, the amount of multimedia data is increasing greatly. Efficient ways to locate, exploit, and manage the useful information in video are urgently needed. The essence of this problem is how to efficiently analyze and represent video events, and to construct context and related domain knowledge, so that various cues can be used for inference and a relation between features and semantics can be established. Static and dynamic features are the two main attributes of video. The former can basically be obtained from static images, e.g. persons, objects, buildings, etc., while motion features are the important attribute that distinguishes video from static images, e.g. the motion of objects and the interaction among different people. How to efficiently describe and fuse these two attributes is the subject of this thesis. Following this direction, the main work includes: (1) content-based video structure analysis; (2) sports video content analysis; (3) event recognition. The main contributions of this thesis are as follows: 1. We propose an optical flow based shot detection algorithm. Since the calculation of the optical flow field depends on the assumption of brightness constancy, the violation of the brightness constraint across a shot change provides the motivation for our method. Motion discontinuities are regarded as candidate boundaries, and color features are combined to remove false alarms. Experimental results demonstrate that this method is not only robust to camera and object motion but can also handle complicated situations. 2. A key frame extraction method based on nonparametric motion features and information entropy is proposed. We propose a compact representation of the dominant motion information for each frame, based on a mean shift analysis procedure. The criterion for key frames is maximum entropy, and mutual information is used to measure the similarity between frames. Experimental results demonstrate that the key frames we extract are more concise and informative. 3. We present a set of novel features for classifying basketball video clips into semantic events and a simple way to use prior temporal context information to improve the accuracy of classification. Specifically, the feature set consists of a motion descriptor, motion hi...
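The maximum-entropy criterion in contribution 2 can be illustrated with a small sketch. It assumes each frame has already been reduced to a list of discrete motion labels (for example, cluster labels from a mean shift step); that input format, the function names, and the sample data are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pick_key_frame(frames):
    """Return the index of the frame whose motion labels have maximum entropy."""
    entropies = [shannon_entropy(labels) for labels in frames]
    return max(range(len(frames)), key=entropies.__getitem__)


# Hypothetical shot of three frames, each a list of quantized motion labels.
shot = [
    ["left"] * 8,                         # one motion mode: 0 bits
    ["left"] * 4 + ["right"] * 4,         # two equal modes: 1 bit
    ["left", "right", "up", "down"] * 2,  # four equal modes: 2 bits
]
key_index = pick_key_frame(shot)
```

Here the third frame carries the most varied motion and is selected. The mutual-information similarity measure mentioned in the abstract is not covered by this sketch.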
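Language	Chinese
Other identifier	200618014629081
Source URL	[http://ir.ia.ac.cn/handle/173211/6291]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Li Li. Video Motion Analysis and Event Recognition [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2010.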

Ingest method: OAI harvesting

Source: Institute of Automation

