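Chinese Academy of Sciences Institutional Repositories Grid: Research on Video Feature Extraction and Applications in the Compressed Domain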

Chinese Academy of Sciences Institutional Repositories Grid

Research on Video Feature Extraction and Applications in the Compressed Domain

Document type: Thesis


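Author	Tang Zhifeng (唐志峰)
Degree	Doctoral
Defense date	2007-06-08
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place of conferral	Institute of Acoustics
Keywords	compressed domain feature extraction; shot boundary detection; video object segmentation
Alternative title	Compressed domain video feature extraction and applications
Major	Signal and Information Processing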
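Abstract (translated from Chinese)	With advances in video capture, storage, and compression coding, the amount of digital video data has grown rapidly. Video applications have moved beyond simple playback to higher-level tasks that require accessing and manipulating video content, such as video indexing and retrieval and video understanding. The core problems in these applications are how to represent video content effectively and how to access it effectively. Many video content analysis algorithms work in the pixel domain and must decode the bitstream before analysis to obtain video features. A compressed video bitstream, however, already contains features that reflect the video content; using features extracted directly from the compressed bitstream avoids the decoding computation and makes real-time video analysis possible. This thesis focuses on three topics: feature extraction from compressed video bitstreams, shot boundary detection using the extracted features, and video object segmentation using the extracted features. The main contributions are: (1) A research platform for compressed domain video feature extraction and applications is established. (2) A new real-time shot cut detection algorithm based on local features is proposed. Using edge features extracted in the compressed domain, the algorithm defines an inter-frame similarity measure that reflects local information by examining the similarity of edge distributions in adjacent frames. Combined with a color histogram based similarity measure that reflects global features and an improved sliding window algorithm, it achieves high-performance shot boundary detection. Compared with existing local feature based algorithms, it has lower computational complexity and is suitable for real-time applications. (3) An improved model-based dissolve detection algorithm is proposed. In the pre-selection stage, cross-checking the statistical features of the luminance image and the gradient image significantly improves the recall rate. In the verification stage, applying several parallel constraints effectively removes false detections caused by camera or object motion while keeping the recall rate high. Experimental results show that the algorithm effectively improves detection performance. (4) A high-precision compressed domain video object segmentation algorithm is proposed. Taking features extracted in the compressed domain as input, it extracts the moving objects in P frames. The algorithm first uses the DC DCT coefficient and three AC DCT coefficients of every block in the I and P frames, together with motion compensation information, to reconstruct a sub-image of each P frame at 1/16 the size of the original image. It then applies fast mean shift clustering to obtain regions of consistent luminance with high boundary precision. Next, it uses global motion estimation and object mask backward projection to obtain the distribution of potential motion blocks. Finally, it combines the clustering result with the distribution of potential motion blocks and classifies object and background regions with a statistical labeling method based on Markov random fields. The algorithm achieves a boundary precision of 4×4 sub-blocks and, for CIF bitstreams, reaches a processing speed of 40 frames per second on a Pentium IV 2GHz platform.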
English abstract	With advances in video capture, storage, and compression technology, the amount of digital video has been increasing dramatically. Applications of digital video have evolved from simple playback to more advanced areas such as video indexing and retrieval, video summarization, and video understanding. These applications require efficient ways of representing and manipulating video content. Much research on video content analysis has been conducted in the pixel domain, but because most video data is compressed for efficient storage and transmission, a decoding step is needed to obtain video features before any pixel domain algorithm can run. To avoid the computational cost of decoding, this thesis uses video features extracted from the compressed video stream by partial decoding. The research focuses on three topics: compressed domain video feature extraction, shot boundary detection using compressed domain features, and video object segmentation using compressed domain features. The main contributions are: (1) A research platform for compressed domain feature extraction and applications is implemented. (2) A real-time shot cut detection algorithm based on local features is presented. Using edge features extracted in the compressed domain, the algorithm defines a new frame similarity measure by evaluating the similarity of edge distributions between consecutive frames. The edge-based measure is combined with a color histogram based similarity measure to improve robustness. Experimental results show that the algorithm performs well and is suitable for real-time applications. (3) An improved dissolve detection algorithm is proposed. In the candidate selection stage, dissolve candidates are selected by cross-checking two cues, the intensity variance curve and the edge energy curve, to improve the recall rate. In the verification stage, false positives are eliminated by checking several exclusive constraints. Experimental results show that the algorithm improves dissolve detection effectively. (4) A high-precision compressed domain approach to video object segmentation is presented. Moving object masks in P frames are extracted from features obtained by partial decoding. First, a sub-image 1/16 the size of the original is constructed using the DC coefficient and three AC coefficients. Then a fast mean shift clustering algorithm divides the image into regions of coherent luminance with high-precision boundaries. Next, potential motion blocks are marked by global motion estimation and object mask backward projection. Finally, an MRF-based statistical labeling method combines the spatial segmentation and the potential motion blocks to classify regions into two classes: moving object and background. The proposed algorithm achieves a boundary precision of 4×4 sub-blocks at a high processing speed: for CIF video streams, it runs at 40 frames per second on a Pentium IV 2GHz platform.
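Language	Chinese
Release date	2011-05-07
Pages	129
Source URL	[http://159.226.59.140/handle/311008/206]
Collection	Institute of Acoustics_Institute of Acoustics Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation (GB/T 7714)	Tang Zhifeng. Research on Video Feature Extraction and Applications in the Compressed Domain [D]. Institute of Acoustics. Institute of Acoustics, Chinese Academy of Sciences. 2007.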
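
Contribution (2) in the abstract combines a color histogram based similarity measure with a sliding window algorithm. The record does not give the thesis's formulas, so the following is only a minimal sketch of that general idea under stated assumptions: histogram intersection stands in for the similarity measure, and a frame transition is declared a cut when its dissimilarity exceeds every other value in the window by a fixed ratio. The function names, `half_window`, and `ratio` are hypothetical, and the edge-based local measure from the thesis is not modelled here.

```python
from typing import List, Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] of two histograms with the same nonzero total count."""
    total = sum(h1)
    return sum(min(a, b) for a, b in zip(h1, h2)) / total


def detect_cuts(dissimilarity: Sequence[float],
                half_window: int = 2,
                ratio: float = 3.0) -> List[int]:
    """Return indices i where dissimilarity[i] dominates its sliding window.

    dissimilarity[i] is the change between frame i and frame i + 1.
    A cut is reported when the value is at least `ratio` times the largest
    other value inside the window (an assumed rule, not the thesis's).
    """
    cuts = []
    for i, d in enumerate(dissimilarity):
        lo = max(0, i - half_window)
        hi = min(len(dissimilarity), i + half_window + 1)
        neighbours = [dissimilarity[j] for j in range(lo, hi) if j != i]
        if neighbours and d > 0 and d >= ratio * max(neighbours):
            cuts.append(i)
    return cuts


# Toy 4-bin histograms: three frames of one shot, then three of another.
frames = [
    [8, 2, 0, 0], [7, 3, 0, 0], [8, 2, 0, 0],
    [0, 0, 5, 5], [0, 1, 4, 5], [0, 0, 5, 5],
]
dissim = [1.0 - histogram_intersection(a, b) for a, b in zip(frames, frames[1:])]
print(detect_cuts(dissim))  # [2]: the cut lies between frame 2 and frame 3
```

Comparing each value against its window neighbours rather than a global threshold keeps the rule adaptive to shots with steady motion, which is the usual motivation for sliding window schemes.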

Ingest method: OAI harvesting

Source: Institute of Acoustics
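
Contribution (4) in the abstract builds a sub-image at 1/16 the size of the original from the DC coefficient and three AC coefficients of each block. The record does not say how the thesis does this, so the sketch below shows one common way to get such a sub-image, assuming 8×8 blocks with orthonormal DCT coefficients: a scaled 2×2 inverse DCT of the four lowest-frequency coefficients, giving 2×2 pixels per block. All function names are hypothetical, and the motion compensation needed for P frames is not shown.

```python
from typing import List, Sequence, Tuple

Coeffs = Tuple[float, float, float, float]  # (DC, AC01, AC10, AC11)


def block_to_2x2(dc: float, ac01: float, ac10: float, ac11: float) -> List[List[float]]:
    """Approximate one 8x8 block by 2x2 pixels.

    ac01 is the lowest horizontal frequency, ac10 the lowest vertical one.
    The factor 1/8 is the 2x2 inverse DCT factor (1/2) times the 2/8 scale
    that maps 8x8 orthonormal coefficients to a 2x2 block.
    """
    return [
        [(dc + ac01 + ac10 + ac11) / 8.0, (dc - ac01 + ac10 - ac11) / 8.0],
        [(dc + ac01 - ac10 - ac11) / 8.0, (dc - ac01 - ac10 + ac11) / 8.0],
    ]


def build_subimage(blocks: Sequence[Sequence[Coeffs]]) -> List[List[float]]:
    """Tile the 2x2 approximations of a grid of blocks into one sub-image."""
    image = []
    for block_row in blocks:
        top, bottom = [], []
        for dc, ac01, ac10, ac11 in block_row:
            pixels = block_to_2x2(dc, ac01, ac10, ac11)
            top.extend(pixels[0])
            bottom.extend(pixels[1])
        image.append(top)
        image.append(bottom)
    return image


def subimage_size(width: int, height: int) -> Tuple[int, int]:
    """Each dimension shrinks by 4, so the area shrinks by 16."""
    return width // 4, height // 4


# A flat block of value 100 (DC = 8 * 100) next to a block with a horizontal ramp.
print(build_subimage([[(800, 0, 0, 0), (400, 80, 0, 0)]]))
# [[100.0, 100.0, 60.0, 40.0], [100.0, 100.0, 60.0, 40.0]]
print(subimage_size(352, 288))  # CIF -> (88, 72)
```

Under this assumption each sub-image pixel covers a 4×4 area of the original frame, which is consistent with the 4×4 sub-block boundary precision stated in the abstract.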

