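Chinese Academy of Sciences Institutional Repositories Grid: Research on Sparse Representation of Sequential Images and Object Tracking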


Chinese Academy of Sciences Institutional Repositories Grid

Research on Sparse Representation of Sequential Images and Object Tracking

Document type: Dissertation


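Author	Yang Yehui (杨叶辉)
Degree	Doctor of Engineering
Defense date	2016-05
Degree-granting institution	University of Chinese Academy of Sciences
Place of conferral	Beijing
Advisor	Zhang Wensheng (张文生)
Keywords	sequential images; sparse representation; object tracking; multi-task learning; low-rank property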
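Chinese abstract (translated)	Sequential images are series of images of a target acquired continuously at different times and from different viewpoints. They are widespread in civil and military applications such as video surveillance, driver assistance, human-computer interaction, military navigation, and missile strikes. Effective image representation is a fundamental problem for many computer vision applications (e.g., image recognition, object tracking, and action recognition). Because unpredictable changes such as occlusion, illumination variation, deformation, and background clutter may occur in sequential images, accurately representing the target in sequential images is a highly challenging research topic. Sparse representation theory is one of the most effective image representation methods. It originates from the sparse response of the mammalian visual cortex to natural image stimuli and has been widely applied in face recognition, object tracking, image denoising, distortion correction, and many other fields. For the sparsity of a single image sample, this dissertation constructs a two-level sparse representation model that mines sparse features at two levels, global contour and local detail, to obtain a more complete image representation. For the group sparsity of the target object across multiple image samples, it constructs a weighted multi-task sparse representation model in which an adaptive weighting mechanism makes the representation more discriminative. For the way object appearance changes in sequential image data, it constructs a representation model with temporal and low-rank constraints that effectively resists interference from abrupt changes in target appearance. These three sparse representation models are applied to object tracking. Compared with some currently popular tracking algorithms on standard datasets, both tracking accuracy and robustness improve significantly.
The main work and contributions of this dissertation are as follows. A two-level sparse tracking algorithm is proposed. For the single-image representation problem, the two-level representation model mines more complete sparse features while combining the advantages of discriminative and generative models, so that it both makes full use of background information and remains stable when training samples are few. In addition, the global representation dictionary and the classifier are learned jointly so as to adapt to scene changes during tracking. In a comparison with 10 popular algorithms on 15 standard datasets, the algorithm obtains the best results on both average center error and success rate, and it effectively overcomes the drift problem. A weighted multi-task reverse sparse tracking algorithm is proposed. To exploit the group sparsity among multiple positive samples and the sparsity of negative sample images, positive and negative samples are represented as linear combinations of candidate samples in a unified multi-task sparse representation model, and a weighting mechanism penalizes the relations between positive or negative samples and candidate samples differently to make the representation more discriminative. Compared with the traditional multi-task sparse tracking algorithm MTT (multi-task tracker), the average execution time per frame is reduced by 42% and the average tracking success rate is increased by 46%. In a comparison with 12 currently popular tracking algorithms, the algorithm achieves the best average performance. A tracking algorithm with temporal and low-rank constraints on sequential images is proposed. The representation model is built around the way target appearance changes in sequential images: a nuclear norm regularizes the low-rank structure among sequential target samples, while an L_{1,2} mixed norm constrains the differences between target samples in adjacent frames. After the coding matrix of the representation model is obtained, the algorithm builds weighted coding maps to achieve more robust tracking. Experiments show that the representation model can effectively model unpredictable abrupt changes in sequential target appearance. In a comparison with 12 currently popular algorithms, the algorithm achieves the best tracking results on 26 standard datasets, improving on the second-best algorithm by 69% in average center location error and by 24% in average tracking success rate.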
English abstract	Sequential image data, i.e., images of target objects acquired consecutively in space and time, exist widely in both civil and military applications such as video surveillance, driving assistance, human-machine interaction, military navigation, and precision missile strikes. An effective representation model for images is a basic topic in computer vision tasks (e.g., image recognition, object tracking, and behavior identification). Since unpredictable changes may occur in consecutive images, accurate representation of sequential images is a challenging research subject in computer vision. Sparse representation, which stems from the sparse response of the mammalian visual cortex to visual signals, is one of the most effective image representation models. It is widely used in numerous areas, such as face recognition, object tracking, image denoising, and distortion correction. In this dissertation, we first construct a two-level representation model to exploit the sparse features of a single image from both global and local views. Second, to mine the group sparsity in multiple images, we propose a weighted multi-task sparse representation model, in which a weighting scheme increases the discriminative power of the representation. Third, to accommodate the variations of target appearance in sequential images, we present a representation model based on temporal and low-rank constraints that is robust to abrupt changes in sequential images. All three sparse representation models are applied to object tracking tasks. Compared with some state-of-the-art tracking algorithms, the proposed algorithms achieve superior performance on numerous publicly available video sequences.
The main contributions of this dissertation can be summarized as follows. We propose a two-level image representation model for sparse tracking, which captures more complete sparse features of a single image. The proposed algorithm combines the merits of discriminative and generative models: it makes better use of background information and also remains robust when the number of training samples is small. Additionally, the global dictionary and the classification parameters are learned jointly so that the model adapts to appearance changes during tracking. Compared with 10 state-of-the-art trackers on 15 commonly used video sequences, the proposed algorithm achieves the best performance in both the average central-pixel-error (CPE) and success tracking rate (STR) metrics. Moreover, it effectively overcomes the drifting problem. We construct a weighted multi-task learning model for a reverse sparse tracker. To mine both the group sparsity in the target appearances and the sparse features among the background patches, both positive and negative samples are represented as linear combinations of tracking candidates in a multi-task sparse model. Moreover, a novel weighting scheme is introduced to make the representation model more discriminative. Compared with the traditional multi-task tracker (MTT), the per-frame execution time of the proposed algorithm is reduced by 42%, and the average STR is increased by 46%. The experimental results show that the proposed algorithm performs favorably against 12 state-of-the-art trackers. We put forward a temporally restricted, low-rank sparse tracker based on the characteristics of target appearance change in sequential images. The underlying low-rank structure among consecutive target observations is constrained by the nuclear norm. Meanwhile, the proposed algorithm models the discrepancies between two adjacent frames via L_{1,2} mixed-norm regularization. After obtaining the representation matrix between the training samples and the tracking candidates, we construct weighted representation maps to produce robust tracking results. Experimental evaluations show that the representation model can effectively model sudden changes in consecutive target appearances. Compared with 12 alternative trackers, the proposed algorithm achieves the best performance on 26 standard video datasets, improving on the second-best tracker by 69% in average CPE and by 24% in average STR.
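Language	Chinese
Source URL	[http://ir.ia.ac.cn/handle/173211/11596]
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Yang Yehui. Research on Sparse Representation of Sequential Images and Object Tracking[D]. Beijing. University of Chinese Academy of Sciences. 2016.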
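
The abstract in this record mentions a nuclear-norm regularizer for low-rank structure and an L_{1,2} mixed norm on the differences between adjacent frames. As a minimal, generic sketch (not the dissertation's implementation), the Python functions below show the standard proximal operators commonly used when optimizing with such penalties. The function names `svt` and `row_group_shrink`, the threshold `tau`, and the reading of the mixed norm as the sum of Euclidean row norms are assumptions made for illustration; the record does not state the solver or the norm convention.

```python
import numpy as np


def svt(matrix, tau):
    """Proximal operator of tau * nuclear norm (singular value thresholding).

    Soft-thresholds the singular values, which pushes the result toward low rank.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunken singular values, then recombine.
    return (u * s_shrunk) @ vt


def row_group_shrink(matrix, tau):
    """Proximal operator of tau * (sum of Euclidean norms of the rows).

    Assumed convention for the mixed norm: each row is one group, so a row is
    either shrunk toward zero as a whole or set exactly to zero.
    """
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return matrix * scale
```

In an iterative solver such as proximal gradient descent or ADMM, steps like these would typically alternate with a gradient or least-squares update on the data-fitting term. Which solver the dissertation uses is not given in this record.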

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.