Chinese Academy of Sciences Institutional Repositories Grid: Born-Digital Text Detection and Page Segmentation in Web Images

Chinese Academy of Sciences Institutional Repositories Grid

Born-Digital Text Detection and Page Segmentation in Web Images (网络图像中合成文本检测及版面分割方法研究)

Document type: Dissertation


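Author	Chen Kai (陈凯)
Degree	Doctor of Engineering
Defense date	2016-06
Degree-granting institution	Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Location	Beijing
Advisor	Liu Cheng-Lin (刘成林)
Keywords	born-digital text detection; local contrast-based segmentation; multi-oriented text line extraction; conditional random field; page segmentation; background rectangle analysis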
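Chinese abstract	With the rapid development of the Internet, smartphones, and communication technology, multimedia data on the Internet (including the mobile Internet) is growing quickly. Text, as a universally used means of communication, is often manually added to images to convey information and spreads widely online (for example, on microblogs, WeChat, and shopping websites). Recognizing and understanding the text content of images is therefore important for making effective use of online information, and it has attracted wide attention from academia and industry. An image text recognition system comprises text detection, page segmentation, and text recognition. Because web images have complex backgrounds, varied colors, mixed languages, mixed text and graphics, and complex layouts, text detection and page segmentation face a series of technical challenges. This thesis combines techniques from image processing, pattern recognition, and probabilistic graphical models to study born-digital text detection and page segmentation in web images in depth. Compared with existing methods, the proposed methods have certain advantages in precision, recall, and related measures. The main work and contributions are as follows:
- A born-digital text detection method for web images based on local contrast segmentation, which takes full advantage of the characteristics of born-digital text images. The method first obtains candidate text components by detecting stroke contours and stroke interior regions separately, then filters non-text components with a text/non-text component classifier, and finally links text components into text lines using heuristic rules and removes non-text lines by text line verification. To extract candidate text components, the method first applies local contrast thresholding to divide the image into smooth and non-smooth regions. The smooth regions contain the stroke interior regions, while local binarization of the non-smooth regions separates stroke contours from background contours. Merging the candidate stroke contours with the candidate stroke interior regions yields the candidate text components. Experimental results on public datasets show that the proposed method is comparable to the best existing methods.
- A multi-oriented text line extraction method based on a conditional random field (CRF). It groups components into lines first and filters non-text components afterwards, to avoid mistakenly filtering text components at the very beginning. After the candidate text components are obtained, the method links them into a minimum spanning tree (MST), then uses a coarse-to-fine scheme to estimate, for each MST edge, the weight with which the two connected components belong to the same line. Once CRF-based classification has assigned labels to the components, the components are grouped into lines according to the edge weights, and non-text lines are removed by text/non-text line classification. With the same candidate component extraction and the same experimental datasets, the improvement in detection results over the local contrast segmentation-based method demonstrates the effectiveness of this method.
- A page segmentation method based on background rectangle analysis. Most existing methods use only foreground information or only background information, whereas this method considers both. After text detection, the method analyzes text lines (text regions) and non-text components (non-text regions) separately and combines the results into the final page segmentation. For text lines, it first extracts the background rectangles between adjacent components within the same text line, then filters within-block background rectangles using heuristic rules and a classifier, and finally merges the between-block background rectangles into separators that divide the text regions into blocks. For non-text components, it filters out noise components and then components that overlap text blocks. The leading performance achieved on three different types of datasets (the ICDAR2009 complex document page segmentation competition dataset and the ICDAR2011 historical book and historical newspaper page segmentation competition datasets) demonstrates the effectiveness of the method.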
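The local contrast thresholding step in the first contribution can be sketched roughly as follows. This is only a minimal illustration, not the thesis's implementation: the abstract does not define the contrast measure, so the max-minus-min measure, the window `radius`, the `threshold` value, and the function name `local_contrast_mask` are all assumptions made for the example.

```python
def local_contrast_mask(gray, radius=1, threshold=40):
    """Mark pixels whose local contrast exceeds a threshold.

    Assumption: local contrast is the max minus the min intensity in a
    (2 * radius + 1) square window, clipped at the image border.
    True means "non-smooth", False means "smooth".
    """
    height, width = len(gray), len(gray[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mask[y][x] = (max(window) - min(window)) > threshold
    return mask


# Toy image: a bright vertical stroke, three pixels wide, on a dark background.
image = [[10, 10, 200, 200, 200, 10, 10] for _ in range(5)]
mask = local_contrast_mask(image)

# The stroke boundary is non-smooth; the stroke interior (column 3) and the
# flat background (columns 0 and 6) are smooth.
print(mask[2])
```

In this toy case the smooth regions contain both the stroke interior and the flat background, which is consistent with the abstract's statement that smooth regions include stroke interiors; separating stroke contours from background contours in the non-smooth regions would need the further local binarization step, which is not shown here.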
English abstract	Multimedia data, including text, images, and video, is increasing rapidly on the Internet and mobile networks. Images with embedded text make up a considerable proportion of online media data, so reading the text helps to better understand image content. However, automatic text reading in born-digital images remains a challenging task and has attracted great interest in both academia and industry. A text information extraction system consists of three parts: text detection, page segmentation, and text recognition. Born-digital text detection and page segmentation face a series of challenges due to cluttered backgrounds, color variation, multilingual text, mixed text and graphics, and complex layouts. In this thesis, we present an in-depth study of born-digital text detection and page segmentation, combining techniques from image processing, pattern recognition, and probabilistic graphical models. Experimental results on several public datasets demonstrate the effectiveness and superiority of the proposed methods. The contributions of this dissertation are summarized as follows:
- We propose a born-digital text detection method based on local contrast segmentation, which takes full advantage of the characteristics of born-digital text in web images. The proposed method first extracts candidate text connected components (CCs), then applies text/non-text CC classification to filter non-text CCs. Text CCs are subsequently grouped into text lines based on heuristic rules. Finally, non-text lines are filtered by text line verification. To extract candidate text CCs, we detect text contours and stroke interior regions separately and combine them. The image is first segmented into non-smooth and smooth regions by local contrast thresholding. Text contour pixels and non-text contour pixels in non-smooth regions are separated using local binarization, while stroke interior regions correspond directly to smooth regions. Experiments on public datasets show that the proposed method performs comparably to the best existing methods.
- We propose a conditional random field (CRF)-based multi-oriented text line extraction method. We adopt a strategy that groups CCs first and then filters non-text CCs, to avoid mis-filtering text CCs at the very beginning. A minimum spanning tree (MST) is first built by linking adjacent nodes. Each edge is then assigned a weight using a coarse-to-fine scheme; the weight represents the belief that the two nodes belong to the same line. Text and non-text nodes in the MST are identified with a CRF model for text/non-text CC classification. Finally, lines are obtained directly from the node labels and edge weights, and non-text lines are filtered by text/non-text line classification. Experimental comparison with the local contrast segmentation-based method demonstrates the effectiveness of the proposed method.
- We propose a page segmentation method based on background rectangle analysis. Most existing methods use only foreground information or only background information; the proposed method considers both. Text lines and non-text CCs are first analyzed separately and then combined to obtain the segmentation results. For text lines, background rectangles are first extracted from the gaps between horizontally neighboring text CCs in the same text line. Heuristic rules and a multilayer perceptron (MLP) are then applied in turn to filter within-block rectangles. The remaining between-block rectangles are grouped into separators, which segment text regions into blocks. For non-text CCs, small CCs and CCs overlapping with text blocks are filtered.
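The MST construction over connected components in the second contribution might look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis's method: components are reduced to hypothetical center points, the edge cost is plain Euclidean distance, and a fixed distance cutoff (`max_dist`) stands in for the coarse-to-fine edge weights and CRF labels, which the abstract does not specify. The helper names are invented for the example.

```python
import math


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete graph over the given points.

    Returns the tree as a list of (distance, i, j) edges.
    """
    parent = list(range(len(points)))
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    tree = []
    for dist, i, j in edges:
        root_i, root_j = _find(parent, i), _find(parent, j)
        if root_i != root_j:  # the edge joins two separate subtrees
            parent[root_i] = root_j
            tree.append((dist, i, j))
    return tree


def group_by_cutting(num_points, tree, max_dist):
    """Drop tree edges longer than max_dist and return the remaining groups.

    Assumption: a distance cutoff replaces the learned same-line weights.
    """
    parent = list(range(num_points))
    for dist, i, j in tree:
        if dist <= max_dist:
            parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for i in range(num_points):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())


# Hypothetical component centers: two horizontal "lines" 50 pixels apart.
centers = [(0, 0), (10, 0), (20, 0), (0, 50), (10, 50)]
tree = minimum_spanning_tree(centers)
lines = group_by_cutting(len(centers), tree, max_dist=15)
print(lines)
```

Cutting the single long edge that bridges the two rows leaves one group per row. In the method described in the abstract, the decision about which edges to keep comes from the estimated same-line weights and the CRF node labels rather than from a distance threshold.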
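Source URL	[http://ir.ia.ac.cn/handle/173211/11950]
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Chen Kai. 网络图像中合成文本检测及版面分割方法研究 [Born-Digital Text Detection and Page Segmentation in Web Images][D]. Beijing. Graduate University of Chinese Academy of Sciences. 2016.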

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.