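Research on Born-Digital Text Detection and Page Segmentation Methods for Web Images (网络图像中合成文本检测及版面分割方法研究)
Document type: Dissertation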
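Author | Chen Kai (陈凯) |
Degree | Doctor of Engineering |
Defense date | 2016-06 |
Degree-granting institution | Graduate University of the Chinese Academy of Sciences (中国科学院研究生院) |
Place of conferral | Beijing |
Advisor | Liu Chenglin (刘成林) |
Keywords | born-digital text detection; local contrast-based segmentation; multi-oriented text line extraction; conditional random field; page segmentation; background rectangle analysis |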
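Chinese abstract (translated) | With the rapid development of the Internet, smartphones, and communication technology, multimedia data on the Internet (including the mobile Internet) is growing quickly. Text, as a universally used means of communication, is often manually added to images to convey information, and such images spread widely online (for example, on Weibo, WeChat, and shopping websites). Recognizing and understanding the text in images is therefore important for making effective use of online information, and it has attracted broad attention from both academia and industry.
An image text recognition system comprises text detection, page segmentation, and text recognition. Because web images have complex backgrounds, variable colors, mixed languages, mixed text and graphics, and complex layouts, text detection and page segmentation face a series of technical challenges. Drawing on techniques from image processing, pattern recognition, and probabilistic graphical models, this thesis presents an in-depth study of born-digital text detection and page segmentation in web images. Compared with existing methods, the proposed methods show some advantages in measures such as precision and recall. The main research work and contributions of this thesis are as follows: |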
English abstract | Multimedia data, including text, images, and video, is increasing rapidly on the Internet and mobile networks. Images with embedded text make up a considerable proportion of online media data, so reading that text helps in understanding the image contents. However, automatic text reading in born-digital images is still a challenging task and has attracted great interest in both academia and industry. A text information extraction system consists of three parts: text detection, page segmentation, and text recognition. Born-digital text detection and page segmentation face a series of challenges due to cluttered backgrounds, color variation, multilingual text, mixed text and graphics, and complex layouts. In this thesis, we present an in-depth study of born-digital text detection and page segmentation that combines techniques from image processing, pattern recognition, and probabilistic graphical models. Experimental results on several public datasets demonstrate the effectiveness and superiority of the proposed methods. The contributions of this dissertation are summarized as follows:
- We propose a born-digital text detection method based on local contrast segmentation, which takes full advantage of the characteristics of born-digital text in web images. The method first extracts candidate text connected components (CCs), then applies text/non-text CC classification to filter out non-text CCs. The remaining text CCs are grouped into text lines using heuristic rules. Finally, non-viable text lines are removed by text line verification. To extract candidate text CCs, we detect text contours and stroke interior regions separately and then combine them. The image is first segmented into non-smooth and smooth regions by local contrast thresholding. Text contour pixels and non-text contour pixels in the non-smooth regions are separated using local binarization, while stroke interior regions correspond directly to smooth regions. Experiments on public datasets show that the proposed method performs comparably to the best existing methods.
- We propose a conditional random field (CRF)-based multi-oriented text line extraction method. We adopt a strategy that groups CCs first and filters non-text CCs afterwards, which avoids mistakenly filtering out text CCs at the very beginning. A minimum spanning tree (MST) is first built by linking adjacent nodes. Each edge is then assigned a weight using a coarse-to-fine scheme; the weight represents the belief that its two nodes belong to the same line. Text and non-text nodes in the MST are identified with a CRF model for text/non-text CC classification. Finally, lines are obtained directly from the node labels and edge weights, and non-text lines are filtered by text/non-text line classification. Experimental comparison with the local contrast-based segmentation method demonstrates the efficiency of the proposed method.
- We propose a page segmentation method based on background rectangle analysis. Most existing methods use only foreground or only background information; the proposed method considers both. Text lines and non-text CCs are first analyzed separately and then combined to obtain the segmentation result. For text lines, background rectangles are first extracted from the gaps between horizontally neighboring text CCs in the same text line. Heuristic rules and a multilayer perceptron (MLP) are then applied progressively to filter out within-block rectangles. The remaining between-block rectangles are grouped into separators, which segment the text regions into blocks. For non-text CCs, small CCs and CCs that overlap text blocks are filtered out. |
Source URL | [http://ir.ia.ac.cn/handle/173211/11950] |
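Collection | Graduates_Doctoral Dissertations |
Author affiliation | Institute of Automation, Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Chen Kai. Research on Born-Digital Text Detection and Page Segmentation Methods for Web Images[D]. Beijing. Graduate University of the Chinese Academy of Sciences. 2016. |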
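The English abstract describes segmenting an image into non-smooth and smooth regions by local contrast thresholding. The sketch below illustrates that idea only; it is not the thesis's implementation. The contrast measure (maximum minus minimum intensity in a square window), the function names `local_contrast` and `split_by_local_contrast`, and the `threshold` and `radius` values are all assumptions chosen for illustration, since the record does not state them.

```python
def local_contrast(image, radius=1):
    """Max-minus-min intensity in a (2*radius+1)-wide window around each pixel.

    `image` is a list of rows of grayscale values. This contrast measure is an
    assumption for illustration; the record does not specify the one used.
    """
    height, width = len(image), len(image[0])
    contrast = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the window, clipped at the image border.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(max(window) - min(window))
        contrast.append(row)
    return contrast


def split_by_local_contrast(image, threshold=40, radius=1):
    """Return a mask: True = non-smooth pixel, False = smooth pixel.

    The threshold value is hypothetical, not taken from the thesis.
    """
    return [
        [value > threshold for value in row]
        for row in local_contrast(image, radius)
    ]


# Toy image: a bright vertical stroke, three pixels wide, on a dark background.
toy_image = [[10, 10, 200, 200, 200, 10, 10] for _ in range(5)]
non_smooth = split_by_local_contrast(toy_image)

# Print '#' for non-smooth pixels and '.' for smooth pixels.
for row in non_smooth:
    print("".join("#" if flag else "." for flag in row))
```

On this toy input, pixels next to the stroke edges are marked non-smooth, while the stroke's center column and the outer background columns are marked smooth. This mirrors the abstract's remark that stroke interiors fall in smooth regions and contours in non-smooth ones; separating stroke interiors from smooth background would need the further steps the abstract mentions.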
Ingest method: OAI harvesting
Source: Institute of Automation