Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): An end-to-end model for multi-view scene text recognition

An end-to-end model for multi-view scene text recognition

Document type: Journal article


Authors	Banerjee, Ayan 1; Shivakumara, Palaiahnakote 2; Bhattacharya, Saumik 3; Pal, Umapada 1; Liu, Cheng-Lin 4
Journal	PATTERN RECOGNITION
Publication date	2024-05-01
Volume	149 Pages: 17
Keywords	Text detection; Scene text recognition; Siamese network; Natural language model; Genetic algorithm; Multi-view text detection
ISSN	0031-3203
DOI	10.1016/j.patcog.2023.110206
Corresponding author	Shivakumara, Palaiahnakote (S.Palaiahnakote@salford.ac.uk)
Abstract	Due to the increasing applications of surveillance and monitoring such as person re-identification, vehicle re-identification and sports events tracking, the necessity of text detection and end-to-end recognition is also growing. Although the past deep learning-based models have addressed several challenges such as arbitrary-shaped text, multiple scripts, and variations in the geometric structure of characters, the scope of the models is limited to a single view. This paper presents an end-to-end model for text recognition through refining the multi-views of the same scene, which is called E2EMVSTR (End-to-End Model for Multi-View Scene Text Recognition). Considering the common characteristics shared in multi-view texts, we propose a cycle consistency pairwise similarity-based deep learning model to find texts more efficiently in three input views. Further, the extracted texts are supplied to a Siamese network and semi-supervised attention embedding combinational network for obtaining recognition results. The proposed model combines natural language processing and genetic algorithm models to restore missing character information and correct wrong recognition results. In experiments on our multi-view dataset and several benchmark datasets, the proposed method is proven effective compared to the state-of-the-art methods. The dataset and codes will be made available to the public upon acceptance.
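WOS keywords	ATTENTION NETWORK ; IMAGES
Funding project	Ministry of Higher Education of Malaysia[FRGS/1/2020/ICT02/UM/02/4]
WOS research areas	Computer Science ; Engineering
Language	English
WOS accession number	WOS:001166069400001
Publisher	ELSEVIER SCI LTD
Funding organization	Ministry of Higher Education of Malaysia
Source URL	[http://ir.ia.ac.cn/handle/173211/57833]
Collection	Institute of Automation_National Laboratory of Pattern Recognition_Pattern Analysis and Learning Group
Corresponding author	Shivakumara, Palaiahnakote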
Author affiliations	1. Indian Stat Inst, Kolkata, India; 2. Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia; 3. Indian Inst Technol Kharagpur, Dept E&ECE, Kharagpur, W Bengal, India; 4. Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Recommended citation (GB/T 7714)	Banerjee, Ayan, Shivakumara, Palaiahnakote, Bhattacharya, Saumik, et al. An end-to-end model for multi-view scene text recognition[J]. PATTERN RECOGNITION, 2024, 149: 17.
APA	Banerjee, Ayan, Shivakumara, Palaiahnakote, Bhattacharya, Saumik, Pal, Umapada, & Liu, Cheng-Lin. (2024). An end-to-end model for multi-view scene text recognition. PATTERN RECOGNITION, 149, 17.
MLA	Banerjee, Ayan, et al. "An end-to-end model for multi-view scene text recognition". PATTERN RECOGNITION 149 (2024): 17.

Ingest method: OAI harvesting

Source: Institute of Automation
