A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application


Bibliographic Details
Published in	Symmetry (Basel) Vol. 15; no. 4; p. 849
Main Authors	Liu, Yiyi; Wang, Yuxin; Shi, Hongjian
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 2023-04-01
Subjects	Accuracy; Algorithms; Artificial intelligence; Billboards; Computer vision; Datasets; Image enhancement; Image segmentation; Innovations; Machine learning; Machine vision; Methods; Model accuracy; Neural networks; Nonstationary environments; Optical character recognition; Pattern recognition; Recurrent neural networks; Retinex (algorithm); Semantics
Abstract	Optical character recognition (OCR) is the process of extracting text and layout information from images of text through analysis and recognition. It also identifies the geometric location and orientation of the text and its symmetrical behavior. OCR usually consists of two steps: text detection and text recognition. Scene text recognition is a subfield of OCR that focuses on text in natural scenes, such as streets, billboards, and license plates. Locating and reading text in natural scenes is considerably harder than in traditional document images. Image-based sequence recognition is a longstanding research subject in computer vision. Great progress has been made in this field; however, most models still struggle to recognize text in images of complex scenes with high accuracy. To address this issue, this paper proposes a text recognition approach based on the convolutional recurrent neural network (CRNN). It combines real-time scene text detection with differentiable binarization (DBNet) for text detection and segmentation, a text direction classifier, and the Retinex algorithm for image enhancement. To evaluate the proposed method, we analyzed the algorithm experimentally and carried out simulations on complex scene image data drawn from the existing literature and on several real datasets designed for a variety of nonstationary environments. Experimental results demonstrated that the proposed model performed better than the baseline methods on three benchmark datasets and achieved on-par performance with other approaches on existing datasets. The model addresses the original CRNN's inability to recognize text in complex, multi-oriented scenes, and it achieves higher accuracy than the original CRNN model across a wider variety of application scenarios.
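The abstract names the Retinex algorithm as the image enhancement step but does not say which variant is used. As an illustration only, the following is a minimal sketch of single-scale Retinex on a grayscale image, which estimates illumination with a Gaussian blur and keeps the log-ratio of the image to that estimate. The function names, the default `sigma`, and the final contrast stretch are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(channel: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D array using edge replication."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(channel, radius, mode="edge")
    # Filter along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def single_scale_retinex(image: np.ndarray, sigma: float = 15.0) -> np.ndarray:
    """Enhance a grayscale image with values in [0, 255] (hypothetical helper)."""
    img = image.astype(np.float64) + 1.0  # offset avoids log(0)
    illumination = gaussian_blur(img, sigma)  # smooth illumination estimate
    reflectance = np.log(img) - np.log(illumination)
    # Stretch the log-ratio to [0, 255]; this display step is an assumption.
    lo, hi = reflectance.min(), reflectance.max()
    if hi - lo < 1e-12:
        return np.zeros_like(image, dtype=np.uint8)
    return ((reflectance - lo) / (hi - lo) * 255.0).astype(np.uint8)
```

In a pipeline like the one the abstract describes, a step of this kind would presumably run on each image before detection or recognition; where exactly the authors apply it is not stated in this record.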
Audience	Academic
Author	Liu, Yiyi (ORCID 0000-0002-6749-9841); Wang, Yuxin (ORCID 0009-0000-5202-7350); Shi, Hongjian (ORCID 0000-0001-8732-4101)
Copyright	COPYRIGHT 2023 MDPI AG. 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI	10.3390/sym15040849
Discipline	Sciences (General)
EISSN	2073-8994
ISSN	2073-8994
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	4
Language	English
License	https://creativecommons.org/licenses/by/4.0
ORCID	0000-0001-8732-4101 0009-0000-5202-7350 0000-0002-6749-9841
URI	https://www.proquest.com/docview/2806618631