A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application
Published in | Symmetry (Basel) Vol. 15; no. 4; p. 849 |
---|---|
Main Authors | Liu, Yiyi; Wang, Yuxin; Shi, Hongjian |
Format | Journal Article |
Language | English |
Published | Basel : MDPI AG, 01.04.2023 |
Abstract | Optical character recognition (OCR) is the process of extracting text and layout information by analyzing and recognizing text in image files. It also involves identifying the geometric location and orientation of text and its symmetrical behavior. OCR usually consists of two steps: text detection and text recognition. Scene text recognition is a subfield of OCR that focuses on processing text in natural scenes, such as streets, billboards, and license plates. Unlike reading conventional document images, locating and reading text in natural scenes with computer technology is a challenging task. Image-based sequence recognition is a long-standing research topic in computer vision. Great progress has been made in this field; however, most models still struggle to recognize text in images of complex scenes with high accuracy. To address this issue, this paper proposes a new text recognition approach based on the convolutional recurrent neural network (CRNN). It combines real-time scene text detection with differentiable binarization (DBNet) for text detection and segmentation, a text direction classifier, and the Retinex algorithm for image enhancement. To evaluate the effectiveness of the proposed method, we analyzed the proposed algorithm experimentally and ran simulations on complex-scene image data drawn from the existing literature, as well as on several real datasets designed for a variety of nonstationary environments. Experimental results demonstrated that the proposed model performed better than the baseline methods on three benchmark datasets and achieved performance on par with other approaches on existing datasets. The model addresses CRNN's inability to identify text in complex, multi-oriented scenes, and it achieves higher accuracy than the original CRNN model across a wider variety of application scenarios. |
---|---|
Audience | Academic |
Author | Liu, Yiyi (ORCID 0000-0002-6749-9841); Wang, Yuxin (ORCID 0009-0000-5202-7350); Shi, Hongjian (ORCID 0000-0001-8732-4101) |
Copyright | COPYRIGHT 2023 MDPI AG. Copyright 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.3390/sym15040849 |
Discipline | Sciences (General) |
EISSN | 2073-8994 |
ISSN | 2073-8994 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | https://creativecommons.org/licenses/by/4.0 |
SubjectTerms | Accuracy; Algorithms; Artificial intelligence; Billboards; Computer vision; Datasets; Image enhancement; Image segmentation; Innovations; Machine learning; Machine vision; Methods; Model accuracy; Neural networks; Nonstationary environments; Optical character recognition; Pattern recognition; Recurrent neural networks; Retinex (algorithm); Semantics |
URI | https://www.proquest.com/docview/2806618631 |
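The abstract names two stages that surround the CRNN recognizer: Retinex image enhancement and DBNet's differentiable binarization. The NumPy sketch below illustrates both; it is not the authors' code. Assumptions: the enhancement step is the classic equal-weight multi-scale Retinex (MSR) of Rahman et al. (1996) on a grayscale image, with commonly used scales (15, 80, 250) and a final linear stretch to 0..255; the binarization step uses the DBNet approximation B = 1 / (1 + exp(-k(P - T))) with k = 50 as in the DBNet paper. The function names `gaussian_blur`, `multi_scale_retinex` and `differentiable_binarization` are hypothetical, and the record does not state which Retinex variant or which k the catalogued paper actually uses.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel, truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D array; edge padding keeps the shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Convolve every row, then every column; "valid" trims the padding off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def multi_scale_retinex(img, sigmas=(15.0, 80.0, 250.0), eps=1.0):
    """Equal-weight MSR on a grayscale image, stretched to the range [0, 255].

    Assumption: the scales and the final linear stretch are common defaults,
    not values reported in the catalogued paper.
    """
    img = img.astype(np.float64) + eps  # eps keeps log() finite on black pixels
    log_img = np.log(img)
    msr = np.zeros_like(img)
    for s in sigmas:
        # Single-scale Retinex: log(I) minus log of the Gaussian-smoothed illumination.
        msr += log_img - np.log(gaussian_blur(img, s))
    msr /= len(sigmas)
    lo, hi = msr.min(), msr.max()
    if hi == lo:
        return np.zeros_like(msr)
    return (msr - lo) / (hi - lo) * 255.0


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """DBNet approximate binary map: 1 / (1 + exp(-k * (P - T))).

    k = 50 follows the DBNet paper; the catalogued paper's setting is an assumption.
    """
    diff = np.asarray(prob_map, dtype=np.float64) - np.asarray(thresh_map, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-k * diff))


if __name__ == "__main__":
    demo = np.zeros((32, 32))
    demo[8:24, 8:24] = 200.0  # a bright square on a dark background
    enhanced = multi_scale_retinex(demo, sigmas=(2.0, 5.0))
    print(enhanced.shape, enhanced.min(), enhanced.max())
    P = np.array([0.3, 0.9, 0.1])  # per-pixel text probability
    T = np.full(3, 0.3)            # per-pixel learned threshold
    print(differentiable_binarization(P, T))
```

In practice a library filter such as `scipy.ndimage.gaussian_filter` would replace the hand-written blur; it is spelled out here only to keep the sketch dependent on NumPy alone. The steep sigmoid is what makes binarization differentiable: pixels well above their threshold map close to 1, pixels well below map close to 0, and gradients still flow during training.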