A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application

Bibliographic Details
Published in Symmetry (Basel) Vol. 15; no. 4; p. 849
Main Authors Liu, Yiyi; Wang, Yuxin; Shi, Hongjian
Format Journal Article
Language English
Published Basel: MDPI AG, 01.04.2023

Abstract Optical character recognition (OCR) is the process of extracting text and layout information from image files by analyzing and recognizing the text they contain. It also involves identifying the geometric location and orientation of the text and its symmetrical behavior. OCR usually consists of two steps: text detection and text recognition. Scene text recognition is a subfield of OCR that focuses on text in natural scenes, such as streets, billboards, and license plates. Compared with photographs of traditional documents, locating and reading text in natural scenes is a challenging task for computer technology. Image-based sequence recognition is a longstanding research subject in computer vision. Great progress has been made in this field; however, most models still struggle to recognize text in images of complex scenes with high accuracy. To address this issue, this paper proposes a new text recognition pipeline based on the convolutional recurrent neural network (CRNN). It combines Real-Time Scene Text Detection with Differentiable Binarization (DBNet) for text detection and segmentation, a text direction classifier, and the Retinex algorithm for image enhancement. To evaluate the proposed method, we analyzed the algorithm experimentally and ran simulations on complex scene image data drawn from the existing literature, as well as on several real datasets designed for a variety of nonstationary environments. The experimental results show that the proposed model outperformed the baseline methods on three benchmark datasets and performed on par with other approaches on existing datasets. The model addresses the original CRNN's inability to recognize text in complex and multi-oriented scenes, and it achieves higher accuracy than the original CRNN across a wider variety of application scenarios.
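The image-enhancement step named in the abstract can be illustrated with a small sketch. The abstract does not say which Retinex variant the authors use or how it is configured, so the following assumes a multi-scale Retinex on a grayscale image with equal weights per scale; the function names and the sigma values are illustrative choices, not taken from the paper. It uses only the Python standard library so that it runs as written.

```python
import math


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve a 1-D sequence with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a 2-D grayscale image (list of rows)."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def single_scale_retinex(image, sigma):
    """Compute log(I) - log(G_sigma * I); the +1 offset avoids log(0)."""
    blurred = gaussian_blur(image, sigma)
    return [
        [math.log(p + 1.0) - math.log(b + 1.0) for p, b in zip(row, brow)]
        for row, brow in zip(image, blurred)
    ]


def multi_scale_retinex(image, sigmas=(1.0, 2.0, 4.0)):
    """Average single-scale Retinex outputs over several scales (assumed equal weights)."""
    outputs = [single_scale_retinex(image, s) for s in sigmas]
    height, width = len(image), len(image[0])
    return [
        [sum(o[y][x] for o in outputs) / len(outputs) for x in range(width)]
        for y in range(height)
    ]


def stretch_to_8bit(image):
    """Linearly rescale values to the 0..255 range for display."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in image]
```

In a pipeline like the one the abstract describes, an enhancement step of this kind would presumably run on the input image before detection or recognition; the exact placement and parameters are not stated in this record. A production implementation would more likely rely on an array library for the Gaussian blur than on nested Python lists.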
Copyright COPYRIGHT 2023 MDPI AG
2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.3390/sym15040849
EISSN 2073-8994
Subject Terms Accuracy; Algorithms; Artificial intelligence; Billboards; Computer vision; Datasets; Image enhancement; Image segmentation; Innovations; Machine learning; Machine vision; Methods; Model accuracy; Neural networks; Nonstationary environments; Optical character recognition; Pattern recognition; Recurrent neural networks; Retinex (algorithm); Semantics