A holistic representation guided attention network for scene text recognition

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 414, pp. 67–75
Main Authors: Yang, Lu; Wang, Peng; Li, Hui; Li, Zhen; Zhang, Yanning
Format: Journal Article
Language: English
Published: Elsevier B.V., 13.11.2020
Subjects: Convolutional-Attention; Holistic Representation; Scene Text Recognition; Transformer
Online Access: Get full text

Abstract Reading irregular scene text of arbitrary shape in natural images remains a challenging problem, despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach to scene text recognition. With no need to convert input images into sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation steers the attention-based decoder toward more accurate regions. As no recurrent module is adopted, our model can be trained in parallel, achieving a 1.5× to 9.4× speed-up in the backward pass and a 1.3× to 7.9× speed-up in the forward pass compared with RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.
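The idea summarized in the abstract, a non-recurrent, attention-based decoder that reads 2D CNN feature maps directly, with its decoding queries conditioned on a holistic (globally pooled) representation of the image, can be illustrated roughly as below. This is a minimal sketch assuming a PyTorch-style implementation; the module names, pooling choice, shapes, and hyper-parameters are illustrative assumptions, not the authors' actual architecture.

# Minimal sketch (assumed PyTorch) of a parallel, non-recurrent attention decoder
# over 2D CNN features, where a holistic (globally pooled) vector guides every
# decoding query. Names, shapes, and hyper-parameters are hypothetical.
import torch
import torch.nn as nn

class HolisticGuidedDecoder(nn.Module):
    def __init__(self, feat_dim=512, d_model=512, nhead=8, num_layers=2,
                 vocab_size=97, max_len=25):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, d_model, kernel_size=1)        # 1x1 conv: CNN channels -> d_model
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned per-step query embeddings
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)        # attention only, no recurrence
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, feat_2d):                               # feat_2d: (B, C, H, W) CNN feature map
        mem = self.proj(feat_2d).flatten(2).transpose(1, 2)   # (B, H*W, d_model): 2D features as memory
        holistic = mem.mean(dim=1, keepdim=True)              # (B, 1, d_model): global "holistic" vector
        queries = self.pos_emb + holistic                     # holistic vector conditions every query
        out = self.decoder(tgt=queries, memory=mem)           # cross-attention over the 2D feature map
        return self.classifier(out)                           # (B, max_len, vocab_size) character logits

Because the max_len decoding queries do not depend on previously emitted characters, all steps are computed in one pass, which is what permits fully parallel training in contrast to an RNN decoder; for example, HolisticGuidedDecoder()(torch.randn(2, 512, 8, 32)) yields a (2, 25, 97) tensor of character logits.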
Author Wang, Peng
Li, Hui
Li, Zhen
Yang, Lu
Zhang, Yanning
Author_xml – sequence: 1
  givenname: Lu
  surname: Yang
  fullname: Yang, Lu
  email: lu.yang@mail.nwpu.edu.cn
  organization: School of Computer Science, Northwestern Polytechnical University, Xi’an, China
– sequence: 2
  givenname: Peng
  surname: Wang
  fullname: Wang, Peng
  email: peng.wang@nwpu.edu.cn
  organization: School of Computer Science, Northwestern Polytechnical University, Xi’an, China
– sequence: 3
  givenname: Hui
  surname: Li
  fullname: Li, Hui
  email: huili03855@gmail.com
  organization: School of Computer Science, The University of Adelaide, Australia
– sequence: 4
  givenname: Zhen
  surname: Li
  fullname: Li, Zhen
  email: lizhen@mskj.com
  organization: MinSheng FinTech Corp. Ltd., China
– sequence: 5
  givenname: Yanning
  surname: Zhang
  fullname: Zhang, Yanning
  email: ynzhang@nwpu.edu.cn
  organization: School of Computer Science, Northwestern Polytechnical University, Xi’an, China
CitedBy_id crossref_primary_10_1007_s13735_022_00253_6
crossref_primary_10_11834_jig_221049
crossref_primary_10_1038_s41598_022_14434_0
crossref_primary_10_1007_s40747_022_00916_1
crossref_primary_10_1109_TSMC_2023_3319964
crossref_primary_10_1109_TPAMI_2021_3132034
crossref_primary_10_1007_s10032_022_00398_4
crossref_primary_10_1007_s13369_021_06311_1
crossref_primary_10_1016_j_patcog_2021_107980
crossref_primary_10_1117_1_JEI_32_2_023015
crossref_primary_10_1142_S021800142353004X
crossref_primary_10_1155_2021_6658842
crossref_primary_10_1007_s10489_022_04241_5
crossref_primary_10_3390_s24092791
crossref_primary_10_1145_3625822
crossref_primary_10_1016_j_eswa_2023_122769
crossref_primary_10_1109_TCSVT_2022_3146240
crossref_primary_10_1109_TPAMI_2022_3230962
crossref_primary_10_1007_s10489_021_02219_3
crossref_primary_10_1016_j_knosys_2023_111178
crossref_primary_10_1109_ACCESS_2022_3207469
crossref_primary_10_1155_2022_2206917
crossref_primary_10_1016_j_asoc_2023_110969
crossref_primary_10_1007_s10489_021_03119_2
ContentType Journal Article
Copyright 2020 Elsevier B.V.
DOI 10.1016/j.neucom.2020.07.010
DatabaseName CrossRef
DatabaseTitle CrossRef
Discipline Computer Science
EISSN 1872-8286
EndPage 75
ExternalDocumentID 10_1016_j_neucom_2020_07_010
S0925231220311176
ISSN 0925-2312
IsPeerReviewed true
IsScholarly true
Keywords Transformer
Holistic Representation
Scene Text Recognition
Convolutional-Attention
Language English
PageCount 9
PublicationCentury 2000
PublicationDate 2020-11-13
PublicationDateYYYYMMDD 2020-11-13
PublicationDate_xml – month: 11
  year: 2020
  text: 2020-11-13
  day: 13
PublicationDecade 2020
PublicationTitle Neurocomputing (Amsterdam)
PublicationYear 2020
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
StartPage 67
SubjectTerms Convolutional-Attention
Holistic Representation
Scene Text Recognition
Transformer
Title A holistic representation guided attention network for scene text recognition
URI https://dx.doi.org/10.1016/j.neucom.2020.07.010
Volume 414