A holistic representation guided attention network for scene text recognition

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 414, pp. 67–75
Main Authors: Yang, Lu; Wang, Peng; Li, Hui; Li, Zhen; Zhang, Yanning
Format: Journal Article
Language: English
Published: Elsevier B.V., 13.11.2020
Subjects: Convolutional-Attention; Holistic Representation; Scene Text Recognition; Transformer
Online Access: Get full text

Abstract Reading irregular scene text of arbitrary shape in natural images remains a challenging problem, despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach to scene text recognition. With no need to convert input images into sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation steers the attention-based decoder toward more accurate regions. As no recurrent module is adopted, our model can be trained in parallel, achieving a 1.5× to 9.4× speed-up in the backward pass and a 1.3× to 7.9× speed-up in the forward pass compared with RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.
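The idea summarized in the abstract, a non-recurrent, attention-based decoder that reads 2D CNN feature maps directly, with its decoding queries conditioned on a holistic (globally pooled) representation of the image, can be illustrated roughly as below. This is a minimal sketch assuming a PyTorch-style implementation; the module names, pooling choice, shapes, and hyper-parameters are illustrative assumptions, not the authors' actual architecture.

# Minimal sketch (assumed PyTorch) of a parallel, non-recurrent attention decoder
# over 2D CNN features, where a holistic (globally pooled) vector guides every
# decoding query. Names, shapes, and hyper-parameters are hypothetical.
import torch
import torch.nn as nn

class HolisticGuidedDecoder(nn.Module):
    def __init__(self, feat_dim=512, d_model=512, nhead=8, num_layers=2,
                 vocab_size=97, max_len=25):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, d_model, kernel_size=1)        # 1x1 conv: CNN channels -> d_model
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned per-step query embeddings
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)        # attention only, no recurrence
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, feat_2d):                               # feat_2d: (B, C, H, W) CNN feature map
        mem = self.proj(feat_2d).flatten(2).transpose(1, 2)   # (B, H*W, d_model): 2D features as memory
        holistic = mem.mean(dim=1, keepdim=True)              # (B, 1, d_model): global "holistic" vector
        queries = self.pos_emb + holistic                     # holistic vector conditions every query
        out = self.decoder(tgt=queries, memory=mem)           # cross-attention over the 2D feature map
        return self.classifier(out)                           # (B, max_len, vocab_size) character logits

Because the max_len decoding queries do not depend on previously emitted characters, all steps are computed in one pass, which is what permits fully parallel training in contrast to an RNN decoder; for example, HolisticGuidedDecoder()(torch.randn(2, 512, 8, 32)) yields a (2, 25, 97) tensor of character logits.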
Author Wang, Peng
Li, Hui
Li, Zhen
Yang, Lu
Zhang, Yanning
Author_xml – sequence: 1
  givenname: Lu
  surname: Yang
  fullname: Yang, Lu
  email: lu.yang@mail.nwpu.edu.cn
  organization: School of Computer Science, Northwestern Polytechnical University, Xi’an, China
– sequence: 2
  givenname: Peng
  surname: Wang
  fullname: Wang, Peng
  email: peng.wang@nwpu.edu.cn
  organization: School of Computer Science, Northwestern Polytechnical University, Xi’an, China
– sequence: 3
  givenname: Hui
  surname: Li
  fullname: Li, Hui
  email: huili03855@gmail.com
  organization: School of Computer Science, The University of Adelaide, Australia
– sequence: 4
  givenname: Zhen
  surname: Li
  fullname: Li, Zhen
  email: lizhen@mskj.com
  organization: MinSheng FinTech Corp. Ltd., China
– sequence: 5
  givenname: Yanning
  surname: Zhang
  fullname: Zhang, Yanning
  email: ynzhang@nwpu.edu.cn
  organization: School of Computer Science, Northwestern Polytechnical University, Xi’an, China
CitedBy_id crossref_primary_10_1007_s13735_022_00253_6
crossref_primary_10_11834_jig_221049
crossref_primary_10_1038_s41598_022_14434_0
crossref_primary_10_1007_s40747_022_00916_1
crossref_primary_10_1109_TSMC_2023_3319964
crossref_primary_10_1109_TPAMI_2021_3132034
crossref_primary_10_1007_s10032_022_00398_4
crossref_primary_10_1007_s13369_021_06311_1
crossref_primary_10_1016_j_patcog_2021_107980
crossref_primary_10_1117_1_JEI_32_2_023015
crossref_primary_10_1142_S021800142353004X
crossref_primary_10_1155_2021_6658842
crossref_primary_10_1007_s10489_022_04241_5
crossref_primary_10_3390_s24092791
crossref_primary_10_1145_3625822
crossref_primary_10_1016_j_eswa_2023_122769
crossref_primary_10_1109_TCSVT_2022_3146240
crossref_primary_10_1109_TPAMI_2022_3230962
crossref_primary_10_1007_s10489_021_02219_3
crossref_primary_10_1016_j_knosys_2023_111178
crossref_primary_10_1109_ACCESS_2022_3207469
crossref_primary_10_1155_2022_2206917
crossref_primary_10_1016_j_asoc_2023_110969
crossref_primary_10_1007_s10489_021_03119_2
ContentType Journal Article
Copyright 2020 Elsevier B.V.
DOI 10.1016/j.neucom.2020.07.010
DatabaseName CrossRef
DatabaseTitle CrossRef
Discipline Computer Science
EISSN 1872-8286
EndPage 75
ExternalDocumentID 10_1016_j_neucom_2020_07_010
S0925231220311176
ISSN 0925-2312
IsPeerReviewed true
IsScholarly true
Keywords Transformer
Holistic Representation
Scene Text Recognition
Convolutional-Attention
Language English
PageCount 9
PublicationCentury 2000
PublicationDate 2020-11-13
PublicationDateYYYYMMDD 2020-11-13
PublicationDate_xml – month: 11
  year: 2020
  text: 2020-11-13
  day: 13
PublicationDecade 2020
PublicationTitle Neurocomputing (Amsterdam)
PublicationYear 2020
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
StartPage 67
SubjectTerms Convolutional-Attention
Holistic Representation
Scene Text Recognition
Transformer
Title A holistic representation guided attention network for scene text recognition
URI https://dx.doi.org/10.1016/j.neucom.2020.07.010
Volume 414