MTRNet++: One-stage mask-based scene text eraser

Bibliographic Details
Published in Computer vision and image understanding Vol. 201; p. 103066
Main Authors Tursun, Osman; Denman, Simon; Zeng, Rui; Sivapalan, Sabesan; Sridharan, Sridha; Fookes, Clinton
Format Journal Article
Language English
Published Elsevier Inc 01.12.2020
ISSN 1077-3142
1090-235X
DOI 10.1016/j.cviu.2020.103066

Summary: A precise, controllable, interpretable and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To achieve this, we propose a one-stage mask-based text inpainting network, MTRNet++. It has a novel architecture that includes mask-refine, coarse-inpainting and fine-inpainting branches, and attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. It also demonstrates controllability and interpretability.
• The proposed MTRNet++ has a novel one-stage mask-based architecture.
• MTRNet++ achieves state-of-the-art results on the Oxford and SCUT datasets.
• MTRNet++ is end-to-end trainable. It converges on a large-scale dataset within an epoch.
• MTRNet++ demonstrates controllability and interpretability.
• We introduced some incremental modifications regarding training losses and strategy.