BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild

As a prerequisite of many text-related tasks such as text erasing and text style transfer, text segmentation arouses more and more attention recently. Current researches mainly focus on only English characters and digits, while few work studies Chinese characters due to the lack of pub-lic large-sca...

Full description

Saved in:

Bibliographic Details
Published in	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 19130 - 19140
Main Authors	Xu, Xixi, Qi, Zhongang, Ma, Jianqi, Zhang, Honglun, Shan, Ying, Qie, Xiaohu
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2022
Subjects	Annotations Benchmark testing Character recognition Computer vision Datasets and evaluation; Computer vision theory Image segmentation Protocols Training
Online Access	Get full text

Cover

Loading…

More Information
Summary:	As a prerequisite of many text-related tasks such as text erasing and text style transfer, text segmentation arouses more and more attention recently. Current researches mainly focus on only English characters and digits, while few work studies Chinese characters due to the lack of pub-lic large-scale and high-quality Chinese datasets, which limits the practical application scenarios of text segmentation. Different from English which has a limited alphabet of letters, Chinese has much more basic characters with com-plex structures, making the problem more difficult to deal with. To better analyze this problem, we propose the Bi-lingual Text Segmentation (BTS) dataset, a benchmark that covers various common Chinese scenes including 14,250 diverse and fine-annotated text images. BTS mainly focuses on Chinese characters, and also contains English words and digits. We also introduce Prior Guided Text Segmen-tation Network (PGTSNet), the first baseline to handle bi-lingual and complex-structured text segmentation. A plug-in text region highlighting module and a text perceptual dis-criminator are proposed in PGTSNet to supervise the model with text prior, and guide for more stable and finer text seg-mentation. A variation loss is also employed for suppressing background noise under complex scene. Extensive ex-periments are conducted not only to demonstrate the neces-sity and superiority of the proposed dataset BTS, but also to show the effectiveness of the proposed PGTSNet compared with a variety of state-of-the-art text segmentation methods.
ISSN:	2575-7075
DOI:	10.1109/CVPR52688.2022.01856