Word-level BERT-CNN-RNN Model for Chinese Punctuation Restoration
Published in | 2020 IEEE 6th International Conference on Computer and Communications (ICCC), pp. 1629-1633 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 11.12.2020 |
DOI | 10.1109/ICCC51575.2020.9344889 |
Summary: | Punctuation restoration in speech recognition has a wide range of application scenarios. Despite the widespread success of neural network methods at punctuation restoration for English, there have been only limited attempts at Chinese punctuation restoration. Because Chinese and English differ in grammar and in their basic semantic units, existing methods for English are not suitable for Chinese punctuation restoration. To tackle this problem, we propose a hybrid model combining the kernel of Bidirectional Encoder Representations from Transformers (BERT), a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The model employs a flexible structure and a special CNN design that can extract word-level features for the Chinese language. We compared the hybrid model with five widely used punctuation restoration models on a public dataset. Experimental results demonstrate that our hybrid model is simple and efficient: it outperforms the other models and achieves an accuracy of 69.1%. |
---|---|
DOI: | 10.1109/ICCC51575.2020.9344889 |
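The abstract's encoder-CNN-RNN pipeline can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for BERT embeddings, all layer sizes and weights are invented for the example, and a plain tanh RNN cell replaces whatever recurrent unit the authors actually used. It only shows the data flow (token embeddings, a word-level 1-D convolution, a recurrence, then a per-token punctuation classifier):

```python
import numpy as np

# Hypothetical dimensions (not from the paper).
rng = np.random.default_rng(0)
seq_len, emb_dim, conv_dim, hid_dim, n_classes = 6, 8, 4, 5, 4

# Stand-in for BERT token embeddings of one sentence.
x = rng.standard_normal((seq_len, emb_dim))

# 1-D convolution over the token axis (kernel width 3, zero padding):
# this plays the role of the word-level CNN feature extractor.
W_c = rng.standard_normal((3, emb_dim, conv_dim)) * 0.1
pad = np.vstack([np.zeros((1, emb_dim)), x, np.zeros((1, emb_dim))])
conv = np.stack([
    sum(pad[t + k] @ W_c[k] for k in range(3))  # window around token t
    for t in range(seq_len)
])
conv = np.maximum(conv, 0.0)  # ReLU

# Simple recurrent layer over the CNN features.
W_x = rng.standard_normal((conv_dim, hid_dim)) * 0.1
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.1
h = np.zeros(hid_dim)
states = []
for t in range(seq_len):
    h = np.tanh(conv[t] @ W_x + h @ W_h)
    states.append(h)
states = np.array(states)

# Per-token classifier over punctuation classes
# (e.g. none / comma / period / question mark).
W_o = rng.standard_normal((hid_dim, n_classes)) * 0.1
logits = states @ W_o
pred = logits.argmax(axis=1)  # one punctuation label per token
print(pred.shape)
```

In the actual model the weights would be trained end-to-end and the embeddings produced by BERT; the sketch only mirrors the shape of the computation the abstract describes.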