Word-level BERT-CNN-RNN Model for Chinese Punctuation Restoration
Published in | 2020 IEEE 6th International Conference on Computer and Communications (ICCC), pp. 1629-1633 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 11.12.2020 |
DOI | 10.1109/ICCC51575.2020.9344889 |
Summary: | Punctuation restoration in speech recognition has a wide range of application scenarios. Despite the widespread success of neural network methods at punctuation restoration for English, there have been only limited attempts at Chinese punctuation restoration. Because Chinese and English differ in grammar and in their basic semantic units, existing methods for English are not suitable for Chinese punctuation restoration. To tackle this problem, we propose a hybrid model combining the kernel of Bidirectional Encoder Representations from Transformers (BERT), a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The model employs a flexible structure and a special CNN design that can extract word-level features for the Chinese language. We compared the hybrid model with five widely used punctuation restoration models on a public dataset. Experimental results demonstrate that our hybrid model is simple and efficient: it outperforms the other models and achieves an accuracy of 69.1%. |
---|---|
DOI: | 10.1109/ICCC51575.2020.9344889 |
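The abstract's encoder-CNN-RNN pipeline can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for BERT embeddings, all layer sizes and weights are invented for the example, and a plain tanh RNN cell replaces whatever recurrent unit the authors actually used. It only shows the data flow (token embeddings, a word-level 1-D convolution, a recurrence, then a per-token punctuation classifier):

```python
import numpy as np

# Hypothetical dimensions (not from the paper).
rng = np.random.default_rng(0)
seq_len, emb_dim, conv_dim, hid_dim, n_classes = 6, 8, 4, 5, 4

# Stand-in for BERT token embeddings of one sentence.
x = rng.standard_normal((seq_len, emb_dim))

# 1-D convolution over the token axis (kernel width 3, zero padding):
# this plays the role of the word-level CNN feature extractor.
W_c = rng.standard_normal((3, emb_dim, conv_dim)) * 0.1
pad = np.vstack([np.zeros((1, emb_dim)), x, np.zeros((1, emb_dim))])
conv = np.stack([
    sum(pad[t + k] @ W_c[k] for k in range(3))  # window around token t
    for t in range(seq_len)
])
conv = np.maximum(conv, 0.0)  # ReLU

# Simple recurrent layer over the CNN features.
W_x = rng.standard_normal((conv_dim, hid_dim)) * 0.1
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.1
h = np.zeros(hid_dim)
states = []
for t in range(seq_len):
    h = np.tanh(conv[t] @ W_x + h @ W_h)
    states.append(h)
states = np.array(states)

# Per-token classifier over punctuation classes
# (e.g. none / comma / period / question mark).
W_o = rng.standard_normal((hid_dim, n_classes)) * 0.1
logits = states @ W_o
pred = logits.argmax(axis=1)  # one punctuation label per token
print(pred.shape)
```

In the actual model the weights would be trained end-to-end and the embeddings produced by BERT; the sketch only mirrors the shape of the computation the abstract describes.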