Word-level BERT-CNN-RNN Model for Chinese Punctuation Restoration

Bibliographic Details
Published in 2020 IEEE 6th International Conference on Computer and Communications (ICCC), pp. 1629-1633
Main Authors Zhang, Zhe; Liu, Jie; Chi, Lihua; Chen, Xinhai
Format Conference Proceeding
Language English
Published IEEE 11.12.2020
DOI 10.1109/ICCC51575.2020.9344889

Abstract Punctuation restoration in speech recognition has a wide range of application scenarios. Despite the widespread success of neural network methods at punctuation restoration for English, there have been only limited attempts at Chinese punctuation restoration. Because Chinese and English differ in grammar and in their basic semantic units, existing methods for English are not suitable for Chinese punctuation restoration. To tackle this problem, we propose a hybrid model combining the kernel of Bidirectional Encoder Representations from Transformers (BERT), a Convolutional Neural Network (CNN), and a Recurrent Neural Network (RNN). The model employs a flexible structure and a special CNN design that can extract word-level features for Chinese. We compared the performance of the hybrid model with five widely used punctuation restoration models on a public dataset. Experimental results demonstrate that our hybrid model is simple and efficient. It outperforms the other models and achieves an accuracy of 69.1%.
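The abstract does not spell out the CNN design, so the following is only a minimal sketch of one way word-level features could be derived from character-level encoder outputs (the commonly used Chinese BERT checkpoints tokenize mostly at the character level). It assumes that a word segmentation is already available, applies a depthwise 1-D convolution along the character axis, and then max-pools inside each word span. The function names `conv1d_same` and `pool_words`, the kernel weights, and the toy vectors are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def conv1d_same(chars: Sequence[Vector], kernel: Sequence[float]) -> List[Vector]:
    """Depthwise 1-D convolution over the character axis with zero padding.

    Assumption: the kernel has odd length, so the output is centered and
    has the same number of positions as the input.
    """
    half = len(kernel) // 2
    dim = len(chars[0])
    out: List[Vector] = []
    for i in range(len(chars)):
        acc = [0.0] * dim
        for k, weight in enumerate(kernel):
            j = i + k - half  # input position covered by this kernel tap
            if 0 <= j < len(chars):  # positions outside the sentence count as zero
                for d in range(dim):
                    acc[d] += weight * chars[j][d]
        out.append(acc)
    return out


def pool_words(char_feats: Sequence[Vector], word_lengths: Sequence[int]) -> List[Vector]:
    """Max-pool character features inside each word span.

    word_lengths gives the number of characters in each segmented word,
    in order; the lengths must add up to the number of characters.
    """
    if sum(word_lengths) != len(char_feats):
        raise ValueError("word lengths must cover every character exactly once")
    words: List[Vector] = []
    start = 0
    for length in word_lengths:
        span = char_feats[start:start + length]
        # Element-wise maximum over the characters that form this word.
        words.append([max(column) for column in zip(*span)])
        start += length
    return words


# Toy stand-in for character-level encoder outputs: 4 characters, 2 dimensions.
char_vectors = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [4.0, 4.0]]
smoothed = conv1d_same(char_vectors, kernel=[1.0, 1.0, 1.0])
# Hypothetical segmentation: two words of two characters each.
word_vectors = pool_words(smoothed, word_lengths=[2, 2])
print(word_vectors)
```

In a full model, the pooled word vectors would presumably be passed to a recurrent layer and a per-word classifier over punctuation labels; that part is omitted here because the record gives no details about it.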
Author Affiliations
1. Zhang, Zhe (zhangzhe18a@nudt.edu.cn): Science and Technology on Parallel and Distributed Processing Laboratory & Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha, China
2. Liu, Jie (liujie@nudt.edu.cn): Science and Technology on Parallel and Distributed Processing Laboratory & Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha, China
3. Chi, Lihua (chichch@126.com): College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
4. Chen, Xinhai (chenxinhai1995@aliyun.com): Science and Technology on Parallel and Distributed Processing Laboratory & Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha, China
EISBN 1728186358; 9781728186351
Funding National Key Research and Development Program of China (grants 2018YFB0204301, 2017YFB0202104)
Subject Terms BERT; Bit error rate; CNN; Computational modeling; Feature extraction; Language Model; Punctuation Restoration; Recurrent neural networks; RNN; Semantics; Speech recognition; Task analysis
URI https://ieeexplore.ieee.org/document/9344889