Word-level BERT-CNN-RNN Model for Chinese Punctuation Restoration
Published in | 2020 IEEE 6th International Conference on Computer and Communications (ICCC), pp. 1629-1633 |
---|---|
Main Authors | Zhang, Zhe; Liu, Jie; Chi, Lihua; Chen, Xinhai |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 11.12.2020 |
Subjects | |
DOI | 10.1109/ICCC51575.2020.9344889 |
Abstract | Punctuation restoration in speech recognition has a wide range of application scenarios. Despite the widespread success of neural network methods for English punctuation restoration, there have been only limited attempts at Chinese punctuation restoration. Because Chinese and English differ in grammar and in their basic semantic units, existing methods for English are not suitable for Chinese punctuation restoration. To tackle this problem, we propose a hybrid model combining the kernel of Bidirectional Encoder Representations from Transformers (BERT), a Convolutional Neural Network (CNN), and a Recurrent Neural Network (RNN). The model employs a flexible structure and a special CNN design that can extract word-level features for the Chinese language. We compared the performance of the hybrid model with five widely used punctuation restoration models on a public dataset. Experimental results demonstrate that our hybrid model is simple and efficient. It outperforms the other models and achieves an accuracy of 69.1%. |
---|---|
Author | Liu, Jie; Chi, Lihua; Zhang, Zhe; Chen, Xinhai |
Author_xml | – sequence: 1 givenname: Zhe surname: Zhang fullname: Zhang, Zhe email: zhangzhe18a@nudt.edu.cn organization: Science and Technology on Parallel and Distributed Processing Laboratory & Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha, China – sequence: 2 givenname: Jie surname: Liu fullname: Liu, Jie email: liujie@nudt.edu.cn organization: Science and Technology on Parallel and Distributed Processing Laboratory & Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha, China – sequence: 3 givenname: Lihua surname: Chi fullname: Chi, Lihua email: chichch@126.com organization: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China – sequence: 4 givenname: Xinhai surname: Chen fullname: Chen, Xinhai email: chenxinhai1995@aliyun.com organization: Science and Technology on Parallel and Distributed Processing Laboratory & Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha, China |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/ICCC51575.2020.9344889 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP All) 1998-Present |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 1728186358 9781728186351 |
EndPage | 1633 |
ExternalDocumentID | 9344889 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Key Research and Development Program of China grantid: 2018YFB0204301,2017YFB0202104 funderid: 10.13039/501100012166 |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
IEDL.DBID | RIE |
IngestDate | Thu Jun 29 18:38:41 EDT 2023 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
PageCount | 5 |
ParticipantIDs | ieee_primary_9344889 |
PublicationCentury | 2000 |
PublicationDate | 2020-Dec.-11 |
PublicationDateYYYYMMDD | 2020-12-11 |
PublicationDate_xml | – month: 12 year: 2020 text: 2020-Dec.-11 day: 11 |
PublicationDecade | 2020 |
PublicationTitle | 2020 IEEE 6th International Conference on Computer and Communications (ICCC) |
PublicationTitleAbbrev | ICCC |
PublicationYear | 2020 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1629 |
SubjectTerms | BERT; Bit error rate; CNN; Computational modeling; Feature extraction; Language Model; Punctuation Restoration; Recurrent neural networks; RNN; Semantics; Speech recognition; Task analysis |
Title | Word-level BERT-CNN-RNN Model for Chinese Punctuation Restoration |
URI | https://ieeexplore.ieee.org/document/9344889 |