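Long short term memory recurrent neural network acoustic models using i-vector for low resource speech recognition (低资源条件下基于I-vector特征的LSTM递归神经网络语音识别系统)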
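Under low-resource conditions, labeled training data is scarce, so the resulting speech recognition systems often perform poorly. To address this problem, the paper first studies long short term memory (LSTM) recurrent neural networks for acoustic modeling, which exploit context information by modeling long sequences, and introduces a linear projection layer to reduce the number of model parameters. It then studies techniques for modeling speakers in the feature space and extracts the identity vector (i-vector), which captures both speaker and channel information. Finally, it combines the two into an LSTM recurrent neural network system based on i-vector features. Experiments on the standard Open KWS 2013 data set show a 10% relative reduction in TER compared with the deep neural network (DNN) baseline system.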
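The two ideas named in the abstract, the linear projection layer and the i-vector features, can be illustrated with a minimal sketch. It assumes a standard LSTM layer with four gates, biases, and no peephole connections, and it assumes the i-vector is simply appended to every acoustic frame of the same speaker; the record does not give the paper's actual layer sizes or feature pipeline, so all function names and dimensions below are hypothetical.

```python
def lstm_params(n_in, n_cell):
    """Parameter count of one standard LSTM layer (assumed: 4 gates, biases, no peepholes)."""
    # Each gate sees the input (n_in) and the recurrent output (n_cell), plus a bias.
    return 4 * (n_cell * (n_in + n_cell) + n_cell)


def lstmp_params(n_in, n_cell, n_proj):
    """Parameter count of the same layer with a linear projection to n_proj units."""
    # The recurrent connection now carries the projected output (n_proj), not n_cell.
    gates = 4 * (n_cell * (n_in + n_proj) + n_cell)
    projection = n_cell * n_proj  # the extra linear projection matrix
    return gates + projection


def append_ivector(frames, ivector):
    """Append one per-speaker i-vector to every acoustic frame (assumed concatenation)."""
    return [list(frame) + list(ivector) for frame in frames]


# Hypothetical sizes: 40-dim features, 1024 memory cells, 256-dim projection.
full = lstm_params(40, 1024)
projected = lstmp_params(40, 1024, 256)
ratio = projected / full  # fraction of parameters kept by the projected layer

# Two 2-dim frames from one speaker, with a 1-dim stand-in for the i-vector.
augmented = append_ivector([[1.0, 2.0], [3.0, 4.0]], [0.5])
```

With these hypothetical sizes the projected layer keeps roughly a third of the parameters of the unprojected one, which is the kind of saving that matters when labeled training data is scarce.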
Published in | 计算机应用研究 (Application Research of Computers), Vol. 34, no. 2, pp. 392-396 |
---|---|
Main Author | Huang Guangxu (黄光许) |
Format | Journal Article |
Language | Chinese |
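Published | University of Chinese Academy of Sciences, Beijing 100190; State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084; 2017 |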
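Subjects | speech recognition (语音识别); identity vector (身份认证矢量); long short term memory neural network (长短时记忆神经网络) |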
ISSN | 1001-3695 |
DOI | 10.3969/j.issn.1001-3695.2017.02.016 |
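Abstract | Under low-resource conditions, labeled training data is scarce, so the resulting speech recognition systems often perform poorly. To address this problem, the paper first studies long short term memory (LSTM) recurrent neural networks for acoustic modeling, which exploit context information by modeling long sequences, and introduces a linear projection layer to reduce the number of model parameters. It then studies techniques for modeling speakers in the feature space and extracts the identity vector (i-vector), which captures both speaker and channel information. Finally, it combines the two into an LSTM recurrent neural network system based on i-vector features. Experiments on the standard Open KWS 2013 data set show a 10% relative reduction in TER compared with the deep neural network (DNN) baseline system. |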
---|---|
Abstract_FL | Under low-resource conditions, little labeled training data is available and the performance of speech recognition systems is not ideal. To solve this problem, this paper first investigated the long short term memory recurrent neural network (LSTM RNN) for acoustic modeling. It is a powerful tool for modeling long time series and can make full use of context information, and a linear projection layer reduces the number of model parameters. The paper then explored speaker modeling methods in the feature space and extracted the identity vector (i-vector), which contains speaker and channel information simultaneously. Finally, it presented a novel system that combines the LSTM RNN model with the i-vector feature. Results on the standard Open KWS 2013 data set show that this technology produces a relative improvement of about 10% in TER over the DNN baseline system. |
Author | 黄光许 田垚 康健 刘加 夏善红 |
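AuthorAffiliation | University of Chinese Academy of Sciences, Beijing 100190; State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084 |
AuthorAffiliation_xml | – name: University of Chinese Academy of Sciences, Beijing 100190; State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084 |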
Author_FL | Kang Jian Huang Guangxu Tian Yao Liu Jia Xia Shanhong |
Author_FL_xml | – sequence: 1 fullname: Huang Guangxu – sequence: 2 fullname: Tian Yao – sequence: 3 fullname: Kang Jian – sequence: 4 fullname: Liu Jia – sequence: 5 fullname: Xia Shanhong |
Author_xml | – sequence: 1 fullname: 黄光许 田垚 康健 刘加 夏善红 |
ClassificationCodes | TP391.42 |
ContentType | Journal Article |
Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
DBID | 2RA 92L CQIGP W92 ~WA 2B. 4A8 92I 93N PSX TCJ |
DOI | 10.3969/j.issn.1001-3695.2017.02.016 |
DatabaseName | 维普期刊资源整合服务平台 中文科技期刊数据库-CALIS站点 中文科技期刊数据库-7.0平台 中文科技期刊数据库-工程技术 中文科技期刊数据库- 镜像站点 Wanfang Data Journals - Hong Kong WANFANG Data Centre Wanfang Data Journals 万方数据期刊 - 香港版 China Online Journals (COJ) China Online Journals (COJ) |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
DocumentTitleAlternate | Long short term memory recurrent neural network acoustic models using i-vector for low resource speech recognition |
DocumentTitle_FL | Long short term memory recurrent neural network acoustic models using i-vector for low resource speech recognition |
EndPage | 396 |
ExternalDocumentID | jsjyyyj201702016 671086258 |
GrantInformation_xml | – fundername: National Natural Science Foundation of China (国家自然科学基金资助项目) funderid: (61273268,61370034,61403224) |
GroupedDBID | -0Y 2B. 2C0 2RA 5XA 5XJ 92H 92I 92L ACGFS ALMA_UNASSIGNED_HOLDINGS CCEZO CQIGP CUBFJ CW9 TCJ TGT U1G U5S W92 ~WA 4A8 93N ABJNI PSX |
ID | FETCH-LOGICAL-c606-7b30b656ef64118f68fad7ddfd08b94cdc6073cf56b383ad1ab27fe567cd81023 |
ISSN | 1001-3695 |
IngestDate | Thu May 29 03:54:51 EDT 2025 Wed Feb 14 10:06:25 EST 2024 |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 2 |
Keywords | speech recognition (语音识别); long short term memory (LSTM) neural network (长短时记忆神经网络); i-vector, identity vector (身份认证矢量) |
Language | Chinese |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-c606-7b30b656ef64118f68fad7ddfd08b94cdc6073cf56b383ad1ab27fe567cd81023 |
Notes | 51-1196/TP speech recognition; long short term memory (LSTM); i-vector. Under low-resource conditions, little labeled training data is available and the performance of speech recognition systems is not ideal. To solve this problem, this paper first investigated the long short term memory recurrent neural network (LSTM RNN) for acoustic modeling. It is a powerful tool for modeling long time series and can make full use of context information, and a linear projection layer reduces the number of model parameters. The paper then explored speaker modeling methods in the feature space and extracted the identity vector (i-vector), which contains speaker and channel information simultaneously. Finally, it presented a novel system that combines the LSTM RNN model with the i-vector feature. Results on the standard Open KWS 2013 data set show that this technology produces a relative improvement of about 10% in TER over the DNN baseline system. Huang Guangxu, Tian Yao, Kang Jian, Liu Jia, Xia Shanhong (1. University of Chines |
PageCount | 5 |
ParticipantIDs | wanfang_journals_jsjyyyj201702016 chongqing_primary_671086258 |
PublicationCentury | 2000 |
PublicationDate | 2017 |
PublicationDateYYYYMMDD | 2017-01-01 |
PublicationDate_xml | – year: 2017 text: 2017 |
PublicationDecade | 2010 |
PublicationTitle | 计算机应用研究 |
PublicationTitleAlternate | Application Research of Computers |
PublicationTitle_FL | Application Research of Computers |
PublicationYear | 2017 |
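Publisher | University of Chinese Academy of Sciences, Beijing 100190; State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084 |
Publisher_xml | – name: State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084 – name: University of Chinese Academy of Sciences, Beijing 100190 |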
SSID | ssj0042190 ssib001102940 ssib002263599 ssib023646305 ssib051375744 ssib025702191 |
Score | 2.060846 |
Snippet | ... TP391.42;... |
SourceID | wanfang chongqing |
SourceType | Aggregation Database Publisher |
StartPage | 392 |
SubjectTerms | speech recognition (语音识别); identity vector (身份认证矢量); long short term memory neural network (长短时记忆神经网络) |
Title | 低资源条件下基于I-vector特征的LSTM递归神经网络语音识别系统 |
URI | http://lib.cqvip.com/qk/93231X/201702/671086258.html https://d.wanfangdata.com.cn/periodical/jsjyyyj201702016 |
Volume | 34 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnR3LahRBsIkJiBffjxiVCOlTmLjz6sexJztLFBXEFXJb5pmYw0ZNIiQnhSCKEDVHUYMHTyI5BEWzxK_ZzSR_YVVPZ7MGCeplKGqqu6q6Zrqrmu4qQkbAxeAyznIrynMIUJhvW7EvhZVIlqa55Bkrs33eZhP3vBuT_mTfkfM9p5YW5uOxZOmP90r-x6qAA7viLdl_sGy3U0AADPaFJ1gYnn9lYxp6NKhSEdJQ0MCnwqMho4GisoKArFJla5qABkwDgoqAhj6VNSRDjILm163H5eZ9yKmQNJBIEoTgZSJGAol38279Fg2lPhjh6NdVDXCqoLcQAeAiahqAV7bByHGUTdWoqmJz5OsajGDYjwA40MQu0ptWv5021PSh1oUjILnWblyr4Gt9PS2oR5XQNBWDUaBMd-sR-aOMnmbro6Zlx4EYNc2Dih4cYKFGTdcB1-Q26Dlq5JVCAwr4aJTytOLQkOneObZTTu-GSnlz1Mz-eL7MZWXVz73lwey13u-J0su53i2L-Bm3wS0L8x5ckQAt9YqEDMa6DPBMIdfJYu0DicC1a8E4FsByfHGEDDic234_GVBBNajte7rgGPZmPnQwqdB-ZIllAVjPVI61CmFt6k7lvu1yXxc-KJ0WD16WiTuMgEfJiJH-2mGyY0aS6dnm1EPws_S1t2YeNad6PLT6SXLchFbDqvxPTpG-penT5MRe2ZJhs4qdIXfaWys7X5e3N19vv__Ybn1rf3_ZWdtsb67s_QbFix-dn0-Lt8v40e8-We1srRafPhStV8XWm6L1bmf9y-7axs76s87zz8VGq2itnSX1Wlgfn7BMZRErgYDd4rFbiSGQyXLmQYCdM5FHKYe5Ka2IWHpJClTcTXKfxa5wo9SOYofnmc94kgrMdXKO9Ddnm9kFMpx5QObkEDSJDHxhGUdZJHgUc4gUROomg2SoO0CNB2UCmUbXvIPkqhmyhplW5hozczOLi4szOMgQytns4qE9DJFjSFluCl4i_fOPFrLL4CbPx1fMJ_MLYt2YHw |
linkProvider | EBSCOhost |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=%E4%BD%8E%E8%B5%84%E6%BA%90%E6%9D%A1%E4%BB%B6%E4%B8%8B%E5%9F%BA%E4%BA%8EI-vector%E7%89%B9%E5%BE%81%E7%9A%84LSTM%E9%80%92%E5%BD%92%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E8%AF%AD%E9%9F%B3%E8%AF%86%E5%88%AB%E7%B3%BB%E7%BB%9F&rft.jtitle=%E8%AE%A1%E7%AE%97%E6%9C%BA%E5%BA%94%E7%94%A8%E7%A0%94%E7%A9%B6&rft.au=%E9%BB%84%E5%85%89%E8%AE%B8+%E7%94%B0%E5%9E%9A+%E5%BA%B7%E5%81%A5+%E5%88%98%E5%8A%A0+%E5%A4%8F%E5%96%84%E7%BA%A2&rft.date=2017&rft.issn=1001-3695&rft.volume=34&rft.issue=2&rft.spage=392&rft.epage=396&rft_id=info:doi/10.3969%2Fj.issn.1001-3695.2017.02.016&rft.externalDocID=671086258 |
thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fimage.cqvip.com%2Fvip1000%2Fqk%2F93231X%2F93231X.jpg http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fwww.wanfangdata.com.cn%2Fimages%2FPeriodicalImages%2Fjsjyyyj%2Fjsjyyyj.jpg |