Long short-term memory recurrent neural network acoustic models using i-vector for low-resource speech recognition

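Under low-resource conditions, little transcribed training data is available, so the speech recognition systems built on it often perform poorly. To address this problem, the paper first studies long short-term memory (LSTM) recurrent neural networks for acoustic modeling, which model long sequences to exploit context information fully, and introduces a linear projection layer to reduce the number of model parameters. It then studies techniques for modeling speakers in the feature space, extracting identity vectors (i-vectors) that effectively reflect speaker and channel information. Finally, it combines the two to build an LSTM recurrent neural network system based on i-vector features. Experiments on the standard Open KWS 2013 data set show that the approach achieves a relative reduction of 10% in token error rate (TER) over a deep neural network baseline system.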
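The combination described in the abstract (an LSTM layer with a linear projection, fed acoustic frames with an i-vector appended) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `init_lstmp` and `lstmp_forward`, the layer sizes, the random weights, and the choice to repeat one i-vector on every frame are all assumptions made for illustration, and the sketch omits details such as peephole connections and training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstmp(input_dim, cell_dim, proj_dim, seed=0):
    """Random weights for one projected LSTM layer (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        # Stacked weights for the input, forget, output gates and the cell candidate.
        "W": rng.standard_normal((4 * cell_dim, input_dim + proj_dim)) * 0.1,
        "b": np.zeros(4 * cell_dim),
        # Linear projection from the cell output to a smaller recurrent state.
        "P": rng.standard_normal((proj_dim, cell_dim)) * 0.1,
    }

def lstmp_forward(frames, ivector, params):
    """Run the layer over one utterance.

    frames:  (T, feat_dim) acoustic features
    ivector: (ivec_dim,) a single vector, assumed constant over the utterance
    """
    proj_dim, cell_dim = params["P"].shape
    num_frames = frames.shape[0]
    # Append the same i-vector to every acoustic frame.
    x = np.concatenate([frames, np.tile(ivector, (num_frames, 1))], axis=1)
    c = np.zeros(cell_dim)   # cell state
    r = np.zeros(proj_dim)   # projected recurrent state
    outputs = np.empty((num_frames, proj_dim))
    for t in range(num_frames):
        z = params["W"] @ np.concatenate([x[t], r]) + params["b"]
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        r = params["P"] @ h  # the recurrence uses the projected state, not h
        outputs[t] = r
    return outputs

# Hypothetical dimensions: 40-dim frames, 100-dim i-vector.
feat_dim, ivec_dim, cell_dim, proj_dim = 40, 100, 64, 16
params = init_lstmp(feat_dim + ivec_dim, cell_dim, proj_dim)
rng = np.random.default_rng(1)
out = lstmp_forward(rng.standard_normal((5, feat_dim)),
                    rng.standard_normal(ivec_dim), params)
print(out.shape)  # (5, 16)
```

Because the recurrent connection carries the `proj_dim`-sized state rather than the full `cell_dim`-sized output, the gate weight matrix is narrower, which is where the parameter saving mentioned in the abstract would come from.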

Bibliographic Details
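Published in Application Research of Computers (计算机应用研究), Vol. 34, no. 2, pp. 392-396
Main Authors Huang Guangxu (黄光许), Tian Yao (田垚), Kang Jian (康健), Liu Jia (刘加), Xia Shanhong (夏善红)
Format Journal Article
Language Chinese
Published State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084; University of Chinese Academy of Sciences, Beijing 100190 (2017)
ISSN 1001-3695
DOI 10.3969/j.issn.1001-3695.2017.02.016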

Abstract_FL Under low-resource conditions, little labeled training data is available, and the performance of the resulting speech recognition systems is often unsatisfactory. To address this problem, this paper first investigates long short-term memory recurrent neural networks (LSTM RNNs) for acoustic modeling, a powerful tool for modeling long time series that makes full use of context information, and introduces a linear projection layer to reduce the number of model parameters. It then explores speaker modeling methods in the feature space, extracting identity vectors (i-vectors) that capture speaker and channel information simultaneously. Finally, it presents a system that combines the LSTM RNN model with i-vector features. Results on the standard Open KWS 2013 data set show that this approach yields a relative improvement of about 10% in TER over the DNN baseline system.
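Two quantitative claims in the abstract, that a projection layer reduces the parameter count and that the gain is a relative (not absolute) error reduction, can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names `lstm_params` and `relative_reduction`, the layer sizes, and the error rates are hypothetical and are not taken from the paper.

```python
def lstm_params(input_dim, cell_dim, proj_dim=None):
    """Count weights and biases in one LSTM layer (no peepholes assumed).

    Without a projection, the recurrent state has cell_dim units.
    With a projection, it has proj_dim units, plus a proj_dim x cell_dim matrix.
    """
    rec_dim = cell_dim if proj_dim is None else proj_dim
    gates = 4 * cell_dim * (input_dim + rec_dim) + 4 * cell_dim
    projection = 0 if proj_dim is None else proj_dim * cell_dim
    return gates + projection

def relative_reduction(baseline_err, new_err):
    """Relative error reduction, e.g. 0.10 for a 10% relative gain."""
    return (baseline_err - new_err) / baseline_err

# Hypothetical sizes: 140-dim input, 1024 cells, 256-dim projection.
print(lstm_params(140, 1024))       # 4771840
print(lstm_params(140, 1024, 256))  # 1888256

# Hypothetical error rates: 60.0% down to 54.0% is 6 points absolute,
# which corresponds to a 10% relative reduction.
print(round(relative_reduction(60.0, 54.0), 3))  # 0.1
```

With these made-up sizes the projected layer needs well under half the parameters of the plain layer, which matters when training data is scarce.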
Author Huang Guangxu (黄光许), Tian Yao (田垚), Kang Jian (康健), Liu Jia (刘加), Xia Shanhong (夏善红)
AuthorAffiliation University of Chinese Academy of Sciences, Beijing 100190; State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084
ClassificationCodes TP391.42
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Discipline Computer Science
DocumentTitleAlternate Long short term memory recurrent neural network acoustic models using i-vector for low resource speech recognition
GrantInformation Funded by the National Natural Science Foundation of China (grants 61273268, 61370034, 61403224)
Keywords speech recognition; long short-term memory (LSTM) neural network; identity vector (i-vector)
Notes 51-1196/TP
PageCount 5
PublicationTitleAlternate Application Research of Computers
SubjectTerms speech recognition; identity vector (i-vector); long short-term memory (LSTM) neural network
URI http://lib.cqvip.com/qk/93231X/201702/671086258.html
https://d.wanfangdata.com.cn/periodical/jsjyyyj201702016