Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder

Bibliographic Details
Published in Frontiers of Information Technology & Electronic Engineering, Vol. 21, no. 11, pp. 1639-1650
Main Authors Chen, Jing-jing; Mao, Qi-rong; Qin, You-cai; Qian, Shuang-qing; Zheng, Zhi-shen
Format Journal Article
Language English
Published Hangzhou: Zhejiang University Press; Springer Nature B.V., 1 November 2020
Author Affiliations School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China; Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang 212013, China
ISSN 2095-9184
EISSN 2095-9230
DOI 10.1631/FITEE.2000019

Abstract Much recent progress in monaural speech separation (MSS) has been achieved through a series of deep learning architectures based on autoencoders, which use an encoder to condense the input signal into compressed features and then feed these features into a decoder to construct a specific audio source of interest. However, these approaches can neither learn generative factors of the original input for MSS nor construct each audio source in mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS, which introduces a regularization loss in the objective function to isolate one source without containing other sources. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE can learn source-specific generative factors and a set of discriminative features for each source, leading to MSS performance improvement. Experiments on benchmark datasets show that our approach outperforms the existing methods. In terms of three important metrics, WFAE has great success on a relatively challenging MSS case, i.e., speaker-independent MSS.
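
The abstract names two ingredients: attention weights applied to latent factors, and a regularization term that keeps each reconstructed source free of the other sources. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The function names, the softmax weighting, the mean-squared-error form of the penalty, and the `reg_weight` value are all assumptions made for illustration; the record does not give the actual WFAE equations.

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_factors(latent, attention_scores):
    """Scale each latent factor by its attention weight (hypothetical form)."""
    weights = softmax(attention_scores)
    return [w * z for w, z in zip(weights, latent)]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def separation_loss(estimates, targets, reg_weight=0.1):
    """Reconstruction error per source, minus a penalty term that rewards
    each estimate for being far from the *other* sources' targets.
    This discriminative form is an assumed stand-in for the paper's
    regularization loss."""
    recon = sum(mse(e, t) for e, t in zip(estimates, targets))
    overlap = sum(
        mse(estimates[i], targets[j])
        for i in range(len(estimates))
        for j in range(len(targets))
        if i != j
    )
    return recon - reg_weight * overlap

# Two toy "sources"; a correct assignment should score lower than a swapped one.
targets = [[1.0, 0.0], [0.0, 1.0]]
loss_correct = separation_loss(targets, targets)
loss_swapped = separation_loss(targets[::-1], targets)
```

In this toy setup the correctly assigned estimates yield the lower loss, which is the behaviour a source-isolating regularizer is meant to encourage.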
Classification Code TN912.3
Copyright Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2020
Funding Project supported by the Key Project of the National Natural Science Foundation of China, the National Natural Science Foundation of China, the Qing Lan Talent Program of Jiangsu Province, China, and the Key Innovation Project of Undergraduate Students in Jiangsu Province, China
Keywords Deep learning; Speech separation; Generative factors; Autoencoder
Subjects Communications Engineering; Computer Hardware; Computer Science; Computer Systems Organization and Communication Networks; Deep learning; Dictionaries; Electrical Engineering; Electronics and Microelectronics; Instrumentation; Microphones; Networks; Neural networks; Regularization; Separation; Speech