Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks

Bibliographic Details
Published in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 708-712
Main Authors Erdogan, Hakan; Hershey, John R.; Watanabe, Shinji; Le Roux, Jonathan
Format Conference Proceeding
Language English
Published IEEE 2015-04-01

Abstract Separation of speech embedded in non-stationary interference is a challenging problem that has recently seen dramatic improvements using deep network-based methods. Previous work has shown that estimating a masking function to be applied to the noisy spectrum is a viable approach that can be improved by using a signal-approximation based objective function. Better modeling of dynamics through deep recurrent networks has also been shown to improve performance. Here we pursue both of these directions. We develop a phase-sensitive objective function based on the signal-to-noise ratio (SNR) of the reconstructed signal, and show that in experiments it yields uniformly better results in terms of signal-to-distortion ratio (SDR). We also investigate improvements to the modeling of dynamics, using bidirectional recurrent networks, as well as by incorporating speech recognition outputs in the form of alignment vectors concatenated with the spectral input features. Both methods yield further improvements, pointing to tighter integration of recognition with separation as a promising future direction.
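The abstract names a phase-sensitive objective but this record does not give its formula. The sketch below therefore assumes one common formulation: the squared error between the masked noisy spectrum and the clean spectrum, measured in the complex STFT domain, which ties directly to the SNR of the reconstruction. All function names, the random test data, and the `eps` constant are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def magnitude_sa_loss(mask, noisy, clean):
    """Magnitude signal-approximation loss: compares magnitudes only, ignoring phase."""
    return np.mean((mask * np.abs(noisy) - np.abs(clean)) ** 2)

def phase_sensitive_loss(mask, noisy, clean):
    """Assumed phase-sensitive loss: complex-domain error of the masked noisy STFT."""
    return np.mean(np.abs(mask * noisy - clean) ** 2)

def ideal_phase_sensitive_mask(noisy, clean, eps=1e-8):
    """Real-valued mask minimizing the loss above in each bin: Re(clean / noisy)."""
    return np.real(clean * np.conj(noisy)) / (np.abs(noisy) ** 2 + eps)

def snr_db(mask, noisy, clean):
    """SNR (dB) of the reconstruction mask * noisy relative to the clean spectrum."""
    err = np.sum(np.abs(mask * noisy - clean) ** 2)
    return 10.0 * np.log10(np.sum(np.abs(clean) ** 2) / err)

# Synthetic complex "spectrograms" (frames x frequency bins), purely illustrative.
rng = np.random.default_rng(0)
shape = (100, 257)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noisy = clean + noise

psm = ideal_phase_sensitive_mask(noisy, clean)
mag_ratio = np.abs(clean) / np.abs(noisy)  # magnitude-ratio mask that ignores phase

loss_psm = phase_sensitive_loss(psm, noisy, clean)
loss_mag = phase_sensitive_loss(mag_ratio, noisy, clean)
print(f"phase-sensitive mask SNR: {snr_db(psm, noisy, clean):.2f} dB")
print(f"magnitude-ratio mask SNR: {snr_db(mag_ratio, noisy, clean):.2f} dB")
```

Under this assumed loss, the magnitude-ratio mask is perfect by the magnitude-only criterion yet leaves a larger complex-domain error, because the noisy phase is reused at reconstruction time. The phase-aware mask shrinks bins where the noisy and clean phases disagree, which is why it reaches a higher reconstruction SNR in this toy setting.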
Author Hershey, John R.
Erdogan, Hakan
Le Roux, Jonathan
Watanabe, Shinji
Author_xml – sequence: 1
  givenname: Hakan
  surname: Erdogan
  fullname: Erdogan, Hakan
  email: haerdogan@sabanciuniv.edu
  organization: Mitsubishi Electr. Res. Labs. (MERL), Cambridge, MA, USA
– sequence: 2
  givenname: John R.
  surname: Hershey
  fullname: Hershey, John R.
  email: hershey@merl.com
  organization: Mitsubishi Electr. Res. Labs. (MERL), Cambridge, MA, USA
– sequence: 3
  givenname: Shinji
  surname: Watanabe
  fullname: Watanabe, Shinji
  email: watanabe@merl.com
  organization: Mitsubishi Electr. Res. Labs. (MERL), Cambridge, MA, USA
– sequence: 4
  givenname: Jonathan
  surname: Le Roux
  fullname: Le Roux, Jonathan
  email: leroux@merl.com
  organization: Mitsubishi Electr. Res. Labs. (MERL), Cambridge, MA, USA
DOI 10.1109/ICASSP.2015.7178061
Discipline Engineering
EISBN 1467369977
9781467369978
EndPage 712
ExternalDocumentID 7178061
Genre orig-research
ISSN 1520-6149
IsPeerReviewed false
IsScholarly true
PageCount 5
PublicationDate 20150401
PublicationDateYYYYMMDD 2015-04-01
PublicationDate_xml – month: 04
  year: 2015
  text: 20150401
  day: 01
PublicationDecade 2010
PublicationTitle 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PublicationTitleAbbrev ICASSP
PublicationYear 2015
Publisher IEEE
StartPage 708
SubjectTerms ASR
deep networks
Linear programming
LSTM
Noise measurement
Signal to noise ratio
Speech
Speech enhancement
Speech recognition
speech separation
Training
Title Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks
URI https://ieeexplore.ieee.org/document/7178061