Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis

Bibliographic Details
Published in IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, no. 4, pp. 1109-1116
Main Authors Wu, Chung-Hsien; Hsia, Chi-Chun; Liu, Te-Hsien; Wang, Jhing-Fa
Format Journal Article
Language English
Published IEEE 01.07.2006
Abstract This paper presents an expressive voice conversion model (DeBi-HMM) used as a post-processing stage of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for its duration-embedded pair of HMMs, which model the source and target speech signals, respectively. Joint estimation of the source and target HMMs is exploited for spectrum conversion from neutral to expressive speech. A gamma distribution is embedded as the duration model for each state in the source and target HMMs. Expressive-style-dependent decision trees perform prosody conversion. The STRAIGHT algorithm is adopted for speech analysis and synthesis. A set of small speech databases, one for each expressive style, is designed and collected to train the DeBi-HMM voice conversion models. Several experiments with statistical hypothesis testing are conducted to evaluate the quality of the synthetic speech as perceived by human subjects. Compared with previous voice conversion methods, the proposed method shows encouraging potential for expressive speech synthesis.
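The abstract states that a gamma distribution is embedded as the duration model for each HMM state. The sketch below only illustrates that idea and is not the authors' implementation. It assumes durations are counted in analysis frames, fits the gamma parameters by the method of moments (a swapped-in estimator; this record does not say how the paper estimates them), and discretizes the density into a normalized duration distribution over 1 to `max_frames` frames. The function names and the sample durations are hypothetical.

```python
import math


def gamma_log_pdf(x: float, shape: float, scale: float) -> float:
    """Log density of a gamma distribution with shape k and scale theta at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def fit_gamma_moments(durations: list[int]) -> tuple[float, float]:
    """Method-of-moments fit (hypothetical helper): k = mean^2 / var, theta = var / mean."""
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    if var <= 0.0:
        raise ValueError("durations must not all be equal")
    return mean * mean / var, var / mean


def duration_pmf(shape: float, scale: float, max_frames: int) -> list[float]:
    """Discretize the gamma density over d = 1..max_frames and renormalize."""
    logs = [gamma_log_pdf(d, shape, scale) for d in range(1, max_frames + 1)]
    peak = max(logs)  # subtract the max before exponentiating to avoid underflow
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical per-state durations (in frames), e.g. from training alignments.
    k, theta = fit_gamma_moments([4, 5, 6, 5, 7, 3, 5])
    pmf = duration_pmf(k, theta, max_frames=30)
    best = max(range(len(pmf)), key=pmf.__getitem__) + 1
    print(f"shape={k:.2f} scale={theta:.4f} most likely duration={best} frames")
```

With the hypothetical durations above, the moment fit gives shape 17.5 and scale 2/7 (a mean of 5 frames), and the discretized distribution peaks at 5 frames. Working in log space keeps the normalization numerically stable when the shape parameter is large.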
CODEN ITASD8
DOI 10.1109/TASL.2006.876112
Discipline Engineering
Computer Science
EISSN 1558-7924
EndPage 1116
Genre orig-research
ISSN 1558-7916
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
PageCount 8
PublicationDate 2006-07-01
PublicationTitle IEEE transactions on audio, speech, and language processing
PublicationTitleAbbrev TASL
PublicationYear 2006
Publisher IEEE
StartPage 1109
SubjectTerms Algorithm design and analysis
Bi-HMM voice conversion
Computer science
Conversion
Decision trees
embedded duration model
expressive speech synthesis
Hidden Markov models
Humans
Mathematical models
Natural language processing
prosody conversion
Signal synthesis
Spatial databases
Speech
Speech analysis
Speech recognition
Speech synthesis
Testing
Trains
Voice
Title Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
URI https://ieeexplore.ieee.org/document/1643640
https://search.proquest.com/docview/889379641
Volume 14