Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
Published in | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, no. 4, pp. 1109-1116 |
---|---|
Main Authors | Chung-Hsien Wu, Chi-Chun Hsia, Te-Hsien Liu, Jhing-Fa Wang |
Format | Journal Article |
Language | English |
Published | IEEE, 01.07.2006 |
Subjects | |
Abstract | This paper presents an expressive voice conversion model (DeBi-HMM) that serves as a post-processing stage of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for its duration-embedded pair of HMMs, which model the source and target speech signals, respectively. Joint estimation of the source and target HMMs is used for spectrum conversion from neutral to expressive speech. A gamma distribution is embedded as the duration model for each state in the source and target HMMs. Expressive-style-dependent decision trees perform prosodic conversion. The STRAIGHT algorithm is adopted for analysis and synthesis. A set of small speech databases, one for each expressive style, is designed and collected to train the DeBi-HMM voice conversion models. Several experiments with statistical hypothesis testing are conducted to evaluate the quality of the synthetic speech as perceived by human subjects. Compared with previous voice conversion methods, the proposed method exhibits encouraging potential for expressive speech synthesis. |
---|---|
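The abstract states that a gamma distribution serves as the duration model for each HMM state. The sketch below illustrates what such a state-duration model looks like in practice; it is not the authors' implementation. It assumes a shape-scale parameterisation, state durations counted in frames, a method-of-moments fit (this record does not say how the paper estimates the parameters), and truncation of the density to 1..max_frames followed by renormalisation. The function names `gamma_pdf`, `fit_gamma_moments`, and `duration_pmf` are hypothetical.

```python
import math


def gamma_pdf(d, shape, scale):
    """Gamma density at duration d (frames), shape-scale parameterisation.

    Evaluated in the log domain so that large shape values do not overflow.
    """
    if d <= 0:
        return 0.0
    log_p = ((shape - 1.0) * math.log(d) - d / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)


def fit_gamma_moments(durations):
    """Method-of-moments (shape, scale) estimate from observed state durations.

    Assumption: moment matching is used here for brevity; the paper's own
    estimation procedure is not described in this record.
    """
    n = len(durations)
    if n == 0:
        raise ValueError("need at least one duration")
    mean = sum(durations) / n
    var = sum((x - mean) ** 2 for x in durations) / n
    if mean <= 0 or var <= 0:
        raise ValueError("durations must be positive and not all equal")
    # Gamma(shape k, scale theta): mean = k * theta, variance = k * theta**2
    return mean * mean / var, var / mean


def duration_pmf(shape, scale, max_frames):
    """Discretise the gamma density over 1..max_frames and renormalise to a pmf."""
    weights = [gamma_pdf(d, shape, scale) for d in range(1, max_frames + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical state durations (in frames), e.g. from a forced alignment.
    shape, scale = fit_gamma_moments([3, 4, 5, 4, 6, 5, 4])
    pmf = duration_pmf(shape, scale, max_frames=20)
    best = max(range(len(pmf)), key=pmf.__getitem__) + 1
    print(f"shape={shape:.3f} scale={scale:.3f} most likely duration={best} frames")
```

For the hypothetical durations above (mean 31/7 frames), the resulting pmf peaks at 4 frames. The design motivation is that a plain HMM's self-transition probability implies a geometric state duration, which always peaks at a single frame, whereas an explicit gamma model can place its peak near the typical length of the state.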
Author | Te-Hsien Liu, Chung-Hsien Wu, Jhing-Fa Wang, Chi-Chun Hsia |
Author_xml | – sequence: 1 givenname: Chung-Hsien surname: Wu fullname: Wu, Chung-Hsien – sequence: 2 givenname: Chi-Chun surname: Hsia fullname: Hsia, Chi-Chun – sequence: 3 givenname: Te-Hsien surname: Liu fullname: Liu, Te-Hsien – sequence: 4 givenname: Jhing-Fa surname: Wang fullname: Wang, Jhing-Fa |
CODEN | ITASD8 |
CitedBy_id | crossref_primary_10_1016_j_specom_2014_10_005 crossref_primary_10_1007_s11767_006_0236_9 crossref_primary_10_1109_TASL_2012_2188628 crossref_primary_10_1007_s11432_013_4799_4 crossref_primary_10_1016_j_specom_2017_01_008 crossref_primary_10_1109_TASLP_2016_2537982 crossref_primary_10_1109_TASL_2006_889752 crossref_primary_10_1109_TASL_2008_2006578 crossref_primary_10_1007_s11042_015_3039_x crossref_primary_10_1007_s10772_012_9145_5 crossref_primary_10_1109_TASL_2009_2034771 crossref_primary_10_1016_j_specom_2014_12_004 crossref_primary_10_1016_j_heliyon_2023_e15090 crossref_primary_10_1109_TASLP_2021_3066047 crossref_primary_10_1109_TASL_2012_2213247 crossref_primary_10_9746_jcmsi_2_365 crossref_primary_10_1080_02533839_2010_9671694 crossref_primary_10_1007_s11767_009_0003_9 crossref_primary_10_1186_1687_4722_2012_21 crossref_primary_10_1016_j_neucom_2007_08_010 crossref_primary_10_1016_j_specom_2008_09_006 crossref_primary_10_1109_TASLP_2014_2339738 crossref_primary_10_1109_TASLP_2013_2297018 crossref_primary_10_1016_j_artint_2009_11_011 crossref_primary_10_1109_TASLP_2019_2960721 |
Cites_doi | 10.4159/harvard.9780674732469 10.3115/1075671.1075746 10.1109/TSA.2003.818114 10.1109/ICASSP.1988.196671 10.1109/89.759037 10.1121/1.405558 10.1109/89.496221 10.1109/ICASSP.1997.596185 10.1109/ICASSP.1998.674423 10.1016/S0885-2308(86)80009-2 10.1016/S0167-6393(02)00081-X 10.1109/89.661472 10.1109/89.232612 10.1109/ICASSP.2005.1415037 10.1093/ietisy/e88-d.3.502 10.1121/1.395275 10.1016/j.specom.2004.02.003 10.1016/S0167-6393(00)00075-3 10.1016/S0167-6393(98)00085-5 |
ContentType | Journal Article |
DBID | 97E RIA RIE AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
DOI | 10.1109/TASL.2006.876112 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Computer and Information Systems Abstracts |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering; Computer Science |
EISSN | 1558-7924 |
EndPage | 1116 |
ExternalDocumentID | 10_1109_TASL_2006_876112 1643640 |
Genre | orig-research |
GroupedDBID | 0R~ 29I 4.4 5GY 5VS 6IK 97E AAJGR AASAJ ABQJQ ABVLG AETIX ALMA_UNASSIGNED_HOLDINGS B-7 BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD F5P HZ~ IFIPE IPLJI JAVBF LAI M43 O9- OCL RIA RIE RIG RNS AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
ID | FETCH-LOGICAL-c388t-394814696970e6c092eb57632b5b7aaddb463169474dae9bb034f648de4034d3 |
IEDL.DBID | RIE |
ISSN | 1558-7916 |
IngestDate | Fri Aug 16 22:50:57 EDT 2024; Fri Aug 23 03:24:21 EDT 2024; Wed Jun 26 19:20:42 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c388t-394814696970e6c092eb57632b5b7aaddb463169474dae9bb034f648de4034d3 |
Notes | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23 |
PQID | 889379641 |
PQPubID | 23500 |
PageCount | 8 |
ParticipantIDs | crossref_primary_10_1109_TASL_2006_876112 proquest_miscellaneous_889379641 ieee_primary_1643640 |
PublicationCentury | 2000 |
PublicationDate | 2006-07-01 |
PublicationDateYYYYMMDD | 2006-07-01 |
PublicationDate_xml | – month: 07 year: 2006 text: 2006-07-01 day: 01 |
PublicationDecade | 2000 |
PublicationTitle | IEEE Transactions on Audio, Speech, and Language Processing |
PublicationTitleAbbrev | TASL |
PublicationYear | 2006 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
References | ref13 shott (ref30) 1990 ref12 ref15 ref14 shuang (ref18) 2004 ref2 huang (ref25) 2001 ref17 hwang (ref21) 1996; 1 ref16 ref19 (ref27) 1993 kay (ref24) 1993 dempster (ref23) 1977; 39 kawanami (ref4) 2003 ref26 ref20 ref22 ref28 ref8 ref7 schröder (ref1) 2001; 1 duxans (ref10) 2004 brown (ref31) 1973 kim (ref11) 1997; 5 eide (ref29) 2004 ref9 ref3 ref6 ref5 |
References_xml | – year: 1990 ident: ref30 publication-title: Statistics for health professionals contributor: fullname: shott – year: 1973 ident: ref31 publication-title: A First Language: The Early Stages doi: 10.4159/harvard.9780674732469 contributor: fullname: brown – ident: ref13 doi: 10.3115/1075671.1075746 – volume: 1 start-page: 561 year: 2001 ident: ref1 article-title: emotional speech synthesis: a review publication-title: Proc of Eurospeech contributor: fullname: schröder – volume: 39 start-page: 1 year: 1977 ident: ref23 article-title: maximum likelihood from incomplete data via the em algorithm publication-title: J R Statist Soc B contributor: fullname: dempster – start-page: 2401 year: 2003 ident: ref4 article-title: gmm-based voice conversion applied to emotional speech synthesis publication-title: Proc of Eurospeech contributor: fullname: kawanami – start-page: 5 year: 2004 ident: ref10 article-title: including dynamic and phonetic information in voice conversion systems publication-title: Proc ICSLP contributor: fullname: duxans – ident: ref12 doi: 10.1109/TSA.2003.818114 – year: 1993 ident: ref27 publication-title: The CKIP Categorical Classification of Mandarin Chinese (In Chinese) – ident: ref6 doi: 10.1109/ICASSP.1988.196671 – year: 1993 ident: ref24 publication-title: Fundamentals of Statistical Signal Processing: Estimation Theory contributor: fullname: kay – ident: ref17 doi: 10.1109/89.759037 – ident: ref5 doi: 10.1121/1.405558 – ident: ref26 doi: 10.1109/89.496221 – ident: ref19 doi: 10.1109/ICASSP.1997.596185 – ident: ref8 doi: 10.1109/ICASSP.1998.674423 – ident: ref22 doi: 10.1016/S0885-2308(86)80009-2 – ident: ref2 doi: 10.1016/S0167-6393(02)00081-X – start-page: 1197 year: 2004 ident: ref18 article-title: a novel voice conversion system based on codebook mapping with phoneme-tied weighting publication-title: Proc ICSLP contributor: fullname: shuang – ident: ref7 doi: 10.1109/89.661472 – volume: 5 start-page: 2519 year: 1997 ident: ref11 article-title: hidden markov model-based voice conversion using dynamic characteristics of speaker publication-title: Proc EUROSPEECH contributor: fullname: kim – ident: ref14 doi: 10.1109/89.232612 – ident: ref9 doi: 10.1109/ICASSP.2005.1415037 – ident: ref3 doi: 10.1093/ietisy/e88-d.3.502 – ident: ref16 doi: 10.1121/1.395275 – start-page: 79 year: 2004 ident: ref29 article-title: a corpus-based approach to expressive speech synthesis publication-title: Proc 5th ISCA Speech Synthesis Workshop contributor: fullname: eide – year: 2001 ident: ref25 publication-title: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development contributor: fullname: huang – ident: ref28 doi: 10.1016/j.specom.2004.02.003 – volume: 1 start-page: 87 year: 1996 ident: ref21 article-title: a mandarin text-to-speech system publication-title: Int J Computat Linguis Chin Lang Process contributor: fullname: hwang – ident: ref15 doi: 10.1016/S0167-6393(00)00075-3 – ident: ref20 doi: 10.1016/S0167-6393(98)00085-5 |
SSID | ssj0043641 |
Score | 2.1754017 |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database; Publisher |
StartPage | 1109 |
SubjectTerms | Algorithm design and analysis; Bi-HMM voice conversion; Computer science; Conversion; Decision trees; embedded duration model; expressive speech synthesis; Hidden Markov models; Humans; Mathematical models; Natural language processing; prosody conversion; Signal synthesis; Spatial databases; Speech; Speech analysis; Speech recognition; Speech synthesis; Testing; Trains; Voice |
Title | Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis |
URI | https://ieeexplore.ieee.org/document/1643640 https://search.proquest.com/docview/889379641 |
Volume | 14 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Voice+conversion+using+duration-embedded+bi-HMMs+for+expressive+speech+synthesis&rft.jtitle=IEEE+transactions+on+audio%2C+speech%2C+and+language+processing&rft.au=Chung-Hsien+Wu&rft.au=Chi-Chun+Hsia&rft.au=Te-Hsien+Liu&rft.au=Jhing-Fa+Wang&rft.date=2006-07-01&rft.pub=IEEE&rft.issn=1558-7916&rft.eissn=1558-7924&rft.volume=14&rft.issue=4&rft.spage=1109&rft.epage=1116&rft_id=info:doi/10.1109%2FTASL.2006.876112&rft.externalDocID=1643640 |