A Recursive Dialogue Game for Personalized Computer-Aided Pronunciation Training

Learning languages in addition to the native language is very important for all people in the globalized world today, and computer-aided pronunciation training (CAPT) is attractive since the software can be used anywhere at any time, and repeated as many times as desired. In this paper, we introduce...

Bibliographic Details
Published in	IEEE/ACM transactions on audio, speech, and language processing Vol. 23; no. 1; pp. 127 - 141
Main Authors	Su, Pei-Hao, Wu, Chuan-Hsun, Lee, Lin-Shan
Format	Journal Article
Language	English
Published	IEEE 01.01.2015
Subjects	Computer-aided pronunciation training (CAPT) computer-assisted language learning Computers dialogue game Games Markov decision process Markov processes reinforcement learning Software Speech Speech processing Training

Abstract	Learning languages beyond one's native language is increasingly important in today's globalized world, and computer-aided pronunciation training (CAPT) is attractive because the software can be used anywhere, at any time, and repeated as often as desired. In this paper, we bring the immersive interaction scenario offered by spoken dialogues to CAPT by proposing a recursive dialogue game that makes CAPT personalized. A number of tree-structured sub-dialogues are linked sequentially and recursively to form the script for the game. At each dialogue turn, the system policy selects in real time the best training sentence within the dialogue script for the individual learner, considering the learner's learning status and the possible future dialogue paths in the script, so that the learner's scores for all pronunciation units considered reach a predefined standard in a minimum number of turns. The aim is to offer more practice opportunities, in future sentences along the dialogue, for the pronunciation units that the specific learner produces poorly, so that the learner can improve pronunciation without repeating the same training sentences many times. This personalizes the learning process for each learner. The dialogue policy is modeled as a Markov decision process (MDP) with a high-dimensional continuous state space and trained with fitted value iteration using a large number of simulated learners. These simulated learners behave similarly to real learners and were generated from a corpus of real learner data. The experiments demonstrated very promising results, and a real cloud-based system has also been successfully implemented.
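The abstract names fitted value iteration over a continuous state space as the training method. The following is a minimal toy sketch of that general technique, not the authors' implementation: the three pronunciation units, the four sentences, the threshold, the discount factor, the linear value features, and the simulated-learner dynamics are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_UNITS = 3        # hypothetical number of pronunciation units
THRESHOLD = 0.8    # hypothetical "predefined standard" for every unit score
GAMMA = 0.95       # discount factor (assumption, not from the paper)
# Hypothetical script: row s counts how often each unit occurs in sentence s.
SENTENCES = np.array([[2, 0, 0],
                      [0, 2, 0],
                      [0, 0, 2],
                      [1, 1, 1]], dtype=float)


def done(state):
    """Terminal when every unit score has reached the standard."""
    return bool(np.all(state >= THRESHOLD))


def step(state, action, rng):
    """Toy simulated learner: practised units improve by a noisy amount."""
    gain = 0.05 * SENTENCES[action] * rng.uniform(0.5, 1.5, size=N_UNITS)
    return np.clip(state + gain, 0.0, 1.0)


def features(state):
    """Bias term plus the per-unit shortfall below the standard."""
    deficit = np.maximum(THRESHOLD - state, 0.0)
    return np.concatenate(([1.0], deficit))


def value(theta, state):
    """Linear value estimate; terminal states are worth zero."""
    return 0.0 if done(state) else float(features(state) @ theta)


def backup(theta, state, action, rng, n_samples=5):
    """Monte Carlo estimate of turn cost plus discounted next-state value."""
    total = 0.0
    for _ in range(n_samples):
        nxt = step(state, action, rng)
        total += -1.0 + GAMMA * value(theta, nxt)  # every turn costs 1
    return total / n_samples


def fitted_value_iteration(rng, n_states=200, n_iters=30):
    """Alternate Bellman backups on sampled states with a least-squares fit."""
    states = rng.uniform(0.3, 1.0, size=(n_states, N_UNITS))
    states = np.array([s for s in states if not done(s)])
    X = np.array([features(s) for s in states])
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        targets = np.array([
            max(backup(theta, s, a, rng) for a in range(len(SENTENCES)))
            for s in states
        ])
        theta, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return theta


def policy(theta, state, rng):
    """Greedy sentence choice with respect to the fitted value function."""
    scores = [backup(theta, state, a, rng, n_samples=20)
              for a in range(len(SENTENCES))]
    return int(np.argmax(scores))


theta = fitted_value_iteration(rng)
learner = np.array([0.9, 0.9, 0.4])  # weak only on the third unit
print("chosen sentence index:", policy(theta, learner, rng))
```

In this sketch the reward of -1 per turn stands in for the "minimum number of turns" objective, and the greedy policy tends to pick the sentence that gives the weakest unit the most practice. A real system would replace the toy `step` function with simulated learners estimated from recorded learner data, as the abstract describes.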
CODEN	ITASD8
CitedBy_id	crossref_primary_10_1007_s40593_023_00337_2 crossref_primary_10_1109_ACCESS_2020_2988406 crossref_primary_10_1109_TCIAIG_2015_2512592 crossref_primary_10_16916_aded_395607 crossref_primary_10_1109_TASLP_2016_2635445 crossref_primary_10_3233_DS_200028
Cites_doi	10.1109/ICASSP.2007.367198 10.1109/ICASSP.2013.6639266 10.21437/Interspeech.2010-229 10.1109/SLT.2010.5700839 10.1002/9780470316887 10.1109/ICASSP.2012.6289040 10.3115/1614025.1614027 10.3115/1622064.1622097 10.6339/JDS.201104_09(2).0007 10.3115/976909.979652 10.1016/S0167-6393(99)00044-8 10.1017/S0269888906000944 10.1016/j.specom.2009.04.009 10.21437/Interspeech.2011-766 10.1006/jmps.1999.1276 10.1145/1390156.1390240 10.3115/1614164.1614171 10.1017/S0272263106060141 10.1109/JPROC.2012.2225812 10.1007/11874850_7 10.1109/ICASSP.2004.1326053 10.1109/SLT.2012.6424270 10.1017/CBO9780511667275 10.21437/Interspeech.2011-506 10.1017/S0958344004001120
ContentType	Journal Article
DOI	10.1109/TASLP.2014.2375572
Discipline	Engineering
EISSN	2329-9304
EndPage	141
ExternalDocumentID	10_1109_TASLP_2014_2375572 6971195
Genre	orig-research
ISSN	2329-9290
IngestDate	Fri Aug 23 00:55:31 EDT 2024 Wed Jun 26 19:22:07 EDT 2024
IsPeerReviewed	true
IsScholarly	true
Issue	1
PageCount	15
PublicationCentury	2000
PublicationDate	2015-01-01
PublicationDecade	2010
PublicationTitle	IEEE/ACM transactions on audio, speech, and language processing
PublicationTitleAbbrev	TASLP
PublicationYear	2015
Publisher	IEEE
StartPage	127
URI	https://ieeexplore.ieee.org/document/6971195
Volume	23