A Recursive Dialogue Game for Personalized Computer-Aided Pronunciation Training

Learning languages in addition to the native language is very important for all people in the globalized world today, and computer-aided pronunciation training (CAPT) is attractive since the software can be used anywhere at any time, and repeated as many times as desired. In this paper, we introduce...

Bibliographic Details
Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 23; no. 1; pp. 127-141
Main Authors Su, Pei-Hao, Wu, Chuan-Hsun, Lee, Lin-Shan
Format Journal Article
Language English
Published IEEE 01.01.2015
Abstract Learning languages beyond one's native language is important in today's globalized world, and computer-aided pronunciation training (CAPT) is attractive because the software can be used anywhere, at any time, and repeated as often as desired. In this paper, we bring the immersive interaction of spoken dialogues to CAPT by proposing a recursive dialogue game that personalizes the training. A number of tree-structured sub-dialogues are linked sequentially and recursively to form the script for the game. At each dialogue turn, the system policy selects, in real time, the best training sentence in the script for the individual learner, considering the learner's learning status and the possible future dialogue paths, so that the scores of all pronunciation units considered reach a predefined standard in a minimum number of turns. The aim is that pronunciation units poorly produced by a specific learner receive more practice opportunities in later sentences of the dialogue, which lets the learner improve without repeating the same training sentences many times. The learning process is thereby personalized for each learner. The dialogue policy is modeled as a Markov decision process (MDP) with a high-dimensional continuous state space and trained with fitted value iteration using a large number of simulated learners. These simulated learners behave similarly to real learners and were generated from a corpus of real learner data. The experiments showed very promising results, and a real cloud-based system was also successfully implemented.
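The abstract names fitted value iteration over a continuous state space as the training method for the sentence-selection policy. The following is only a toy sketch of that general technique, not the authors' implementation. Everything in it is an assumption made for illustration: the three pronunciation units, the sentence names `s1` to `s3`, the target score `THRESHOLD`, the discount `GAMMA`, the random-improvement simulated learner, and the one-feature linear value function. It also ignores the tree-structured script and treats every sentence as available at every turn. A linear fit of this kind is not guaranteed to converge.

```python
import random

THRESHOLD = 0.8   # assumed target score for every pronunciation unit
GAMMA = 0.95      # assumed discount factor
N_UNITS = 3       # assumed number of pronunciation units
# Hypothetical sentences: each maps to the unit indices it practises.
SENTENCES = {"s1": (0, 1), "s2": (1, 2), "s3": (2,)}


def deficit(state):
    """Total shortfall of the unit scores below the target."""
    return sum(max(0.0, THRESHOLD - score) for score in state)


def simulate(state, sentence, rng):
    """Toy simulated learner: each practised unit improves by a random step."""
    nxt = list(state)
    for unit in SENTENCES[sentence]:
        nxt[unit] = min(1.0, nxt[unit] + rng.uniform(0.05, 0.15))
    return tuple(nxt)


def value(weights, state):
    """Approximate value V(s) = a + b * deficit(s); goal states are worth 0."""
    d = deficit(state)
    if d == 0.0:
        return 0.0
    a, b = weights
    return a + b * d


def backup(weights, state, rng, n_samples=5):
    """Sampled Bellman backup: cost of -1 per turn plus discounted next value."""
    best_q, best_sentence = None, None
    for sentence in sorted(SENTENCES):
        q = sum(-1.0 + GAMMA * value(weights, simulate(state, sentence, rng))
                for _ in range(n_samples)) / n_samples
        if best_q is None or q > best_q:
            best_q, best_sentence = q, sentence
    return best_q, best_sentence


def fit_line(xs, ys):
    """Closed-form least squares for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return (mean_y - b * mean_x, b)


def fitted_value_iteration(n_states=200, n_iters=20, seed=0):
    """Alternate Bellman backups on sampled states with a regression fit."""
    rng = random.Random(seed)
    states = [tuple(rng.uniform(0.3, 1.0) for _ in range(N_UNITS))
              for _ in range(n_states)]
    states = [s for s in states if deficit(s) > 0.0]  # keep non-goal states
    weights = (0.0, 0.0)
    for _ in range(n_iters):
        targets = [backup(weights, s, rng)[0] for s in states]
        weights = fit_line([deficit(s) for s in states], targets)
    return weights


def choose_sentence(weights, state, seed=0):
    """Greedy policy with respect to the fitted value function."""
    return backup(weights, state, random.Random(seed), n_samples=20)[1]


if __name__ == "__main__":
    w = fitted_value_iteration()
    print("weights:", w)
    print("choice:", choose_sentence(w, (0.9, 0.9, 0.5)))
```

The reward of -1 per turn reflects the stated goal of reaching the predefined standard in a minimum number of turns. A fuller model would restrict the available sentences at each turn to those reachable in the dialogue script.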
DOI 10.1109/TASLP.2014.2375572
Discipline Engineering
EISSN 2329-9304
ISSN 2329-9290
IsPeerReviewed true
IsScholarly true
Subjects Computer-aided pronunciation training (CAPT); computer-assisted language learning; computers; dialogue game; games; Markov decision process; Markov processes; reinforcement learning; software; speech; speech processing; training
URI https://ieeexplore.ieee.org/document/6971195