Hallucinated n-best lists for discriminative language modeling
This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong ba...
Published in | 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 5001 - 5004 |
---|---|
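Main Authors | Sagae, K., Lehr, M., Prud'hommeaux, E., Xu, P., Glenn, N., Karakos, D., Khudanpur, S., Roark, B., Saraclar, M., Shafran, I., Bikel, D., Callison-Burch, C., Cao, Y., Hall, K., Hasler, E., Koehn, P., Lopez, A., Post, M., Riley, D. |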
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.03.2012 |
Subjects | |
Abstract | This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results with training with "real" n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts - similar to methods from machine translation for extracting phrase tables - yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods. |
---|---|
Author | Xu, P. Cao, Y. Karakos, D. Koehn, P. Prud'hommeaux, E. Khudanpur, S. Shafran, I. Hasler, E. Lopez, A. Glenn, N. Hall, K. Bikel, D. Post, M. Saraclar, M. Roark, B. Riley, D. Sagae, K. Lehr, M. Callison-Burch, C. |
Author_xml | – sequence: 1 givenname: K. surname: Sagae fullname: Sagae, K. – sequence: 2 givenname: M. surname: Lehr fullname: Lehr, M. – sequence: 3 givenname: E. surname: Prud'hommeaux fullname: Prud'hommeaux, E. – sequence: 4 givenname: P. surname: Xu fullname: Xu, P. – sequence: 5 givenname: N. surname: Glenn fullname: Glenn, N. – sequence: 6 givenname: D. surname: Karakos fullname: Karakos, D. – sequence: 7 givenname: S. surname: Khudanpur fullname: Khudanpur, S. – sequence: 8 givenname: B. surname: Roark fullname: Roark, B. – sequence: 9 givenname: M. surname: Saraclar fullname: Saraclar, M. – sequence: 10 givenname: I. surname: Shafran fullname: Shafran, I. – sequence: 11 givenname: D. surname: Bikel fullname: Bikel, D. – sequence: 12 givenname: C. surname: Callison-Burch fullname: Callison-Burch, C. – sequence: 13 givenname: Y. surname: Cao fullname: Cao, Y. – sequence: 14 givenname: K. surname: Hall fullname: Hall, K. – sequence: 15 givenname: E. surname: Hasler fullname: Hasler, E. – sequence: 16 givenname: P. surname: Koehn fullname: Koehn, P. – sequence: 17 givenname: A. surname: Lopez fullname: Lopez, A. – sequence: 18 givenname: M. surname: Post fullname: Post, M. – sequence: 19 givenname: D. surname: Riley fullname: Riley, D. |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ICASSP.2012.6289043 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISBN | 9781467300469 1467300446 9781467300445 1467300462 |
EISSN | 2379-190X |
EndPage | 5004 |
ExternalDocumentID | 6289043 |
Genre | orig-research |
GroupedDBID | 23M 29P 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR ABLEC ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP IPLJI JC5 M43 OCL RIE RIL RIO RNS |
ID | FETCH-LOGICAL-i220t-d86f2ea8b314b9e375715ffd2279d9bd3dbb506a3f3a5220cc6c5d703b02cf903 |
IEDL.DBID | RIE |
ISBN | 1467300454 9781467300452 |
ISSN | 1520-6149 |
IngestDate | Wed Jun 26 19:24:16 EDT 2024 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i220t-d86f2ea8b314b9e375715ffd2279d9bd3dbb506a3f3a5220cc6c5d703b02cf903 |
OpenAccessLink | https://www.pure.ed.ac.uk/ws/files/18890623/Sagae_Lehr_ET_AL_2012_Hallucinated_n_best_lists_for_discriminative_language_modeling.pdf |
PageCount | 4 |
ParticipantIDs | ieee_primary_6289043 |
PublicationCentury | 2000 |
PublicationDate | 2012-March |
PublicationDateYYYYMMDD | 2012-03-01 |
PublicationDate_xml | – month: 03 year: 2012 text: 2012-March |
PublicationDecade | 2010 |
PublicationTitle | 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
PublicationTitleAbbrev | ICASSP |
PublicationYear | 2012 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0000781036 ssj0008748 |
Score | 2.0008724 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 5001 |
SubjectTerms | automatic speech recognition; Data models; discriminative training; Hidden Markov models; language modeling; semi-supervised methods; Speech; Speech recognition; Training; Training data; Transducers |
Title | Hallucinated n-best lists for discriminative language modeling |
URI | https://ieeexplore.ieee.org/document/6289043 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
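
The abstract describes training n-gram language models on n-best lists with the perceptron algorithm. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the function names, the choice of unigram and bigram count features, the use of word-level edit distance to pick the oracle hypothesis, and the toy data are all assumptions made for this example. Real systems typically also include the baseline recognizer score as a feature, which is omitted here.

```python
from collections import Counter


def ngram_features(words, max_order=2):
    """Count n-grams (up to max_order) in a hypothesis, with sentence boundary padding."""
    feats = Counter()
    padded = ["<s>"] + list(words) + ["</s>"]
    for n in range(1, max_order + 1):
        for i in range(len(padded) - n + 1):
            feats[tuple(padded[i:i + n])] += 1
    return feats


def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between a hypothesis and the reference."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r)))
        prev = cur
    return prev[-1]


def score(weights, feats):
    """Linear model score: dot product of feature counts and weights."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(nbest_lists, references, epochs=5):
    """Structured perceptron over n-best lists (hypothetical helper for illustration)."""
    weights = {}
    for _ in range(epochs):
        for hyps, ref in zip(nbest_lists, references):
            feats = [ngram_features(h) for h in hyps]
            errors = [edit_distance(h, ref) for h in hyps]
            # Oracle: the lowest-error hypothesis in the list.
            oracle = min(range(len(hyps)), key=lambda k: errors[k])
            # Guess: the hypothesis the current model ranks highest.
            guess = max(range(len(hyps)), key=lambda k: score(weights, feats[k]))
            if errors[guess] > errors[oracle]:
                # Move weights toward the oracle and away from the model's guess.
                for f, v in feats[oracle].items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in feats[guess].items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


def rerank(weights, hyps):
    """Return the hypothesis with the highest model score."""
    return max(hyps, key=lambda h: score(weights, ngram_features(h)))


# Toy data (invented for this sketch): one n-best list and its reference.
toy_nbest = [[["i", "scream"], ["ice", "cream"]]]
toy_refs = [["ice", "cream"]]
toy_weights = train_perceptron(toy_nbest, toy_refs)
best = rerank(toy_weights, toy_nbest[0])
```

In the semi-supervised setting the abstract describes, the n-best lists passed to a trainer like this would be simulated from reference text rather than produced by the recognizer; how those lists are generated is not covered by this sketch.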