Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation
Published in | arXiv.org |
---|---|
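Main Authors | Leszczynski, Megan; Zhang, Shu; Ganti, Ravi; Balog, Krisztian; Radlinski, Filip; Pereira, Fernando; Arun Tejasvi Chaganty |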
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 18.11.2023 |
Abstract | Recommender systems are ubiquitous yet often difficult for users to control and to adjust when recommendation quality is poor. This has motivated conversational recommender systems (CRSs), with control provided through natural language feedback. However, as with most application domains, building robust CRSs requires training data that reflects system usage: here, conversations with user utterances paired with items that cover a wide range of preferences. This has proved challenging to collect scalably using conventional methods. We address the question of whether it can be generated synthetically, building on recent advances in natural language. We evaluate in the setting of item set recommendation, noting the increasing attention to this task motivated by use cases like music, news, and recipe recommendation. We present TalkTheWalk, which synthesizes realistic high-quality conversational data by leveraging domain expertise encoded in widely available curated item collections, generating a sequence of hypothetical yet plausible item sets, then using a language model to produce corresponding user utterances. We generate over one million diverse playlist curation conversations in the music domain, and show these contain consistent utterances with relevant item sets nearly matching the quality of an existing but small human-collected dataset for this task. We demonstrate the utility of the generated synthetic dataset on a conversational item retrieval task and show that it improves over both unsupervised baselines and systems trained on a real dataset. |
---|---|
Author | Leszczynski, Megan; Zhang, Shu; Ganti, Ravi; Balog, Krisztian; Radlinski, Filip; Pereira, Fernando; Arun Tejasvi Chaganty |
Copyright | 2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
SubjectTerms | Random walk; Recommender systems; Slates; Synthetic data; Training |
URI | https://www.proquest.com/docview/2770815624 |
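The abstract describes a two-stage pipeline: derive a sequence of plausible item sets from curated item collections, then produce a user utterance for each step. The sketch below is only an illustration of that idea under loud assumptions, not the authors' method. This record does not describe the actual procedure, so the random walk over playlists that share a track (suggested only by the "Random walk" subject term), the function names, the toy playlists, and the template function standing in for the language model are all hypothetical.

```python
import random


def random_walk_item_sets(playlists, start, steps, rng):
    """Walk between curated playlists that share at least one track.

    Hypothetical stand-in for "generating a sequence of plausible item sets".
    Returns a list of (playlist_name, item_set) pairs, starting at `start`.
    """
    names = sorted(playlists)
    current = start
    walk = [(current, set(playlists[current]))]
    for _ in range(steps):
        # Neighbours are other playlists with a non-empty track overlap.
        neighbours = [
            n for n in names
            if n != current and playlists[n] & playlists[current]
        ]
        if not neighbours:
            break
        current = rng.choice(neighbours)
        walk.append((current, set(playlists[current])))
    return walk


def template_utterance(prev_name, next_name):
    # Assumption: a fixed template replaces the language model here.
    return f"I liked the {prev_name} picks, but can we move toward {next_name}?"


def synthesize_conversation(playlists, start, steps, seed=0):
    """Pair each step of the walk with an utterance and its target item set."""
    rng = random.Random(seed)
    walk = random_walk_item_sets(playlists, start, steps, rng)
    turns = []
    for (prev_name, _), (next_name, items) in zip(walk, walk[1:]):
        turns.append({
            "utterance": template_utterance(prev_name, next_name),
            "items": sorted(items),
        })
    return turns


# Toy curated collections (made-up names and track ids).
PLAYLISTS = {
    "road trip": {"song_a", "song_b", "song_c"},
    "summer pop": {"song_c", "song_d"},
    "acoustic evening": {"song_d", "song_e"},
}

conversation = synthesize_conversation(PLAYLISTS, "road trip", steps=2)
for turn in conversation:
    print(turn["utterance"], "->", turn["items"])
```

Each turn pairs one utterance with the item set it is meant to lead to, which is the shape of training data the abstract calls for. A real pipeline would replace `template_utterance` with a language model call and would need its own way of judging whether a step is plausible.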