Semantically Aligned Question and Code Generation for Automated Insight Generation
Published in | 2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code) pp. 127 - 134 |
---|---|
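Main Authors | Singha, Ananya; Chopra, Bhavya; Khatry, Anirudh; Gulwani, Sumit; Henley, Austin Z.; Le, Vu; Parnin, Chris; Singh, Mukul; Verbruggen, Gust |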
Format | Conference Proceeding |
Language | English |
Published | ACM, 20.04.2024 |
DOI | 10.1145/3643795.3648381 |
Abstract | Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large-language models can generate code that does not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we found that generating questions and code together yields more diverse questions. |
Author | Henley, Austin Z. Singh, Mukul Le, Vu Parnin, Chris Singha, Ananya Chopra, Bhavya Khatry, Anirudh Verbruggen, Gust Gulwani, Sumit |
Author details | 1. Singha, Ananya (Microsoft, India; t-asingha@microsoft.com); 2. Chopra, Bhavya (Microsoft, India; t-bhchopra@microsoft.com); 3. Khatry, Anirudh (Microsoft, India; t-anikhatry@microsoft.com); 4. Gulwani, Sumit (Microsoft, USA; sumitg@microsoft.com); 5. Henley, Austin Z. (Microsoft, USA; azh321@gmail.com); 6. Le, Vu (Microsoft, USA; levu@microsoft.com); 7. Parnin, Chris (Microsoft, USA; chrisparnin@microsoft.com); 8. Singh, Mukul (Microsoft, India; singhmukul@microsoft.com); 9. Verbruggen, Gust (Microsoft, Belgium; gverbruggen@microsoft.com) |
CODEN | IEEPAD |
EISBN | 9798400705793 |
Genre | orig-research |
PageCount | 8 |
SubjectTerms | alignment; Code-generation; Codes; Conferences; Costs; Filtering; Large language models; LLM; Semantics |
URI | https://ieeexplore.ieee.org/document/10734616 |
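
The abstract reports that embeddings can be used to filter out semantically unaligned pairs of question and code. This record does not say which embedding model, similarity measure, or threshold the authors used, so the following is only a minimal sketch of the general idea under stated assumptions: cosine similarity as the measure, a cutoff of 0.8, and small hand-made vectors standing in for real embeddings. The names `cosine_similarity` and `filter_aligned` are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_aligned(pairs, threshold=0.8):
    """Keep (question, code) pairs whose embeddings are at least `threshold` similar.

    `pairs` is a list of (question, code, question_embedding, code_embedding).
    The threshold value is an assumption made for illustration only.
    """
    kept = []
    for question, code, q_emb, c_emb in pairs:
        if cosine_similarity(q_emb, c_emb) >= threshold:
            kept.append((question, code))
    return kept


# Toy data: the vectors are invented placeholders, not output of a real model.
pairs = [
    ("Which year had the most wins?",
     "df.groupby('year')['wins'].sum().idxmax()",
     [0.9, 0.1, 0.2], [0.8, 0.2, 0.1]),
    ("What is the average attendance?",
     "df['city'].nunique()",
     [0.1, 0.9, 0.0], [0.9, 0.0, 0.3]),
]

aligned = filter_aligned(pairs)
for question, code in aligned:
    print(question, "->", code)
```

In this toy run only the first pair passes the cutoff, because the second pair's placeholder vectors point in nearly orthogonal directions. In practice the embeddings would come from a text or code embedding model, and the threshold would have to be tuned empirically.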