Improving Code Generation by Training with Natural Language Feedback
Main Authors | Chen, Angelica; Scheurer, Jérémy; Korbak, Tomasz; Campos, Jon Ander; Chan, Jun Shern; Bowman, Samuel R; Cho, Kyunghyun; Perez, Ethan |
---|---|
Format | Journal Article |
Language | English |
Published | 28.03.2023 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Computer Science - Software Engineering |
Online Access | Get full text: https://arxiv.org/abs/2303.16749 |
Abstract | The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution and demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks. |
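The abstract describes ILF only at a high level: gather a small amount of human-written feedback during training, learn from it, and require no feedback at test time. The following is a minimal sketch of what one such training round could look like for a code task; it is an illustration reconstructed from the abstract, not the authors' implementation, and every helper interface (`generate`, `collect_human_feedback`, `refine`, `fine_tune`) is a hypothetical stub.

```python
# Illustrative sketch only: an ILF-style training round for code generation,
# reconstructed from the abstract above. All helper interfaces are hypothetical
# stubs, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    task: str          # natural-language task description (e.g. an MBPP-style prompt)
    tests: str         # unit tests used to judge candidate programs
    program: str = ""  # a candidate program for the task


def passes_tests(program: str, tests: str) -> bool:
    """Return True if the candidate program passes its unit tests.

    Sketch only: real usage would execute untrusted code in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
        exec(tests, namespace)
        return True
    except Exception:
        return False


def ilf_round(
    tasks: List[Example],
    generate: Callable[[str], str],                    # model: task -> program
    collect_human_feedback: Callable[[Example], str],  # human: failing program -> feedback text
    refine: Callable[[str, str, str], str],            # model: (task, program, feedback) -> refinement
    fine_tune: Callable[[List[Example]], None],        # update the base model on (task, program) pairs
) -> None:
    """One round of an ILF-style loop, following the abstract's high-level description:

    1. Sample programs from the current model.
    2. Collect natural-language feedback on programs that fail their tests.
    3. Generate refinements that incorporate the feedback, keeping only those
       that pass the unit tests.
    4. Fine-tune the base model on the surviving refinements.
    """
    refinements: List[Example] = []
    for ex in tasks:
        program = generate(ex.task)
        if passes_tests(program, ex.tests):
            continue  # feedback is only gathered for incorrect programs
        feedback = collect_human_feedback(Example(ex.task, ex.tests, program))
        refined = refine(ex.task, program, feedback)
        if passes_tests(refined, ex.tests):
            refinements.append(Example(ex.task, ex.tests, refined))
    fine_tune(refinements)  # imitation step: learn from the repaired programs
```

In this reading, filtering refinements with unit tests before fine-tuning is what lets a small amount of feedback become high-quality training data, which is consistent with the abstract's claim that ILF is sample-efficient and needs no feedback at test time.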
Author | Perez, Ethan; Bowman, Samuel R; Chan, Jun Shern; Cho, Kyunghyun; Chen, Angelica; Scheurer, Jérémy; Campos, Jon Ander; Korbak, Tomasz |
ContentType | Journal Article |
Copyright | http://creativecommons.org/licenses/by/4.0 |
DOI | 10.48550/arxiv.2303.16749 |
DatabaseName | arXiv Computer Science; arXiv.org |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://arxiv.org/abs/2303.16749 |
PublicationDate | 2023-03-28 |
PublicationYear | 2023 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Computer Science - Software Engineering |
Title | Improving Code Generation by Training with Natural Language Feedback |
URI | https://arxiv.org/abs/2303.16749 |