Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL

Bibliographic Details
Published in	arXiv.org
Main Authors	Zhong, Ruiqi; Snell, Charlie; Klein, Dan; Eisner, Jason
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 23.10.2023
Subjects	Annotations; Bayesian analysis; Natural language; Natural language (computers); Statistical inference; Synthesis

Summary:	Can non-programmers annotate natural language utterances with complex programs that represent their meaning? We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex). Since they cannot understand the candidate programs, we ask them to select indirectly by examining the programs' input-output examples. For each utterance, APEL actively searches for a simple input on which the candidate programs tend to produce different outputs. It then asks the non-programmers only to choose the appropriate output, which allows us to infer which program is correct; the inferred program could then be used to fine-tune the parser. As a first case study, we recruited human non-programmers to use APEL to re-annotate SPIDER, a text-to-SQL dataset. Our approach achieved the same annotation accuracy as the original expert annotators (75%) and exposed many subtle errors in the original annotations.
ISSN:	2331-8422
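
Illustration:	The selection loop described in the summary can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the `singer` table, the three candidate queries, the prior weights, the two toy inputs, and the `error_rate` parameter are all hypothetical, and the input is chosen by the entropy of the candidates' output distribution, which is an assumed stand-in for whatever search criterion APEL actually uses.

```python
import math
import sqlite3
from collections import defaultdict


def run(sql, rows):
    """Run one candidate query on a tiny in-memory table; return a hashable result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")  # hypothetical schema
    conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
    result = tuple(sorted(conn.execute(sql).fetchall()))
    conn.close()
    return result


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def group_by_output(candidates, prior, rows):
    """Map each distinct output to the total prior mass of the candidates producing it."""
    mass = defaultdict(float)
    for sql, p in zip(candidates, prior):
        mass[run(sql, rows)] += p
    return mass


def pick_input(candidates, prior, inputs):
    """Pick the input on which the candidates disagree most (highest output entropy)."""
    return max(
        inputs,
        key=lambda rows: entropy(group_by_output(candidates, prior, rows).values()),
    )


def update(candidates, prior, rows, chosen_output, error_rate=0.1):
    """Bayesian update after the annotator picks an output.
    error_rate is an assumed probability that the annotator chose wrongly."""
    likelihood = [
        1 - error_rate if run(sql, rows) == chosen_output else error_rate
        for sql in candidates
    ]
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical utterance: "Which singers are 30 or older?"
candidates = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 30",
    "SELECT name FROM singer WHERE age < 30",
]
prior = [0.5, 0.3, 0.2]  # assumed confidence of the seed parser
inputs = [
    [("Ann", 40), ("Bob", 50)],  # the first two candidates agree here
    [("Ann", 30), ("Bob", 20)],  # all three candidates disagree here
]

best = pick_input(candidates, prior, inputs)
# The annotator sees only the table and the possible outputs, and picks "Ann".
posterior = update(candidates, prior, best, (("Ann",),))
print(best, [round(p, 3) for p in posterior])
```

In this toy run the second input is selected because every candidate returns a different answer on it, and a single choice by the annotator shifts most of the probability mass to the `>=` query even though the seed parser ranked it second.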