How Can We Know What Language Models Know?

Bibliographic Details
Published in: Transactions of the Association for Computational Linguistics, Vol. 8, pp. 423-438
Main Authors: Jiang, Zhengbao; Xu, Frank F.; Araki, Jun; Neubig, Graham
Format: Journal Article
Language: English
Published: Cambridge, MA, USA: MIT Press, 01.01.2020

Summary: Recent work has presented intriguing results examining the knowledge contained in language models (LMs) by having the LM fill in the blanks of prompts such as "Obama is a _ by profession". These prompts are usually manually created, and quite possibly sub-optimal; another prompt such as "Obama worked as a _" may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt provides only a lower-bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA) at https://github.com/jzbjyb/LPAQA.
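The query-and-ensemble idea described in the summary can be illustrated with a minimal sketch, using the HuggingFace transformers fill-mask pipeline. This is not the authors' released LPAQA code: the model choice (bert-base-cased), the three paraphrased prompts, and the uniform averaging of answer probabilities are all illustrative assumptions (the paper also considers learned, non-uniform prompt weights).

from collections import defaultdict
from transformers import pipeline

# Masked LM used to answer fill-in-the-blank queries (assumed model).
fill = pipeline("fill-mask", model="bert-base-cased")

# Several prompts expressing the same relation (person -> profession).
prompts = [
    "Obama is a [MASK] by profession.",
    "Obama worked as a [MASK].",
    "Obama's profession is [MASK].",
]

# Ensemble: average each candidate answer's probability across prompts.
scores = defaultdict(float)
for prompt in prompts:
    for candidate in fill(prompt, top_k=10):
        scores[candidate["token_str"]] += candidate["score"] / len(prompts)

# The highest-scoring candidate is the ensemble's predicted answer.
best = max(scores, key=scores.get)
print(f"{best}: {scores[best]:.3f}")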
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00324