Evaluating Natural Language Inference Models: A Metamorphic Testing Approach

Bibliographic Details
Published in: Proceedings - International Symposium on Software Reliability Engineering, pp. 220-230
Main Authors: Jiang, Mingyue; Bao, Houzhen; Tu, Kaiyi; Zhang, Xiao-Yi; Ding, Zuohua
Format: Conference Proceeding
Language: English; Japanese
Published: IEEE, 01.10.2021
ISSN: 2332-6549
DOI: 10.1109/ISSRE52982.2021.00033

Summary: Natural language inference (NLI) is a fundamental NLP task that forms the cornerstone of deep natural language understanding. Unfortunately, evaluating NLI models is challenging. On the one hand, due to the lack of test oracles, it is difficult to automatically judge the correctness of an NLI model's predictions. On the other hand, beyond knowing how well a model performs, there is a further need to understand the capabilities and characteristics of different NLI models. To mitigate these issues, we propose to apply metamorphic testing (MT) to NLI. We identify six categories of metamorphic relations, covering a wide range of properties that the NLI task is expected to possess. On this basis, MT can be conducted on NLI models without test oracles, and the MT results can be used to interpret NLI models' capabilities from various aspects. We further demonstrate the validity and effectiveness of our approach through experiments on five NLI models. Our experiments expose a large number of prediction failures in the subject NLI models and also yield interpretations of common characteristics of NLI models.
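
As a rough illustration of the idea described in the summary (not taken from the paper itself), the sketch below applies a single metamorphic relation to an NLI model: a meaning-preserving rewording of the hypothesis should leave the predicted label unchanged. The names `toy_nli_model` and `reword_hypothesis`, and the specific relation used, are assumptions for illustration only; the paper's six relation categories are not reproduced here.

```python
import string

# Everything in this sketch is illustrative: `toy_nli_model` stands in for a
# real NLI model, and the relation tested (label invariance under a
# meaning-preserving rewording of the hypothesis) is only one example of a
# metamorphic relation, not necessarily one of the paper's six categories.


def toy_nli_model(premise: str, hypothesis: str) -> str:
    """Stand-in for the subject NLI model: a crude word-overlap heuristic."""
    words = lambda s: {w.strip(string.punctuation) for w in s.lower().split()}
    return "entailment" if words(hypothesis) <= words(premise) else "neutral"


def reword_hypothesis(hypothesis: str) -> str:
    """Toy meaning-preserving transformation (synonym substitution)."""
    return hypothesis.replace("purchased", "bought")


def run_metamorphic_test(predict, premise: str, hypothesis: str) -> bool:
    """Check that rewording the hypothesis does not change the predicted label.

    No test oracle is needed: we never ask whether either prediction is
    correct, only whether the source and follow-up predictions agree.
    """
    source_label = predict(premise, hypothesis)
    followup_label = predict(premise, reword_hypothesis(hypothesis))
    return source_label == followup_label


if __name__ == "__main__":
    premise = "A woman purchased two tickets at the box office."
    hypothesis = "A woman purchased tickets."
    ok = run_metamorphic_test(toy_nli_model, premise, hypothesis)
    print("consistent" if ok else "prediction failure exposed")
```

Because the toy model relies on surface word overlap, the synonym substitution flips its prediction and the test reports a failure, even though no ground-truth label was ever consulted; this is the sense in which MT sidesteps the oracle problem described above.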