GROD: Enhancing Generalization of Transformer with Out-of-Distribution Detection
Main Authors | , |
---|---|
Format | Journal Article |
Language | English |
Published | 13.06.2024 |
Subjects | |
Summary: | Transformer networks excel in natural language processing (NLP) and computer
vision (CV) tasks. However, they struggle to generalize to
Out-of-Distribution (OOD) datasets, that is, data whose distribution differs
from that seen during training. OOD detection aims to distinguish data that
deviate from the expected distribution while maintaining optimal performance
on in-distribution (ID) data. This paper introduces a novel approach based on
OOD detection, termed the Generate Rounded OOD Data (GROD) algorithm, which
significantly bolsters the generalization performance of transformer networks
across various tasks. GROD is motivated by our new Probably Approximately
Correct (PAC) theory of OOD detection for transformers. Transformers are
learnable in terms of OOD detection: when the data are sufficient, the
outliers can be well represented. By penalizing the misclassification of OOD
data within the loss function and generating synthetic outliers, GROD
guarantees learnability and refines the decision boundaries between inliers and
outliers. This strategy demonstrates robust adaptability and general
applicability across different data types. Evaluated across diverse OOD
detection tasks in NLP and CV, GROD achieves state-of-the-art (SOTA) results
regardless of data format. On average, it reduces the SOTA FPR@95 from 21.97%
to 0.12% and improves AUROC from 93.62% to 99.98% on image classification
tasks, and it improves the SOTA FPR@95 by 12.89% and AUROC by 2.27% in
detecting semantic text outliers. The code is available at
https://anonymous.4open.science/r/GROD-OOD-Detection-with-transformers-B70F. |
---|---|
DOI: | 10.48550/arxiv.2406.12915 |
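
The summary reports results as FPR@95 and AUROC. For readers unfamiliar with these metrics, the following is a minimal sketch of how they are commonly computed for OOD detection. It is not taken from the GROD code. It assumes that each sample has a scalar score where higher means "more in-distribution" and that ID samples are the positive class. The function names `fpr_at_tpr` and `auroc` and the toy scores are illustrative only.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID at the threshold
    that keeps at least `tpr` of the ID samples (FPR@95 when tpr=0.95)."""
    ranked = sorted(id_scores, reverse=True)
    # Smallest number of top-ranked ID samples needed to reach the target TPR.
    k = max(1, math.ceil(tpr * len(ranked)))
    threshold = ranked[k - 1]
    # An OOD sample is a false positive if it scores at or above the threshold.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample,
    counting ties as one half. Quadratic in sample count; fine for a sketch."""
    wins = 0.0
    for p in id_scores:
        for n in ood_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Toy scores (made up for illustration, not results from the paper).
toy_id = [0.9, 0.8, 0.7, 0.6]
toy_ood = [0.65, 0.3, 0.2, 0.1]
print(fpr_at_tpr(toy_id, toy_ood))  # 0.25
print(auroc(toy_id, toy_ood))       # 0.9375
```

Lower FPR@95 and higher AUROC both indicate better separation of ID and OOD data. For large evaluations, a sort-based AUROC such as the one in scikit-learn would be the more practical choice than the pairwise loop used here.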