DAGnosis: Localized Identification of Data Inconsistencies using Structures

Bibliographic Details
Published in: arXiv.org
Main Authors: Huynh, Nicolas; Berrevoets, Jeroen; Seedat, Nabeel; Crabbé, Jonathan; Qian, Zhaozhi; van der Schaar, Mihaela
Format: Paper
Language: English
Published: Cornell University Library, arXiv.org (Ithaca), 28.02.2024

Summary: Identifying and appropriately handling inconsistencies in data at deployment time is crucial for reliably using machine learning models. While recent data-centric methods can identify such inconsistencies with respect to the training set, they suffer from two key limitations: (1) suboptimality in settings where features exhibit statistical independencies, due to their use of compressive representations, and (2) lack of localization to pinpoint why a sample might be flagged as inconsistent, which is important for guiding future data collection. We solve these two fundamental limitations by using directed acyclic graphs (DAGs) to encode the probability distribution of the training set's features and their independencies as a structure. Our method, called DAGnosis, leverages these structural interactions to draw valuable and insightful data-centric conclusions. DAGnosis enables localizing the causes of inconsistencies on a DAG, an aspect overlooked by previous approaches. Moreover, we show empirically that leveraging these interactions (1) leads to more accurate conclusions in detecting inconsistencies and (2) provides more detailed insights into why some samples are flagged.
ISSN: 2331-8422
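The summary describes encoding the training distribution as a DAG and localizing an inconsistency to specific features. The sketch below illustrates that general idea under stated assumptions. It is not the authors' implementation. It assumes the DAG is already known (the hypothetical `DAG` dict), models each feature as a linear function of its parents, and uses split-conformal residual thresholds to decide when a feature is inconsistent. The names `fit_conformal_dag` and `localize` are illustrative only.

```python
import numpy as np

# Hypothetical DAG over three features (assumption): each key is a feature
# index, each value lists its parent indices. Assumed known here; in practice
# it would have to be learned from the training data.
DAG = {0: [], 1: [0], 2: [0, 1]}


def _design(X):
    # Prepend an intercept column (also works when X has zero columns).
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_conformal_dag(X_fit, X_cal, dag, alpha=0.05):
    """Fit one linear model per feature on its parents, then calibrate a
    split-conformal threshold on absolute residuals from a held-out split."""
    models = {}
    for j, parents in dag.items():
        cols = np.asarray(parents, dtype=int)
        coef, *_ = np.linalg.lstsq(_design(X_fit[:, cols]), X_fit[:, j], rcond=None)
        resid = np.abs(X_cal[:, j] - _design(X_cal[:, cols]) @ coef)
        n = len(resid)
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n)  # conformal rank
        models[j] = (cols, coef, np.sort(resid)[k - 1])
    return models


def localize(X, models):
    """Boolean matrix: entry (i, j) is True when feature j of sample i lies
    outside its conformal interval given its parents."""
    flags = np.zeros(X.shape, dtype=bool)
    for j, (cols, coef, q) in models.items():
        pred = _design(X[:, cols]) @ coef
        flags[:, j] = np.abs(X[:, j] - pred) > q
    return flags


# Synthetic training data from a linear structural model consistent with DAG.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 2 * x0 + 0.1 * rng.normal(size=n)
x2 = x0 - x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
models = fit_conformal_dag(X[:1000], X[1000:], DAG)

X_new = np.array([[0.5, 1.0, -0.5],   # consistent with the DAG
                  [0.5, 1.0, 4.5]])   # feature 2 shifted by +5
flags = localize(X_new, models)
print(flags.any(axis=1).tolist())  # [False, True]: only sample 1 is flagged
print(flags[1].tolist())           # [False, False, True]: feature 2 is the cause
```

Because each check conditions only on a feature's parents, a failed check points at a specific node rather than at a compressed representation of the whole sample. Corrupting a non-leaf feature can also trip the checks of its children, since they use it as an input.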