Survival Analysis with Multiple Noisy Labels

Bibliographic Details
Published in Proceedings (IEEE International Conference on Data Mining), pp. 863-868
Main Authors Tjandra, Donna; Wiens, Jenna
Format Conference Proceeding
Language English
Published IEEE 09.12.2024
ISSN 2374-8486
DOI 10.1109/ICDM59182.2024.00106

Summary: In many applications, collecting ground truth labels is labor intensive and costly. Thus, researchers often turn to pragmatic labeling tools based on heuristics, at the potential cost of introducing noise. When multiple different labeling tools are used, we find ourselves in the setting of multiple noisy labels. Previous work studying supervised learning with multiple noisy labels focuses on classification and proposes different strategies to aggregate labels. Here, we move beyond classification and study multiple noisy labels in the context of time-to-event prediction (i.e., survival analysis). As we show, survival analysis presents additional challenges when learning from multiple noisy labels, since outcomes may be censored. We formalize the problem of multiple noisy labels in survival analysis and propose a novel approach. Our approach leverages a reference set with both noisy and ground truth labels to model the noisy time-to-event distributions and their associated errors, and then uses these distributions to predict the ground truth time-to-event distribution. When predicting sepsis onset in the MIMIC-III dataset, our approach estimates time-to-event more accurately than the next best baseline (median time-to-event error across 10 replications: 14.50 hours [interquartile range 13.25-15.75] vs. 17.50 hours [interquartile range 16.25-18.00]).
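
To make the setting in the summary concrete, the following is a minimal sketch of how multiple noisy time-to-event labels with censoring might be represented, together with a naive aggregation baseline and a median absolute time-to-event error. It is not the authors' method: the function names `aggregate_noisy_labels` and `median_abs_error`, the array layout, and the toy numbers are all assumptions made for illustration only.

```python
import numpy as np

def aggregate_noisy_labels(times, observed):
    """Hypothetical naive baseline (not the paper's approach).

    times:    (n_instances, n_labelers) noisy event or censoring times
    observed: (n_instances, n_labelers) True where a labeler reported an event

    For each instance, take the median of the times from labelers that
    reported an event. If no labeler reported one, treat the instance as
    censored at the latest reported time.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    any_event = observed.any(axis=1)
    # Mask censored entries so they do not enter the median.
    event_times = np.where(observed, times, np.nan)
    agg = np.full(times.shape[0], np.nan)
    agg[any_event] = np.nanmedian(event_times[any_event], axis=1)
    agg[~any_event] = times[~any_event].max(axis=1)
    return agg, any_event

def median_abs_error(pred, true):
    """Median absolute difference between predicted and true times."""
    return float(np.median(np.abs(np.asarray(pred) - np.asarray(true))))

# Toy data (made up): three instances, three labeling tools, times in hours.
times = [[10, 14, 12],
         [30, 48, 26],
         [48, 48, 48]]
observed = [[True, True, True],
            [True, False, True],    # the second tool censors at 48 hours
            [False, False, False]]  # every tool censors this instance

agg_times, agg_observed = aggregate_noisy_labels(times, observed)

# Compare against made-up ground truth for the two uncensored instances.
true_times = np.array([11.0, 25.0])
error = median_abs_error(agg_times[agg_observed], true_times)
print(agg_times, agg_observed, error)
```

The second instance shows the complication the summary points to: the labeling tools can disagree not only on when the event occurred but on whether it was observed at all, so a simple aggregation rule has to make an arbitrary choice about censored entries.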