Benchmarking foundation models as feature extractors for weakly-supervised computational pathology
Format | Journal Article |
Language | English |
Published | 28.08.2024
Summary | Advancements in artificial intelligence have driven the development of numerous pathology foundation models capable of extracting clinically relevant information. However, there is currently limited literature independently evaluating these foundation models on truly external cohorts and clinically relevant tasks to uncover adjustments for future improvements. In this study, we benchmarked ten histopathology foundation models on 13 patient cohorts comprising 6,791 patients and 9,493 slides from lung, colorectal, gastric, and breast cancers. The models were evaluated on weakly-supervised tasks related to biomarkers, morphological properties, and prognostic outcomes. We show that a vision-language foundation model, CONCH, yielded the highest performance in 42% of tasks when compared with vision-only foundation models. The experiments reveal that foundation models trained on distinct cohorts learn complementary features for predicting the same label and can be fused to outperform the current state of the art: an ensemble of complementary foundation models outperformed CONCH in 66% of tasks. Moreover, our findings suggest that data diversity outweighs data volume for foundation models. Our work highlights actionable adjustments to improve pathology foundation models.
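The abstract's fusion finding points to a simple late-fusion recipe: extract patch embeddings from two frozen foundation models, concatenate them, and train a weakly-supervised (multiple-instance learning) head on slide-level labels. The sketch below is illustrative only and is not the authors' implementation; the gated-attention MIL head, concatenation as the fusion rule, the embedding dimensions, and the random stand-in features are all assumptions.

```python
# Minimal sketch: fusing patch embeddings from two frozen foundation models
# and scoring a slide with an attention-based MIL head. Random tensors stand
# in for real extractor outputs; dimensions are hypothetical.
import torch
import torch.nn as nn


class AttentionMIL(nn.Module):
    """Gated-attention MIL head over fused patch embeddings."""

    def __init__(self, in_dim: int, hidden_dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.attn_V = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.attn_U = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (n_patches, in_dim) -- one slide is a "bag" of patches
        a = self.attn_w(self.attn_V(patch_feats) * self.attn_U(patch_feats))
        a = torch.softmax(a, dim=0)                 # attention weight per patch
        slide_feat = (a * patch_feats).sum(dim=0)   # attention-weighted pooling
        return self.classifier(slide_feat)          # slide-level logits


# Stand-ins for embeddings from two frozen extractors; in practice these would
# come from, e.g., a vision-language model such as CONCH and a vision-only
# model (the 512/1024 dimensions here are assumptions).
n_patches, dim_a, dim_b = 500, 512, 1024
feats_a = torch.randn(n_patches, dim_a)
feats_b = torch.randn(n_patches, dim_b)

fused = torch.cat([feats_a, feats_b], dim=1)  # late fusion by concatenation
model = AttentionMIL(in_dim=dim_a + dim_b)
logits = model(fused)
print(logits.shape)  # torch.Size([2]): slide-level class logits
```

Concatenation is only one plausible fusion rule; score-level averaging of per-model MIL heads would be an equally simple alternative and keeps each extractor's head independent.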
DOI | 10.48550/arxiv.2408.15823