Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
Format: | Journal Article |
Language: | English |
Published: | 20.05.2024 |
Summary: | The lack of large and diverse training data for Computer-Aided
Diagnosis (CAD) in breast cancer detection has been one of the concerns
impeding the adoption of such systems. Recently, pre-training with
large-scale image-text datasets via Vision-Language Models (VLMs), e.g.,
CLIP, has partially addressed the issues of robustness and data efficiency
in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM
pre-trained on a substantial amount of screening mammogram-report pairs,
addressing the challenges of dataset diversity and size. Our experiments on
two public datasets demonstrate strong performance in classifying and
localizing various mammographic attributes crucial for breast cancer
detection, showcasing data efficiency and robustness similar to CLIP in CV.
We also propose Mammo-FActOR, a novel feature attribution method that
provides spatial interpretation of representations with sentence-level
granularity within mammography reports. The code is publicly available at
https://github.com/batmanlab/Mammo-CLIP. |
DOI: | 10.48550/arxiv.2405.12255 |
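For background, the CLIP-style pre-training the abstract refers to optimizes a symmetric contrastive objective that pulls matched image-text embedding pairs together and pushes mismatched pairs apart. The sketch below is a generic, minimal illustration of that objective, not Mammo-CLIP's actual implementation (see the linked repository for that); the function name `clip_contrastive_loss`, the embedding dimension, and the temperature value are all illustrative assumptions.

```python
# Minimal sketch of a CLIP-style symmetric contrastive loss.
# Assumes the image and text encoders have already projected a batch of
# matched mammogram/report pairs into a shared embedding space; row i of
# each tensor is assumed to be a true pair. All names here are hypothetical.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize so the dot product below is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

if __name__ == "__main__":
    # Toy batch: 8 random 256-d vectors standing in for encoder outputs.
    img = torch.randn(8, 256)
    txt = torch.randn(8, 256)
    print(clip_contrastive_loss(img, txt).item())
```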