Cross-Subject Reference Attention for Brain Lesion Segmentation
Published in | 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV) pp. 1 - 5 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 06.06.2025 |
DOI | 10.1109/DLCV65218.2025.11088861 |
Summary: | Brain lesion segmentation serves as a useful tool for clinical diagnosis and scientific research. The Transformer architecture has achieved remarkable performance in computer vision and can potentially benefit brain lesion segmentation. Standard Transformer models partition an image into patches and compute the self-attention matrix between these patches to model long-range dependencies. However, as brain lesions usually occupy only a small portion of the image, self-attention is computed mostly between patches of lesions and healthy tissue. Consequently, there is a lack of long-range dependency that is relevant to brain lesion segmentation, and existing Transformer models have not achieved a clear improvement over convolution-based methods, such as nnU-Net. To address this limitation, we propose a cross-subject reference attention (CRA) mechanism for brain lesion segmentation. CRA exploits information not only from the image to segment but also from reference lesions in other images, which can provide relevant long-range dependency. Specifically, CRA consists of a subject-level tokenizer, a reference pool, and a hierarchical cross-attention module. The subject-level tokenizer first maps the input feature map of each subject into a fixed number of tokens, which compresses the rich image information to reduce the subsequent computation overhead. Then, the tokens of selected subjects are stored in the reference pool as reference tokens. Finally, in the hierarchical cross-attention module, image features are adjusted under the adaptive guidance of the reference tokens, which alleviates the lack of lesion patches in a single image. CRA is agnostic to the segmentation backbone, and we integrate it with the state-of-the-art nnU-Net framework. To evaluate the proposed method, we performed experiments on multiple datasets, and the results indicate that CRA leads to improved accuracy of brain lesion segmentation. |
---|---|
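The three components named in the summary (subject-level tokenizer, reference pool, cross-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the attention-pooling tokenizer, the `token_queries` parameter, the function names, and all shapes are assumptions made for illustration, learned projections are omitted, and only a single attention level is shown rather than the hierarchical module the summary describes.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tokenize_subject(features, token_queries):
    """Compress (N, C) patch features into a fixed number K of (K, C) tokens.

    Assumption: attention pooling, with token_queries (K, C) standing in
    for learned parameters. The paper's tokenizer may differ.
    """
    scale = np.sqrt(features.shape[1])
    weights = softmax(token_queries @ features.T / scale, axis=-1)  # (K, N)
    return weights @ features  # (K, C)


def cross_attend(features, reference_tokens):
    """Adjust (N, C) image features using (M, C) reference tokens.

    Assumption: one cross-attention level with a residual connection;
    queries come from the image, keys and values from the references.
    """
    scale = np.sqrt(features.shape[1])
    attn = softmax(features @ reference_tokens.T / scale, axis=-1)  # (N, M)
    return features + attn @ reference_tokens  # (N, C)


rng = np.random.default_rng(0)
C, K = 8, 4  # hypothetical channel count and tokens per subject
token_queries = rng.normal(size=(K, C))

# Reference pool: tokens from three hypothetical reference subjects,
# each with a different number of patches but the same K tokens.
reference_pool = [
    tokenize_subject(rng.normal(size=(n, C)), token_queries) for n in (50, 64, 72)
]
reference_tokens = np.concatenate(reference_pool, axis=0)  # (3 * K, C)

# Features of the image to segment at one resolution of the backbone.
target_features = rng.normal(size=(100, C))
adjusted = cross_attend(target_features, reference_tokens)
print(adjusted.shape)
```

Because every subject is reduced to the same number of tokens, the cost of the cross-attention step in this sketch grows with the number of reference subjects times K rather than with the full patch count of each reference image, which is the compression benefit the summary attributes to the tokenizer.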