ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis

Bibliographic Details
Published in	IEEE Transactions on Medical Imaging, Vol. 41, no. 10, pp. 2598-2614
Main Authors	Dalmaz, Onat; Yurt, Mahmut; Cukur, Tolga
Format	Journal Article
Language	English
Published	New York: IEEE, 01.10.2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	adversarial; Artificial neural networks; Biomedical imaging; Compression; Computed tomography; Computer applications; Computer architecture; Configuration management; generative; Image contrast; Image synthesis; Learning; Magnetic resonance imaging; Medical image synthesis; Medical imaging; Modules; Neural networks; residual; Subspace constraints; Synthesis; Task analysis; transformer; Transformers; unified; Vision
Summary:	Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT over competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics.
ISSN:	0278-0062 1558-254X
DOI:	10.1109/TMI.2022.3167808
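
Note on weight sharing: the summary states that a weight sharing strategy among ART blocks mitigates computational burden. The sketch below is only an illustrative parameter count, not the authors' implementation. It assumes a standard transformer encoder layer (four attention projections, a two-layer feed-forward network, two layer norms) and uses hypothetical sizes (`embed_dim`, `mlp_dim`, `num_layers`, `num_art_blocks`) that are not taken from the record. It compares the number of stored transformer parameters when every block owns its own transformer module against the case where one module is reused by all blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    # Hypothetical sizes for illustration only; not taken from the paper.
    embed_dim: int = 768
    mlp_dim: int = 3072
    num_layers: int = 12


def transformer_layer_params(cfg: TransformerConfig) -> int:
    """Parameter count of one standard encoder layer (weights and biases)."""
    d, m = cfg.embed_dim, cfg.mlp_dim
    attention = 4 * (d * d + d)            # query, key, value, output projections
    feed_forward = (d * m + m) + (m * d + d)  # two linear layers
    layer_norms = 2 * (2 * d)              # two layer norms, scale and shift each
    return attention + feed_forward + layer_norms


def transformer_params(cfg: TransformerConfig) -> int:
    """Parameter count of one full transformer module."""
    return cfg.num_layers * transformer_layer_params(cfg)


def bottleneck_transformer_params(
    cfg: TransformerConfig, num_art_blocks: int, shared: bool
) -> int:
    """Stored transformer parameters across a stack of blocks.

    With sharing, a single module is reused by every block, so the count
    does not grow with the number of blocks (an assumption of this sketch).
    """
    per_module = transformer_params(cfg)
    return per_module if shared else num_art_blocks * per_module


if __name__ == "__main__":
    cfg = TransformerConfig()
    blocks = 9  # hypothetical number of blocks
    unshared = bottleneck_transformer_params(cfg, blocks, shared=False)
    shared = bottleneck_transformer_params(cfg, blocks, shared=True)
    print(f"unshared: {unshared:,} parameters")
    print(f"shared:   {shared:,} parameters")
    print(f"reduction factor: {unshared // shared}x")
```

In this simple accounting, sharing lowers the stored parameter count by a factor equal to the number of blocks that would otherwise hold their own transformer module; it says nothing about the convolutional or channel compression parameters, which are left out.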