Multimodal image synthesis based on disentanglement representations of anatomical and modality specific features, learned using uncooperative relativistic GAN


Bibliographic Details
Published in: Medical Image Analysis, Vol. 80, p. 102514
Main Authors: Reaungamornrat, Sureerat; Sari, Hasan; Catana, Ciprian; Kamen, Ali
Format: Journal Article
Language: English
Published: Netherlands: Elsevier B.V., 01.08.2022

Summary:

Highlights:

•The multimodal-image synthesis ensemble incorporates stochastic binary anatomical encoders, with the derivative of a clipped rectified linear unit used as a straight-through estimator, to ensure proper minimization of the objective function and to sparsify gradients, which, in turn, could improve ill-conditioning and training stability. Unlike a deterministic encoder, the stochastic encoders offer regularization that inhibits overfitting and enables increased exploration of the solution space (a minimal sketch of this estimator follows the summary).

•The ensemble uses feature-wise linear modulation (FiLM) in its decoders and was trained in a relativistic GAN framework using multiscale relativistic average discriminators (generic sketches of both components follow the summary). The FiLM-based decoders hindered suboptimal co-adaptation between subnetworks and therefore mitigated the false anatomical deformation observed in decoders whose inputs were constructed by concatenating anatomical features and modality attributes.

•A novel modality-insensitive structural encoder defines a cross-modality structural-consistency constraint encouraging preservation of the input anatomy in synthetic images.

•We show that cooperative learning could lead to false anatomical alteration. Besides network architectures, objective functions governed the co-adaptation of subnetworks, including determining which subnetworks learn to compensate for which subnetworks' faults. We show that mode-collapse regularization induced cooperative learning of intra-modality subnetworks, while the cross-modality structural-consistency constraint enabled inter-modality co-adaptation. The former led to false anatomical deformation in synthetic CT, while the latter promoted preservation of the input anatomy in synthetic CT.

Abstract: A growing number of methods for estimating attenuation-coefficient maps from magnetic resonance (MR) images have recently been proposed, driven by the increasing interest in MR-guided radiotherapy and the introduction of hybrid positron emission tomography (PET)/MR systems. We propose a deep-network ensemble, incorporating stochastic binary anatomical encoders and imaging-modality variational autoencoders, to disentangle image latent spaces into a space of modality-invariant anatomical features and spaces of modality attributes. The ensemble integrates modality-modulated decoders that normalize features and image intensities based on imaging modality. Besides promoting disentanglement, the architecture fosters uncooperative learning, offering the ability to maintain anatomical structure in cross-modality reconstruction. A modality-invariant structural-consistency constraint further enforces faithful embedding of anatomy. To improve training stability and the fidelity of the synthesized modalities, the ensemble is trained in a relativistic generative adversarial framework incorporating multiscale discriminators. Analyses of priors and network architectures, as well as performance validation, were performed on computed tomography (CT) and MR pelvis datasets. Compared to state-of-the-art approaches, the proposed method demonstrated robustness against intensity inhomogeneity, improved tissue-class differentiation, and produced synthetic CT in Hounsfield units with intensities consistent and smooth across slices, achieving a median normalized mutual information of 1.28, normalized cross correlation of 0.97, and gradient cross correlation of 0.59 over 324 images (sketches of these similarity metrics follow).
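The first highlight names a concrete mechanism: a stochastic binary encoder whose backward pass substitutes the derivative of a clipped rectified linear unit as a straight-through estimator. The following is a minimal PyTorch sketch of that mechanism only, not the paper's implementation; the class name and the Bernoulli sampling of the clipped input are illustrative assumptions.

```python
import torch

class StochasticBinaryActivation(torch.autograd.Function):
    """Stochastic binary code with a straight-through estimator (STE).

    Forward: treat the clipped input as a probability and draw a {0, 1}
    sample. Backward: substitute the derivative of the clipped ReLU
    min(max(x, 0), 1), which is 1 on (0, 1) and 0 elsewhere, so
    gradients pass through inside the clip range and are zeroed
    (sparsified) outside it.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        p = x.clamp(0.0, 1.0)        # clipped ReLU used as a probability
        return torch.bernoulli(p)    # stochastic binary anatomical code

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        inside = (x > 0) & (x < 1)   # derivative of the clipped ReLU
        return grad_output * inside.to(grad_output.dtype)

# usage (hypothetical): binary_code = StochasticBinaryActivation.apply(encoder_features)
```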
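The second highlight contrasts FiLM-modulated decoders with decoders fed a concatenation of anatomical features and modality attributes. Below is a generic feature-wise linear modulation layer in the same vein, assuming PyTorch; the class name, layer choices, and tensor shapes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise linear modulation of decoder feature maps.

    A modality-attribute code z predicts a per-channel scale (gamma) and
    shift (beta) applied to the anatomical feature maps, instead of
    concatenating z with the features at the decoder input.
    """

    def __init__(self, num_channels: int, z_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(z_dim, num_channels)
        self.to_beta = nn.Linear(z_dim, num_channels)

    def forward(self, feats: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # feats: (N, C, H, W) anatomical features; z: (N, z_dim) modality code
        gamma = self.to_gamma(z)[:, :, None, None]
        beta = self.to_beta(z)[:, :, None, None]
        return gamma * feats + beta
```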
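The ensemble is trained with multiscale relativistic average discriminators. The sketch below follows the published relativistic average GAN (RaGAN) formulation rather than code from the paper; in a multiscale setup these losses would be summed over discriminators operating at different image resolutions.

```python
import torch
import torch.nn.functional as F

def ragan_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Relativistic average discriminator loss: real samples should score
    higher than the average fake, and fakes lower than the average real."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel))
            + F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel)))

def ragan_g_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Generator loss mirrors the discriminator's: fakes should score
    higher than the average real sample."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(fake_rel, torch.ones_like(fake_rel))
            + F.binary_cross_entropy_with_logits(real_rel, torch.zeros_like(real_rel)))
```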
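The abstract reports a normalized cross correlation of 0.97 and a gradient cross correlation of 0.59. The record does not give the exact formulas, so the sketches below assume the common definitions: Pearson-style correlation of intensities, and the same correlation averaged over spatial gradients to emphasize agreement of anatomical edges.

```python
import torch

def ncc(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalized cross correlation of two images (1.0 means identical
    up to an affine intensity change)."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + eps)

def gradient_cc(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Gradient cross correlation: average NCC of the spatial gradients
    of two (H, W) images."""
    ax, ay = torch.gradient(a)
    bx, by = torch.gradient(b)
    return 0.5 * (ncc(ax, bx) + ncc(ay, by))
```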
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2022.102514