MindLDM: Reconstruct Visual Stimuli from fMRI Using Latent Diffusion Model

Bibliographic Details
Published in: IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (Online), pp. 1 - 6
Main Authors: Guo, Junhao; Yi, Chanlin; Li, Fali; Xu, Peng; Tian, Yin
Format: Conference Proceeding
Language: English
Published: IEEE, 14.06.2024

Summary: Deciphering brain activity evoked by visual stimuli has long been a central pursuit in cognitive neuroscience. Because the foundations of visual formation remain elusive, reconstructing visual stimuli is challenging. With the advancement of deep learning, several studies have successfully reconstructed scenes resembling visual stimuli from functional magnetic resonance imaging (fMRI). However, substantial dissimilarities persist in contour representation, and most existing research focuses on within-subject decoding. In this study, we propose a novel approach, MindLDM, that enables cross-subject visual reconstruction. It first employs a Masked Autoencoder (MAE) to obtain latent features of fMRI and align them with the Contrastive Language-Image Pre-Training (CLIP) text feature space. Then, a Very Deep Variational Autoencoder (VDVAE) is used to extract the contour information of the visual input. Finally, a latent diffusion model combined with ControlNet reconstructs the visual stimuli. MindLDM achieves image reconstruction on the publicly available Natural Scenes Dataset, generating images that exhibit a high degree of semantic correlation with the visual stimuli and improved restoration of scene details. Quantitative and qualitative results demonstrate the effectiveness of the proposed method, and an exhaustive ablation study analyzes our framework.
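The abstract's alignment step, mapping MAE-derived fMRI latents into the CLIP text feature space, is typically trained with a contrastive objective. The paper does not give implementation details, so the following is a minimal, hypothetical NumPy sketch of a symmetric InfoNCE-style loss between paired fMRI latents and CLIP text embeddings; the function name, array shapes, and temperature value are illustrative assumptions, not the authors' code.

```python
import numpy as np

def info_nce_loss(fmri_latents, clip_text_feats, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss.

    fmri_latents:    [batch, dim] latents (e.g. from an MAE encoder) -- assumed shape
    clip_text_feats: [batch, dim] CLIP text embeddings, row i paired with row i
    Returns a scalar loss that is small when matching pairs are most similar.
    """
    # L2-normalize both embedding sets so dot products are cosine similarities
    f = fmri_latents / np.linalg.norm(fmri_latents, axis=1, keepdims=True)
    t = clip_text_feats / np.linalg.norm(clip_text_feats, axis=1, keepdims=True)

    # Similarity logits between every fMRI/text pair; matches lie on the diagonal
    logits = f @ t.T / temperature
    n = logits.shape[0]

    def xent_diagonal(lg):
        # Cross-entropy with the diagonal as the target class for each row
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the fMRI->text and text->fMRI directions
    return 0.5 * (xent_diagonal(logits) + xent_diagonal(logits.T))
```

Under this sketch, a batch whose fMRI latents already equal their paired text embeddings yields a much lower loss than the same batch with mismatched pairings, which is the property the alignment training would exploit.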
ISSN: 2377-9322
DOI: 10.1109/CIVEMSA58715.2024.10586647