MindLDM: Reconstruct Visual Stimuli from fMRI Using Latent Diffusion Model

Bibliographic Details
Published in: IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (Online), pp. 1 - 6
Main Authors: Guo, Junhao; Yi, Chanlin; Li, Fali; Xu, Peng; Tian, Yin
Format: Conference Proceeding
Language: English
Published: IEEE, 14.06.2024

Summary: Deciphering brain activity evoked by visual stimuli has long been a central pursuit in cognitive neuroscience. Because the foundations of visual formation remain elusive, reconstructing visual stimuli is challenging. With the advancement of deep learning, several studies have successfully reconstructed scenes resembling visual stimuli from functional magnetic resonance imaging (fMRI). However, substantial dissimilarities persist in contour representation, and most existing research focuses on within-subject decoding. In this study, we propose a novel approach, MindLDM, that enables cross-subject visual reconstruction. It first employs a Masked Autoencoder (MAE) to obtain latent features of fMRI and align them with the Contrastive Language-Image Pre-Training (CLIP) text feature space. Then, a Very Deep Variational Autoencoder (VDVAE) is used to extract the contour information of the visual input. Finally, a latent diffusion model combined with ControlNet reconstructs the visual stimuli. MindLDM achieves image reconstruction on the publicly available Natural Scenes Dataset, generating images that exhibit a high degree of semantic correlation with the visual stimuli and improved restoration of scene details. Quantitative and qualitative results demonstrate the effectiveness of the proposed method, and an exhaustive ablation study analyzes our framework.
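The abstract's alignment step, mapping MAE-derived fMRI latents into the CLIP text feature space, is typically trained with a contrastive objective. The paper does not give implementation details, so the following is a minimal, hypothetical NumPy sketch of a symmetric InfoNCE-style loss between paired fMRI latents and CLIP text embeddings; the function name, array shapes, and temperature value are illustrative assumptions, not the authors' code.

```python
import numpy as np

def info_nce_loss(fmri_latents, clip_text_feats, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss.

    fmri_latents:    [batch, dim] latents (e.g. from an MAE encoder) -- assumed shape
    clip_text_feats: [batch, dim] CLIP text embeddings, row i paired with row i
    Returns a scalar loss that is small when matching pairs are most similar.
    """
    # L2-normalize both embedding sets so dot products are cosine similarities
    f = fmri_latents / np.linalg.norm(fmri_latents, axis=1, keepdims=True)
    t = clip_text_feats / np.linalg.norm(clip_text_feats, axis=1, keepdims=True)

    # Similarity logits between every fMRI/text pair; matches lie on the diagonal
    logits = f @ t.T / temperature
    n = logits.shape[0]

    def xent_diagonal(lg):
        # Cross-entropy with the diagonal as the target class for each row
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the fMRI->text and text->fMRI directions
    return 0.5 * (xent_diagonal(logits) + xent_diagonal(logits.T))
```

Under this sketch, a batch whose fMRI latents already equal their paired text embeddings yields a much lower loss than the same batch with mismatched pairings, which is the property the alignment training would exploit.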
ISSN: 2377-9322
DOI: 10.1109/CIVEMSA58715.2024.10586647