- The paper introduces a novel fMRI-to-image pipeline using tailored data protocols for both weak (memory-based) and strong (pure) visual imagination reconstruction.
- It adapts a state-of-the-art model with a curated dataset of 1200 surrealist images, achieving 91% category-classification accuracy (portrait vs. landscape) for weak imagination and 88% for strong imagination.
- The study reveals distinct neural activation patterns between visual perception and imagination, underscoring the need for specialized datasets and ethical considerations in mind privacy.
This paper, "Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI" (2404.05468), addresses the challenging problem of reconstructing visual mental imagery from fMRI (functional Magnetic Resonance Imaging) data. While significant progress has been made in reconstructing images that subjects are seeing, decoding and visualizing what a person is imagining remains a major hurdle with potentially transformative applications, such as aiding communication for individuals with disabilities or providing insights into cognitive processes.
The core challenge lies in the lack of established data collection protocols and suitable datasets specifically designed for visual imagination. Unlike visual perception, where a tangible image serves as the ground truth for brain activity, imagination is subjective and lacks an external reference. This paper proposes a novel data collection approach and leverages recent advancements in fMRI-to-image models to tackle this problem.
The proposed methodology, dubbed the Mind-to-Image pipeline, involves several key components:
- Novel Data Collection Protocols: Inspired by existing datasets like NSD, the authors developed protocols tailored for visual imagination. They distinguish between:
  - Weak Imagination (Imagination from Memory): Subjects first view images for 3 seconds, followed by a brief rest. Then, the same images are flashed for 0.1 seconds to prompt subjects to recall and imagine them for 5 seconds. This protocol aims to capture brain activity associated with recalling specific visual memories.
  - Strong Imagination (Pure Imagination): Subjects are given verbal prompts (e.g., "Imagine a portrait representing optimism") and asked to imagine entirely new visual content for 6 seconds. This protocol targets the generation of novel imagery without a direct visual reference.
Data for both protocols were collected from a single subject over approximately 6 hours, using a Siemens 3T PrismaFit MRI scanner with specific parameters (2 mm isotropic voxels, TR/TE = 1300 ms/27 ms, MultiBand = 4, etc.).
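The two protocols above can be summarized as simple event schedules. A minimal sketch follows; the 3 s / 0.1 s / 5 s / 6 s durations come from the paper, while the rest and prompt durations, the `Event` dataclass, and all field names are my own illustrative assumptions, not the authors' code:

```python
# Sketch of the per-trial event structure for both imagination protocols.
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    duration_s: float

WEAK_IMAGINATION_TRIAL = [
    Event("view_image", 3.0),    # subject views the image (paper: 3 s)
    Event("rest", 1.0),          # brief rest (exact duration assumed)
    Event("flash_image", 0.1),   # 0.1 s flash cues recall (paper: 0.1 s)
    Event("imagine", 5.0),       # subject recalls/imagines the image (paper: 5 s)
]

STRONG_IMAGINATION_TRIAL = [
    Event("verbal_prompt", 2.0), # e.g. "Imagine a portrait representing optimism" (duration assumed)
    Event("imagine", 6.0),       # subject imagines novel content (paper: 6 s)
]

def trial_duration(trial):
    """Total duration of one trial in seconds."""
    return sum(e.duration_s for e in trial)
```

Laying the protocols out this way makes the key design difference explicit: weak imagination trials anchor each imagination event to a specific ground-truth image, while strong imagination trials anchor it only to a text prompt.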
- Curated Image Dataset: Instead of using broad datasets like COCO, the authors created a specialized dataset of 1200 surrealist images (600 face portraits, 600 nature landscapes). This narrower focus simplifies the reconstruction task and aligns with the project's artistic theme. The images were sourced from real art and generated using models like Versatile Diffusion and Midjourney. This dataset serves as the ground truth for the weak imagination training phase.
- fMRI Data Pre-processing: Raw BOLD data undergoes standard pre-processing, including co-registration with an anatomical scan. A General Linear Model (GLM) with the Glover hemodynamic response function is used to extract beta values for each event (viewing/imagining). Crucially, the authors manually curated a custom brain mask based on GLM analysis, selecting regions beyond just primary visual areas that were identified as active during imagination tasks. These selected voxel betas are flattened and paired with the corresponding images (for weak imagination) or prompts (for strong imagination).
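The GLM step described above amounts to convolving a stick function at each event onset with a hemodynamic response function and solving a per-voxel least-squares problem. A minimal numpy sketch, assuming a standard double-gamma approximation of the Glover HRF (the function names, shapes, and parameters are illustrative; real pipelines also add drift and intercept regressors):

```python
import numpy as np
from math import gamma as gamma_fn

def glover_hrf(t):
    """Double-gamma approximation of the Glover HRF (peak ~5 s, undershoot ~15 s).

    Parameters are the common textbook defaults, not the paper's exact settings.
    """
    peak = t ** 5 * np.exp(-t) / gamma_fn(6)
    undershoot = t ** 15 * np.exp(-t) / gamma_fn(16)
    return peak - 0.35 * undershoot

def design_matrix(onsets, n_scans, tr):
    """One regressor per event: a delta at the onset convolved with the HRF."""
    X = np.zeros((n_scans, len(onsets)))
    hrf = glover_hrf(np.arange(0, 30, tr))  # 30 s HRF support, sampled at TR
    for j, onset in enumerate(onsets):
        stick = np.zeros(n_scans)
        stick[int(round(onset / tr))] = 1.0
        X[:, j] = np.convolve(stick, hrf)[:n_scans]
    return X

def glm_betas(bold, X):
    """Least-squares beta estimates: one row per event, one column per voxel."""
    return np.linalg.lstsq(X, bold, rcond=None)[0]
```

The resulting beta rows play the role of the "voxel betas" the paper flattens within the custom brain mask and pairs with images or prompts.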
- Adapted fMRI-to-Image Model: The pipeline is based on the state-of-the-art MindEye model (2305.18274), which maps fMRI data to both high-level (CLIP embeddings) and low-level (VAE embeddings) image representations. The authors adapted the architecture, specifically the MLP layers, to handle the larger dimensionality of their custom brain mask without excessive parameter increase. The model uses contrastive and MSE losses during training. At inference, the projected embeddings and low-level reconstruction are fed into an UnCLIP model (specifically Versatile Diffusion Image Variations) to generate the final image.
- Training and Inference Strategy: The adapted fMRI-to-Image model is trained exclusively on the weak imagination data. This trained model's parameters are then frozen, and it is used for inference on the strong imagination data. This transfer learning approach allows the model, trained on reconstructing remembered images, to attempt reconstruction of purely imagined content, for which no ground truth image exists.
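The contrastive-plus-MSE objective mentioned above can be sketched compactly. Below is a minimal numpy version of a bidirectional InfoNCE term combined with MSE, in the spirit of MindEye's training losses; the actual model is trained in a deep-learning framework on projected CLIP embeddings, and the function name, temperature, and weighting here are my own illustrative choices:

```python
import numpy as np

def mindeye_style_loss(pred, target, temperature=0.07, mse_weight=1.0):
    """Bidirectional InfoNCE between predicted and target embeddings, plus MSE.

    pred, target: (batch, dim) arrays, where row i of pred is the projected
    fMRI embedding for the sample whose target embedding is row i of target.
    """
    # L2-normalize so the similarity matrix contains cosine similarities.
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = p @ t.T / temperature              # (batch, batch)
    labels = np.arange(len(pred))               # matched pairs on the diagonal

    def xent(lg):
        # Numerically stable cross-entropy picking the diagonal entries.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    contrastive = 0.5 * (xent(logits) + xent(logits.T))
    mse = ((pred - target) ** 2).mean()
    return contrastive + mse_weight * mse
```

The contrastive term pushes each fMRI embedding toward its own image embedding and away from the rest of the batch, while the MSE term keeps the raw (unnormalized) embeddings close; at inference the frozen weak-imagination model simply maps strong-imagination betas through the same projection.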
The results of the paper provide several insights:
- Brain Activity Differences: Analysis of fMRI activity reveals distinct patterns for visual perception, weak imagination, and strong imagination. Visual perception primarily activates visual areas in the occipital lobe. Weak imagination shows reduced activity in primary visual areas but increased activity in basal temporal associative visual areas (like the fusiform gyrus). Strong imagination involves prefrontal regions associated with generating and controlling mental imagery, with limited overlap in temporal lobe visual areas. This justifies the need for imagination-specific datasets and masks.
- Weak Imagination Reconstruction: The model successfully reconstructs images from weak imagination data. Quantitatively, it achieves 91% accuracy in classifying the reconstructed images into the correct category (portrait or landscape) using a fine-tuned ResNet50 classifier. While quantitative reconstruction metrics like PixCorr and SSIM are lower than those reported for visual perception reconstruction on larger datasets (like MindEye on NSD), the results are significantly better than chance and demonstrate the feasibility of reconstructing remembered images. Qualitative results show varying fidelity, with some reconstructions closely matching the original images and others diverging.
- Strong Imagination Reconstruction: Applying the model trained on weak imagination to strong imagination data yields promising transfer-learning results. The model achieves 88% accuracy in classifying the generated images into the instructed category (portrait or landscape tied to an emotion prompt). While quantitative content-reconstruction metrics cannot be computed without ground-truth images, qualitative analysis based on subject descriptions shows instances where recognizable elements of the imagined scene appear in the generated images, alongside more abstract results or errors.
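The category accuracy and PixCorr figures above are straightforward to compute once reconstructions exist. A minimal numpy sketch of both metrics, taking PixCorr as the Pearson correlation between flattened pixel values (the usual definition in MindEye-style evaluations); the function names and classifier-output interface are mine, and the paper obtains predicted labels from a fine-tuned ResNet50:

```python
import numpy as np

def pixcorr(original, reconstruction):
    """Pearson correlation between flattened pixel values of two images."""
    a = np.asarray(original, dtype=float).ravel()
    b = np.asarray(reconstruction, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def category_accuracy(predicted_labels, true_labels):
    """Fraction of reconstructions assigned the correct category
    (e.g. portrait vs. landscape) by an external classifier."""
    predicted = np.asarray(predicted_labels)
    true = np.asarray(true_labels)
    return float((predicted == true).mean())
```

Note that for strong imagination only `category_accuracy` applies, since `pixcorr` needs a ground-truth image to compare against.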
The paper discusses ethical considerations, particularly concerning "mind privacy," acknowledging the potential benefits and risks of technologies capable of decoding mental imagery. Future work includes collecting larger datasets, refining protocols for both weak and strong imagination, improving evaluation methods for strong imagination (which is inherently subjective), and exploring the potential use of more accessible technologies like EEG alongside or instead of fMRI.
In conclusion, this research represents a significant step towards reconstructing visual mental imagination from brain activity. By introducing specific data collection protocols for weak and strong imagination, creating a curated dataset, and adapting a state-of-the-art fMRI-to-image model, the authors demonstrate the ability to decode the category of imagined content and achieve partial reconstruction fidelity, particularly by leveraging transfer learning from memory-based imagination to pure imagination. The work highlights the distinct neural basis of imagination compared to perception and lays the groundwork for future advances in this challenging field.