Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion
Authors: Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu
Conference: NeurIPS 2024
GitHub: https://github.com/ncclab-sustech/EEG_Image_decode
Abstract
This paper presents an EEG-based visual decoding framework that enables zero-shot image classification, retrieval, and reconstruction from EEG signals. It introduces the Adaptive Thinking Mapper (ATM), an EEG encoder that aligns EEG signals with CLIP image embeddings, and a two-stage diffusion-based generation pipeline that reconstructs high-quality images from the aligned embeddings. The method achieves state-of-the-art EEG decoding performance, demonstrating EEG's potential for brain-computer interfaces.
Introduction
Decoding human visual perception from brain activity is important for both neuroscience and BCIs. fMRI offers high spatial resolution but is expensive and temporally coarse; EEG is portable and temporally precise but suffers from low spatial resolution and noise. This work bridges that gap with a new EEG encoder and a two-stage image reconstruction pipeline, enabling EEG-based zero-shot visual decoding with competitive accuracy.
Method
EEG Encoder: Adaptive Thinking Mapper (ATM)
- Combines channel-wise attention, temporal-spatial convolutions, and a multilayer perceptron (a minimal sketch follows this list).
- Processes raw EEG sequences as patches with positional embeddings.
- Projects EEG data into a shared embedding space aligned with CLIP image features.
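A minimal PyTorch sketch of an ATM-style encoder is given below. The class name, layer sizes, and hyperparameters (63 channels, 250 time points, 1024-d output, kernel sizes) are illustrative assumptions, not the authors' exact architecture; it only shows the channel-attention → temporal-spatial convolution → MLP projection pattern described above.

```python
import torch
import torch.nn as nn

class ATMStyleEncoder(nn.Module):
    """Illustrative EEG encoder: channel-wise attention -> temporal-spatial
    convolutions -> MLP projection into a CLIP-sized embedding space.
    All dimensions are placeholders, not the paper's exact settings."""

    def __init__(self, n_channels=63, n_times=250, embed_dim=1024):
        super().__init__()
        # Channel-wise self-attention: each electrode's time series is one token.
        self.channel_attn = nn.MultiheadAttention(
            embed_dim=n_times, num_heads=5, batch_first=True)
        # Temporal then spatial convolutions (EEGNet/Conformer-style).
        self.temporal_conv = nn.Conv2d(1, 40, kernel_size=(1, 25), padding=(0, 12))
        self.spatial_conv = nn.Conv2d(40, 40, kernel_size=(n_channels, 1))
        self.pool = nn.AvgPool2d((1, 5))
        self.flatten = nn.Flatten()
        # MLP projection head into the shared EEG/CLIP embedding space.
        self.proj = nn.Sequential(
            nn.LazyLinear(embed_dim), nn.GELU(), nn.Linear(embed_dim, embed_dim))

    def forward(self, x):                      # x: (batch, channels, time)
        attn_out, _ = self.channel_attn(x, x, x)
        x = (x + attn_out).unsqueeze(1)        # (batch, 1, channels, time)
        x = self.temporal_conv(x)
        x = self.spatial_conv(x)               # collapses the electrode axis
        x = self.pool(x)
        return self.proj(self.flatten(x))      # (batch, embed_dim)
```

For example, `ATMStyleEncoder()(torch.randn(8, 63, 250))` returns an (8, 1024) tensor ready to be aligned with CLIP image embeddings.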
Contrastive Learning
- Trains the EEG encoder to align EEG embeddings with CLIP image embeddings on paired EEG-image data, using a CLIP-style contrastive objective (see the sketch below).
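A minimal sketch of such a symmetric contrastive (CLIP/InfoNCE-style) loss is shown below; the temperature value is an illustrative assumption, not necessarily the paper's setting.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning EEG and CLIP image embeddings.
    Assumes eeg_emb and img_emb are (batch, dim) and row i of each is a pair."""
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    logits = eeg_emb @ img_emb.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Average the EEG->image and image->EEG directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```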
Two-Stage Image Generation
- Stage 1: EEG embeddings are fed to a lightweight diffusion prior that produces CLIP image-embedding priors, alongside blurry images capturing low-level features.
- Stage 2: A pretrained SDXL diffusion model with IP-Adapter refines these priors into realistic reconstructions, optionally guided by captions generated from the EEG latent features (a Stage-1 sketch follows this list).
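The sketch below illustrates the Stage-1 idea: a small DDPM-style denoiser that, conditioned on an EEG embedding, learns to predict the noise added to the paired CLIP image embedding. The class name, MLP sizes, and noise schedule are assumptions for illustration, not the authors' implementation. At inference the prior is sampled to obtain a CLIP image embedding, which Stage 2 passes to a pretrained SDXL model through IP-Adapter to synthesize the final image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingDiffusionPrior(nn.Module):
    """Illustrative Stage-1 prior: an MLP denoiser over CLIP image embeddings,
    conditioned on EEG embeddings. Dimensions and schedule are placeholders."""

    def __init__(self, dim=1024, n_steps=1000):
        super().__init__()
        self.n_steps = n_steps
        betas = torch.linspace(1e-4, 0.02, n_steps)        # linear noise schedule
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))
        self.denoiser = nn.Sequential(
            nn.Linear(dim * 2 + 1, 2048), nn.GELU(),
            nn.Linear(2048, 2048), nn.GELU(),
            nn.Linear(2048, dim))

    def training_loss(self, img_emb, eeg_emb):
        """One DDPM training step: noise the CLIP embedding, predict the noise."""
        b = img_emb.shape[0]
        t = torch.randint(0, self.n_steps, (b,), device=img_emb.device)
        a = self.alpha_bar[t].unsqueeze(-1)                 # (batch, 1)
        noise = torch.randn_like(img_emb)
        noisy = a.sqrt() * img_emb + (1 - a).sqrt() * noise
        t_feat = (t.float() / self.n_steps).unsqueeze(-1)   # scalar timestep feature
        pred = self.denoiser(torch.cat([noisy, eeg_emb, t_feat], dim=-1))
        return F.mse_loss(pred, noise)
```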
Loss Function
- Training combines a CLIP-style contrastive loss with a mean squared error (MSE) term (see the combined-loss sketch below).
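One plausible reading of this combination is sketched below, reusing `clip_contrastive_loss` from the earlier sketch. Applying the MSE term directly to the normalized EEG and image embeddings, and the weighting factor `lambda_mse`, are assumptions of this sketch rather than values or choices reported in the paper.

```python
import torch.nn.functional as F

def combined_loss(eeg_emb, img_emb, lambda_mse=1.0):
    """Total objective: contrastive alignment plus an MSE regression term.
    Reuses clip_contrastive_loss defined above; lambda_mse is an assumed weight."""
    mse = F.mse_loss(F.normalize(eeg_emb, dim=-1), F.normalize(img_emb, dim=-1))
    return clip_contrastive_loss(eeg_emb, img_emb) + lambda_mse * mse
```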
Experiments
- Dataset: THINGS-EEG (large-scale EEG dataset with 64 channels, downsampled and preprocessed).
- Evaluated on zero-shot classification, retrieval, and reconstruction tasks (a retrieval-evaluation sketch follows this list).
- Compared the ATM encoder against various baseline encoders, showing superior performance.
- Ablation studies revealed that occipital and parietal regions are the most informative.
- EEG decoding was effective within the 200–400 ms window after stimulus onset.
- Demonstrated generalization to MEG data.
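The function below sketches how zero-shot retrieval accuracy can be computed: rank all candidate CLIP image embeddings by cosine similarity to each test EEG embedding and check whether the true image appears in the top k. Function name, shapes, and the default k are illustrative assumptions, not the paper's exact evaluation code.

```python
import torch
import torch.nn.functional as F

def zero_shot_retrieval_accuracy(eeg_emb, candidate_img_emb, true_idx, k=5):
    """Top-k retrieval accuracy over a candidate set.
    Assumed shapes: eeg_emb (n_test, dim), candidate_img_emb (n_candidates, dim),
    true_idx (n_test,) giving the index of each trial's true image."""
    sims = F.normalize(eeg_emb, dim=-1) @ F.normalize(candidate_img_emb, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices                  # (n_test, k) ranked candidates
    hits = (topk == true_idx.unsqueeze(-1)).any(dim=-1)  # true image in top-k?
    return hits.float().mean().item()
```

With k=1 this reduces to top-1 zero-shot classification accuracy over the candidate set.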
Results
- Achieved image reconstruction quality approaching that of fMRI-based methods.
- The two-stage diffusion pipeline significantly improved reconstruction quality over one-stage generation.
- EEG-based decoding performed strongly despite EEG's low signal-to-noise ratio and spatial resolution.
Discussion and Conclusion
This study demonstrates the feasibility of zero-shot visual decoding and image reconstruction from EEG using a novel EEG encoder and a guided diffusion model. It opens avenues for portable, real-time BCIs and suggests further gains from cross-subject learning, source localization, and multi-dataset integration.
References
- The paper cites foundational works on brain decoding, contrastive learning, diffusion models, and EEG deep learning.
Code and Resources
- Code repository: https://github.com/ncclab-sustech/EEG_Image_decode