Generative language reconstruction from brain recordings
🧠 BrainLLM: Direct generative language reconstruction from brain activity
BrainLLM is a fully generative framework that reconstructs natural language from brain recordings (e.g., fMRI collected during continuous story listening) by conditioning a large language model (LLM) on neural signals.
Instead of classifying among pre-generated candidates, BrainLLM:
✅ Directly injects neural activity into a large language model (LLM)
✅ Allows the LLM to freely generate words, conditioned on brain signals
✅ Supports open vocabulary generation (tens of thousands of tokens)
🧭 Visual Walkthrough
The diagram below illustrates the full BrainLLM pipeline for continuous language generation:
🎯 BrainLLM Objective
The model learns to predict the perceived continuation of a story based on:
- A text prompt (previous words)
- Brain signals recorded while hearing the continuation
Formally:
m̂ᵢ = F({w₁, ..., wₙ, m̂₁, ..., m̂ᵢ₋₁}, B, Θ)
Where:
- m̂ᵢ = generated token
- B = brain signals
- Θ = model parameters (brain adapter, special tokens)
Only the brain adapter and special tokens are trained—LLM weights remain frozen.
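For intuition, here is a minimal PyTorch sketch of that trainable/frozen split. The names (`llm`, `brain_adapter`, `brain_start`, `brain_end`) and layer sizes are illustrative placeholders, not the paper's actual code:

```python
import torch
import torch.nn as nn

llm = nn.TransformerDecoderLayer(d_model=768, nhead=12)        # stand-in for a real LLM
brain_adapter = nn.Sequential(nn.Linear(10_000, 768), nn.ReLU(), nn.Linear(768, 768))
brain_start = nn.Parameter(torch.randn(768))                    # learned <brain> embedding
brain_end = nn.Parameter(torch.randn(768))                      # learned </brain> embedding

# Freeze every LLM weight; only Θ = {adapter, special-token embeddings} gets gradients.
for p in llm.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    list(brain_adapter.parameters()) + [brain_start, brain_end], lr=1e-4
)
```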
🧬 Text Embeddings and Brain Adapter
- Text tokens are mapped by the LLM’s embedding layer to VW ∈ ℝⁿˣᵈ
- Brain signals (fMRI) are projected into the same space by a non-linear brain adapter
Adapter process:
vBᵢ = MLP(bᵢ + pᵢ)
- pᵢ = position embedding (captures temporal order)
- MLP = multi-layer perceptron with ReLU activations
- Final brain embedding dimension matches LLM embedding size
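A minimal PyTorch sketch of such an adapter (dimensions, depth, and the `max_scans` limit are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class BrainAdapter(nn.Module):
    """Maps a window of fMRI scans into the LLM embedding space."""
    def __init__(self, n_voxels=10_000, llm_dim=768, max_scans=8):
        super().__init__()
        # One learned position embedding per scan slot, added in voxel space
        # so that vB_i = MLP(b_i + p_i).
        self.pos_emb = nn.Parameter(torch.zeros(max_scans, n_voxels))
        self.mlp = nn.Sequential(
            nn.Linear(n_voxels, llm_dim), nn.ReLU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, scans):                      # scans: (num_scans, n_voxels)
        num_scans = scans.shape[0]
        return self.mlp(scans + self.pos_emb[:num_scans])    # (num_scans, llm_dim)
```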
Unified Input:
[⟨brain⟩, vB₁, ..., vBₜ, ⟨/brain⟩, vW₁, ..., vWₙ]
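As a toy illustration of that unified sequence (random tensors stand in for real adapter outputs and the LLM's token embeddings):

```python
import torch

d = 768                                   # LLM embedding size (illustrative)
brain_embs = torch.randn(4, d)            # vB_1 ... vB_4 from the brain adapter
text_embs = torch.randn(3, d)             # vW_1 ... vW_3 from the LLM embedding layer
brain_start, brain_end = torch.randn(d), torch.randn(d)

# [<brain>, vB_1..vB_t, </brain>, vW_1..vW_n]
input_embs = torch.cat(
    [brain_start[None], brain_embs, brain_end[None], text_embs], dim=0
)                                          # shape: (1 + 4 + 1 + 3, d)
```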
🧪 Evaluation Settings
Two modes:
✅ Continuation Generation:
- Predict next tokens with ground-truth prompt + brain signals
✅ Full-Text Reconstruction:
- Only brain signals provided per window
- Autoregressive generation using prior outputs as prompt
Performance evaluated using:
- BLEU, ROUGE, METEOR, WER (machine metrics)
- Human evaluations of fluency and similarity
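For reference, the machine metrics can be computed with common open-source libraries (a sketch assuming `nltk`, `rouge-score`, and `jiwer` are installed; the paper's own evaluation scripts may differ):

```python
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer
import jiwer

reference = "the desert stretched endlessly before him".split()
hypothesis = "the desert stretched far before him".split()

bleu1 = sentence_bleu([reference], hypothesis, weights=(1.0,))    # BLEU-1
rouge1 = rouge_scorer.RougeScorer(["rouge1"]).score(" ".join(reference),
                                                    " ".join(hypothesis))
wer = jiwer.wer(" ".join(reference), " ".join(hypothesis))        # word error rate
```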
🏗️ Pipeline Overview
- Record brain activity during story listening
- Process brain signals into LLM-friendly embeddings
- Predict the number of perceived words per window with a Word Rate Model
- Autoregressively generate text using LLM + brain signals
- Repeat for continuous full-text reconstruction
⚙️ Step-by-Step Example
Scenario:
- Participant listens to:
  “The desert stretched endlessly before him. The wind whispered softly across the dunes.”
- Brain data:
  - TR = 2 seconds
  - 10,000 voxels per scan
  - 3737 total TRs
Step 1: Brain Recordings
b₀,₁, b₀,₂, b₀,₃, b₀,₄ ∈ ℝ¹⁰⁰⁰⁰
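In code, this window is just a small matrix of scans (random numbers standing in for real fMRI data):

```python
import numpy as np

window = np.random.randn(4, 10_000)     # 4 TRs (8 s of audio), 10,000 voxels each
b_0_1, b_0_2, b_0_3, b_0_4 = window     # b_{0,1} ... b_{0,4}
```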
Step 2: Brain Embedding
vB₀,₁, vB₀,₂, vB₀,₃, vB₀,₄ ∈ ℝᵈ
Step 3: Word Rate Prediction
l₀ = WR(b₀,₁, b₀,₂, b₀,₃, b₀,₄) = 3
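The word rate model can be approximated by a simple regression from a window of scans to a word count; the sketch below uses ridge regression on toy data and reuses `window` from Step 1, which is an assumption about the setup rather than the paper's actual model:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy training data: each row is a flattened 4-scan window; the target is how
# many words were actually spoken during that window.
train_windows = np.random.randn(100, 4 * 10_000)
train_word_counts = np.random.randint(2, 8, size=100)

word_rate_model = Ridge(alpha=1.0).fit(train_windows, train_word_counts)

# Predict a word count for the current window and round to an integer.
l_0 = int(round(word_rate_model.predict(window.reshape(1, -1))[0]))
```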
Step 4: LLM Text Generation
[⟨brain⟩, vB₀,₁, vB₀,₂, vB₀,₃, vB₀,₄, ⟨/brain⟩]
Generates:
["The", "desert", "stretched"]
Step 5: Iterative Reconstruction
Next window:
l₁ = WR(b₁,₁, ..., b₁,₄) = 4
LLM input:
[⟨brain⟩, vB₁,₁, ..., vB₁,₄, ⟨/brain⟩, "The", "desert", "stretched"]
LLM generates:
["endlessly", "before", "him", "."]
Repeat for full text.
🧩 Pseudocode for BrainLLM Full-Text Decoding
```python
previous_tokens = []                                    # running transcript across windows
for k in range(num_windows):                            # one iteration per time window
    brain_signals = [b_k1, b_k2, b_k3, b_k4]            # the window's fMRI scans
    l_k = WordRateModel(brain_signals)                  # number of words to generate
    brain_embs = [BrainAdapter(s) for s in brain_signals]
    input_seq = ["<brain>"] + brain_embs + ["</brain>"] + previous_tokens
    new_tokens = LLM.generate(input_seq, num_tokens=l_k)
    previous_tokens.extend(new_tokens)
```
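To make the `LLM.generate` call concrete with a real model, the assembled embedding sequence can be fed to a Hugging Face causal LM through `inputs_embeds`; the sketch below runs a single greedy decoding step with GPT-2 as a stand-in (BrainLLM's actual LLM and decoding strategy, e.g. beam search, may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
llm = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder for the assembled [<brain>, vB..., </brain>, text...] sequence.
input_embs = torch.randn(1, 9, llm.config.hidden_size)

# One greedy step: the forward pass accepts inputs_embeds directly.
with torch.no_grad():
    logits = llm(inputs_embeds=input_embs).logits
next_token = tokenizer.decode(logits[0, -1].argmax())
print(next_token)
```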
🎯 Why BrainLLM Works
✅ Aligns brain signals to LLM embedding space
✅ Uses future brain signals to handle BOLD delay
✅ Keeps LLM frozen, only trains adapter and special tokens
✅ Supports flexible, natural language reconstruction