🧠 From Brain Signals to Story: Decoding Continuous Language from fMRI Using Language Models
Understanding what someone is thinking just by looking at their brain activity might sound like science fiction. But in a groundbreaking study, researchers have shown it’s possible to reconstruct entire sentences from non-invasive fMRI recordings. This blog explains how they did it, step by step, with clear examples and visuals.
📄 Citation: Tang et al., 2022 – Semantic reconstruction of continuous language from non-invasive brain recordings
🔍 1. Why is Decoding Continuous Language from fMRI So Challenging?
Imagine someone listening to the sentence:
“The cat jumped over the fence.”
With fMRI, the brain’s response to this sentence isn’t mapped word-by-word. Why?
- fMRI is slow: it measures the blood-oxygen-level-dependent (BOLD) signal, and a single burst of neural activity produces a BOLD response that unfolds over roughly 10 seconds.
- Natural English speech runs at about 2 words per second, so each brain image (fMRI scan) can be influenced by 20 or more overlapping words.
- This temporal smearing makes it extremely difficult to isolate the brain’s response to individual words, posing a major challenge for sentence-level decoding.
So the challenge becomes:
Given a noisy, temporally smeared measurement of brain activity (low SNR), can we recover the exact sentence the person heard or imagined?
This is an ill-posed inverse problem with many possible solutions.
📊 2. Overview of the Brain Decoding Pipeline
The authors built a three-part system:
- Encoding Model: Learns how word meanings (stimulus representations) map to brain activity
- Word Rate Model: Takes brain activity and predicts the number of words per TR (word rate)
- Language Decoder: Uses methods like beam search, nucleus sampling, and a Bayesian decoder to reconstruct word sequences using a language model and the encoding model
🧮 Bayesian Formulation of Decoding
The decoding framework is grounded in Bayes’ Theorem:
\[P(S \mid B) \propto P(B \mid S) \cdot P(S)\]

Where:
- \( S \): candidate sentence
- \( B \): observed brain activity
- \( P(S) \): prior probability from a language model (LM)
- \( P(B \mid S) \): likelihood estimated from the encoding model (how well the sentence explains the observed brain response)
This allows combining top-down language expectations with bottom-up neural evidence.
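As a minimal sketch of how this combination works in practice (with made-up log-probabilities), the decoder works in log space, adds the LM’s log-prior to the encoding model’s log-likelihood, and keeps the best-scoring candidate:

```python
# Minimal sketch of the Bayesian scoring rule in log space (hypothetical numbers).
# log P(S|B) = log P(B|S) + log P(S) + const.
candidates = {
    "She went to the store": {"log_prior": -12.3, "log_likelihood": -10.2},
    "She had a dog":         {"log_prior": -11.8, "log_likelihood": -13.5},
}

def posterior_score(scores):
    # Bottom-up neural evidence plus top-down language-model expectation.
    return scores["log_likelihood"] + scores["log_prior"]

best = max(candidates, key=lambda s: posterior_score(candidates[s]))
print(best)  # "She went to the store": -22.5 beats -25.3
```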
🤖 3. Encoding Model: Mapping Language to Brain
🔹 Feature Extraction
- Uses a GPT-style language model to extract 768-dimensional semantic embeddings
- Example: For the phrase “The cat jumped over the fence”, extract embedding for “fence”
⬇️ Downsampling & Delays
- fMRI samples every 2 s (one TR), and the BOLD response lags the speech that drives it
- Word embeddings are resampled onto the TR grid using a Lanczos filter
- Delayed copies of the features at 2, 4, 6, and 8 seconds are concatenated to model the lag
- Final feature vector per TR: 768 × 4 = 3,072 dimensions
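A sketch of the delay-stacking step, assuming the Lanczos resampling onto the TR grid has already been done (random placeholder data stands in for the resampled embeddings):

```python
import numpy as np

# Sketch of delay stacking. resampled: (n_trs, 768) embeddings already
# aligned to the 2 s TR grid (placeholder random data here).
rng = np.random.default_rng(0)
resampled = rng.standard_normal((291, 768))

delays_tr = [1, 2, 3, 4]                      # 2, 4, 6, 8 s at TR = 2 s
delayed = []
for d in delays_tr:
    shifted = np.zeros_like(resampled)
    shifted[d:] = resampled[:-d]              # stimulus at t influences BOLD at t + d
    delayed.append(shifted)

features = np.hstack(delayed)                 # 768 × 4 delays = 3,072 features
print(features.shape)                         # (291, 3072)
```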
📊 Predicting Brain Activity
- For each voxel, fit a ridge regression to predict fMRI response from 3,072 features
- Use top 10,000 best-predicting voxels
- Encoding model weight matrix:
[10,000 voxels × 3,072 features]
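A minimal sketch of the regression step using scikit-learn’s `Ridge` on synthetic data; the actual pipeline tunes the penalty per voxel with cross-validation and then keeps the best-predicted voxels:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Sketch of the encoding model fit (synthetic data, one shared alpha).
rng = np.random.default_rng(0)
X = rng.standard_normal((291, 3072))      # delayed stimulus features per TR
Y = rng.standard_normal((291, 10_000))    # BOLD responses for 10,000 voxels

model = Ridge(alpha=100.0)
model.fit(X, Y)                           # one linear map per voxel
print(model.coef_.shape)                  # (10000, 3072): voxels × features
```

Ranking voxels by held-out R² and keeping the top 10,000 would be a simple stand-in for the study’s voxel selection.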
🔀 4. Word Rate Model: Estimating When Words Happen
🔹 Purpose
- fMRI tells you “something happened” every 2 seconds
- This model predicts how many words occurred per scan
💡 How It Works
- Input: fMRI signals from language areas
- Uses future timepoints (t+2s to t+8s) to account for delay
- Predicts number of words (e.g., 2.4 words)
- Uses rounding to schedule when to decode next words
📊 Model Size
- For V voxels and 4 delays: input feature size = 4 × V
- Weight matrix: [1 × 4V]
📈 Example
- Predicted word count in 2s scan = 2
- Words scheduled at: 0.8s and 1.6s in that window
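One simple way to implement this scheduling (an illustrative helper, not the paper’s exact code) is to place words at uniform intervals of 1/rate inside the scan window, here using the 2.4-word prediction from above:

```python
import numpy as np

# Hypothetical scheduler: place n words at uniform 1/rate intervals in the TR.
def schedule_word_times(tr_start, tr_len, predicted_count):
    n_words = int(round(predicted_count))        # e.g., 2.4 -> 2 words
    if n_words == 0:
        return np.array([])
    interval = tr_len / predicted_count          # e.g., 2 s / 2.4 ≈ 0.83 s
    return tr_start + interval * np.arange(1, n_words + 1)

print(schedule_word_times(0.0, 2.0, 2.4).round(2))  # [0.83 1.67] ≈ 0.8 s and 1.6 s
```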
🔄 5. Language Decoding Methods: Beam Search, Nucleus Sampling, and Bayesian Approach
🎓 Goal
Find the most likely word sequence that explains the observed brain data
The authors use a combination of decoding strategies:
- Beam Search: Keeps top-k most likely word sequences at each time step
- Nucleus Sampling: Selects from the top-p most probable words to introduce diversity
- Bayesian Decoder: Combines prior from the language model and likelihood from brain encoding
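A stripped-down sketch of how these pieces fit together in a beam loop; `lm_propose` and `encoding_score` are hypothetical stand-ins for the language model and the encoding-model scorer:

```python
# Stripped-down beam search over word sequences.
def beam_search(n_words, beam_width, lm_propose, encoding_score):
    beam = [("", 0.0)]                            # (sequence, score)
    for _ in range(n_words):
        candidates = []
        for seq, _ in beam:
            for word in lm_propose(seq):          # nucleus-filtered continuations
                ext = (seq + " " + word).strip()
                candidates.append((ext, encoding_score(ext)))
        # Keep the beam_width best-scoring hypotheses.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam[0][0]

# Toy usage with a fixed vocabulary and a dummy scorer.
print(beam_search(3, 2,
                  lm_propose=lambda s: ["she", "went", "home"],
                  encoding_score=lambda s: len(s)))
```

In the real system, `lm_propose` applies nucleus sampling to GPT’s next-word distribution and `encoding_score` is the Bayesian score described above.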
🔢 Step-by-Step Decoding Example
Let’s walk through how the model decodes one word at a time using brain data.
🔹 Setup
- `word_times.shape = (1671,)` → the model will decode 1671 words
- `lanczos_mat.shape = (291, 1671)` → maps 1671 predicted word times onto 291 TRs
- The loop runs once per word: `for sample_index in range(1671)`
✅ Step 1: Affected TRs
Each word affects multiple fMRI timepoints due to the slow BOLD signal:
```python
import numpy as np

# Step-by-step example of the affected-TRs calculation.
# Delays are in TR units: 2, 4, 6, 8 s at TR = 2 s -> 1, 2, 3, 4 TRs.
DELAYS_TR = [1, 2, 3, 4]

def affected_trs(start_index, end_index, lanczos_mat):
    lanczos_slice = lanczos_mat[:, start_index:end_index + 1]  # shape: (291, n_words)
    nonzero_trs = np.where(lanczos_slice.any(axis=1))[0]       # e.g., [0, 1, 2]
    start_tr = nonzero_trs[0] + min(DELAYS_TR)                 # e.g., 0 + 1 = 1
    end_tr = nonzero_trs[-1] + max(DELAYS_TR)                  # e.g., 2 + 4 = 6
    return np.arange(start_tr, end_tr + 1)                     # final affected TRs
```
Example output:
affected_trs(0, 0, lanczos_mat) → [1, 2, 3, 4, 5, 6]
✅ Step 2: Propose Initial Words
If the beam is empty:
INIT = ["She", "He", "They"]
Beam = ["She"]
✅ Step 3: Propose Next Words from LM
Context = “She”. The language model predicts:

| Word | Prob |
|------|------|
| went | 0.6  |
| had  | 0.2  |
| is   | 0.1  |
Nucleus sampling with top-p = 0.8 keeps the smallest set of words covering 80% of the probability mass → nucleus = ["went", "had"]
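A minimal sketch of that nucleus filter, using the toy probabilities from the table:

```python
import numpy as np

# Nucleus (top-p) filtering over the LM's next-word distribution.
words = ["went", "had", "is"]
probs = np.array([0.6, 0.2, 0.1])          # remaining 0.1 spread over other words

def nucleus(words, probs, top_p):
    order = np.argsort(probs)[::-1]        # sort by probability, descending
    cum = np.cumsum(probs[order])
    keep = cum <= top_p                    # smallest set covering top_p mass
    keep[0] = True                         # always keep the single best word
    return [words[i] for i in order[keep]]

print(nucleus(words, probs, top_p=0.8))    # ['went', 'had']
```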
✅ Step 4: Score Candidates
Score each extended sentence using the encoding model:
"She went" → likelihood = -10.2
"She had" → likelihood = -13.5
Keep top hypothesis:
Beam = ["She went"]
🕐 Step 5: Repeat for Word 2
Context = “She went”
- LM suggests: “to”, “back”, “away”
- Extensions scored: “She went to”, “She went back”
- The encoding model ranks the extensions and updates the beam
Repeat until all 1671 words are decoded.
📊 Evaluation Metrics
- Perplexity (LM prior quality): lower is better
- Encoding model R² score: how well the model predicts fMRI signals
- Decoding accuracy: BLEU/METEOR scores for reconstructed vs. true sentences
- Human evaluation: qualitative rating of perceived vs. generated story coherence
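For example, a quick BLEU check between a true and a decoded sentence could look like this with NLTK (a sketch only; the study’s evaluation uses additional similarity measures and statistical testing):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Compare a reference sentence against a decoded paraphrase with BLEU-2.
reference = "the prince smiled".split()
decoded = "the prince was smiling".split()

smooth = SmoothingFunction().method1   # avoid zero scores on short texts
score = sentence_bleu([reference], decoded,
                      weights=(0.5, 0.5), smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```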
🧪 Perceived vs. Imagined Speech Examples
| Type | Input Condition | Decoded Output |
|---|---|---|
| Perceived | Listened to “The prince smiled.” | “The prince was smiling.” |
| Imagined | Silently imagined “She walked away.” | “She left quietly.” |
These show that even without actual speech, the decoder retrieves plausible paraphrases.
🔎 Did the Decoder Use Brain Data for Training?
| Component | Trained on Brain Data? | Purpose |
|---|---|---|
| Encoding Model | ✅ Yes | Predicts fMRI responses from word embeddings |
| Word Rate Model | ✅ Yes | Predicts the number of words per scan |
| Language Model (GPT) | ❌ No | Pretrained and fine-tuned on stories only |
| Decoder (Beam/Nucleus) | ❌ No | Combines the LM prior with encoding-model likelihoods |
✨ Final Thoughts
This study is a major step toward non-invasive brain-computer interfaces. It shows that, with rich semantic representations and careful modeling, continuous language can be reconstructed from fMRI measurements of brain activity.
Key takeaways:
- Brain decoding is possible without surgery
- Semantic embeddings capture high-level meaning well
- Timing alignment (via word rate model) is crucial
- Meaningful language can be decoded from imagined speech and silent movies
- Multiple decoding strategies (beam, nucleus, Bayesian) make the system robust and flexible
📄 Original Study: Tang et al., 2022 – bioRxiv Preprint