Semantic reconstruction of continuous language from non-invasive brain recordings

🧠 From Brain Signals to Story: Decoding Continuous Language from fMRI Using Language Models

Understanding what someone is thinking just by looking at their brain activity might sound like science fiction. But in a groundbreaking study, researchers showed that continuous language — often close paraphrases of what a person heard or imagined — can be reconstructed from non-invasive fMRI recordings. This post explains how they did it, step by step, with clear examples.

📄 Citation: Tang et al., 2022 – Reconstructing language from non-invasive brain recordings


🔍 1. Why is Decoding Continuous Language from fMRI So Challenging?

Imagine someone listening to the sentence:

“The cat jumped over the fence.”

With fMRI, the brain’s response to this sentence isn’t mapped word-by-word. Why?

  • fMRI is slow: it measures the blood-oxygen-level-dependent (BOLD) signal, and a single burst of neural activity causes a BOLD response that rises and falls over roughly 10 seconds.

  • Since natural English speech occurs at ~2 words per second, this means that each brain image (fMRI scan) can be influenced by 20 or more overlapping words.

  • This temporal smearing makes it extremely difficult to isolate the brain’s response to individual words—posing a major challenge for sentence-level decoding.

So the challenge becomes:

Given low-SNR, temporally smeared brain activity, can we recover the sentence the person heard or imagined?

This is an ill-posed inverse problem with many possible solutions.


📊 2. Overview of the Brain Decoding Pipeline

The authors built a three-part system:

  1. Encoding Model: Learns how word meanings (stimulus representations) map to brain activity
  2. Word Rate Model: Takes brain activity and predicts the number of words per TR (word rate)
  3. Language Decoder: Reconstructs word sequences by combining a language model with the encoding model, using beam search and nucleus sampling within a Bayesian framework

🧮 Bayesian Formulation of Decoding

The decoding framework is grounded in Bayes’ Theorem:

\[P(S \mid B) \propto P(B \mid S) \cdot P(S)\]

Where:

  • \( S \): candidate sentence
  • \( B \): observed brain activity
  • \( P(S) \): prior probability from a language model (LM)
  • \( P(B \mid S) \): likelihood estimated from the encoding model (how well the sentence explains the observed brain response)

This allows combining top-down language expectations with bottom-up neural evidence.
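
In log space the two terms simply add. Here is a minimal sketch of the combination; the numeric scores are hypothetical (they match the toy example in Step 4 below):

# Minimal sketch: combine LM prior and encoding likelihood in log space,
# where log P(S | B) = log P(B | S) + log P(S) + const.
def log_posterior(log_prior, log_likelihood):
    return log_prior + log_likelihood

# Hypothetical scores for two candidate sentences
candidates = {
    "She went": {"log_prior": -2.1, "log_likelihood": -10.2},
    "She had":  {"log_prior": -3.0, "log_likelihood": -13.5},
}
best = max(candidates, key=lambda s: log_posterior(**candidates[s]))
print(best)  # "She went"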


🤖 3. Encoding Model: Mapping Language to Brain

🔹 Feature Extraction

  • Uses a GPT-style language model to extract 768-dimensional semantic embeddings
  • Example: For the phrase “The cat jumped over the fence”, extract embedding for “fence”

⬇️ Downsampling & Delays

  • fMRI samples once every 2 s (one TR), and the BOLD response lags the speech that drives it
  • Word embeddings are resampled to TR times using a Lanczos filter
  • Delayed copies at 2, 4, 6, and 8 seconds are stacked to model the hemodynamic lag
  • Final feature vector per TR: 768 × 4 = 3,072 dimensions (see the sketch below)
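
A minimal sketch of the delay-stacking step, assuming `resampled` already holds the Lanczos-resampled embeddings (one 768-dim row per TR); the function and variable names are my own:

import numpy as np

def add_delays(resampled, delays_in_trs=(1, 2, 3, 4)):  # 2, 4, 6, 8 s at a 2 s TR
    """Stack delayed copies of the resampled embeddings along the feature axis."""
    n_trs, dim = resampled.shape
    delayed = []
    for d in delays_in_trs:
        shifted = np.zeros_like(resampled)
        shifted[d:] = resampled[:n_trs - d]   # each TR sees features from d TRs earlier
        delayed.append(shifted)
    return np.hstack(delayed)                 # shape: (n_trs, 768 * 4) = (n_trs, 3072)

features = add_delays(np.random.randn(291, 768))
print(features.shape)                         # (291, 3072)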

📊 Predicting Brain Activity

  • For each voxel, fit a ridge regression to predict fMRI response from 3,072 features
  • Use top 10,000 best-predicting voxels
  • Encoding model weight matrix: [10,000 voxels × 3,072 features]
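
A downsized sketch of this fit using scikit-learn's Ridge (shapes, the alpha value, and the random data are illustrative; the paper selects the top 10,000 voxels by held-out prediction performance):

import numpy as np
from sklearn.linear_model import Ridge

n_trs, n_feats, n_vox = 291, 3072, 5000       # downsized voxel count for illustration
X_train = np.random.randn(n_trs, n_feats)     # delayed embedding features
Y_train = np.random.randn(n_trs, n_vox)       # BOLD responses, one column per voxel

model = Ridge(alpha=100.0)                    # alpha would be tuned by cross-validation
model.fit(X_train, Y_train)                   # one weight vector per voxel, fit jointly
print(model.coef_.shape)                      # (5000, 3072) encoding weight matrix

# Keep the voxels the model predicts best on held-out data
X_val, Y_val = np.random.randn(60, n_feats), np.random.randn(60, n_vox)
pred = model.predict(X_val)
r = np.array([np.corrcoef(pred[:, v], Y_val[:, v])[0, 1] for v in range(n_vox)])
top_voxels = np.argsort(r)[-1000:]            # indices of the best-predicted voxels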

🔀 4. Word Rate Model: Estimating When Words Happen

🔹 Purpose

  • fMRI tells you “something happened” every 2 seconds
  • This model predicts how many words occurred per scan

💡 How It Works

  • Input: fMRI signals from language areas
  • Uses future timepoints (t+2s to t+8s) to account for delay
  • Predicts number of words (e.g., 2.4 words)
  • Uses rounding to schedule when to decode next words

📊 Model Size

  • For V voxels and 4 delays: input feature size = 4 × V
  • Weight matrix: [1 × 4V]

📈 Example

  • Predicted word count in 2s scan = 2
  • Words scheduled at: 0.8s and 1.6s in that window
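
A minimal sketch of turning word-rate predictions into word onset times; the even-spacing rule below is an assumption chosen to reproduce the example above, not necessarily the paper's exact rule:

import numpy as np

TR = 2.0  # seconds per scan

def schedule_word_times(word_rates, tr=TR):
    """Round each scan's predicted word rate and space the words evenly in the window."""
    times = []
    for i, rate in enumerate(word_rates):
        n = int(np.round(rate))                        # e.g. 2.4 -> 2 words
        for k in range(1, n + 1):
            times.append(i * tr + k * tr / (n + 0.5))  # assumed spacing rule
    return np.asarray(times)

print(schedule_word_times([2.0]))  # [0.8 1.6] -- matches the example above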

🔄 5. Language Decoding Methods: Beam Search, Nucleus Sampling, and Bayesian Approach

🎓 Goal

Find the most likely sentence that explains the observed brain data.

The authors use a combination of decoding strategies:

  • Beam Search: Keeps top-k most likely word sequences at each time step
  • Nucleus Sampling: Selects from the top-p most probable words to introduce diversity
  • Bayesian Decoder: Combines prior from the language model and likelihood from brain encoding

🔢 Step-by-Step Decoding Example

Let’s walk through how the model decodes one word at a time using brain data.

🔹 Setup

  • word_times.shape = (1671,) → model will decode 1671 words
  • lanczos_mat.shape = (291, 1671) → maps 1671 predicted word times to 291 TRs
  • Loop runs once per word: for sample_index in range(1671)

✅ Step 1: Affected TRs

Each word affects multiple fMRI timepoints due to the slow BOLD signal:

# Step-by-step example of the affected-TRs calculation (first word)
import numpy as np

start_index, end_index = 0, 0                   # this word's window: first and last sample
lanczos_slice = lanczos_mat[:, start_index:end_index + 1]  # shape: (291, 1)
nonzero_trs = np.where(lanczos_slice.any(axis=1))[0]       # TRs the word loads on, e.g. [0, 1, 2]
# Assuming config.STIM_DELAYS is in TR units, e.g. [1, 2, 3, 4] (2-8 s at a 2 s TR)
start_tr = nonzero_trs[0] + min(config.STIM_DELAYS)   # e.g. 0 + 1 = 1
end_tr = nonzero_trs[-1] + max(config.STIM_DELAYS)    # e.g. 2 + 4 = 6
affected = np.arange(start_tr, end_tr + 1)            # final TRs: [1, 2, ..., 6]

Example output:

affected_trs(0, 0, lanczos_mat) → [1, 2, 3, 4, 5, 6]

✅ Step 2: Propose Initial Words

If the beam is empty:

INIT = ["She", "He", "They"]
Beam = ["She"]

✅ Step 3: Propose Next Words from LM

Context = “She”. The language model predicts:

| Word | Prob |
|------|------|
| went | 0.6 |
| had  | 0.2 |
| is   | 0.1 |

Nucleus sampling keeps the smallest set of words whose cumulative probability reaches the threshold p; with p = 0.8 here → nucleus = ["went", "had"], as the sketch below shows.
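
A minimal sketch of the top-p filtering step, using the toy probabilities above:

import numpy as np

def nucleus(words, probs, p=0.8):
    """Keep the smallest set of words whose cumulative probability reaches p."""
    order = np.argsort(probs)[::-1]            # most probable first
    cum = np.cumsum(np.asarray(probs)[order])
    cutoff = np.searchsorted(cum, p) + 1       # first index where cumulative mass >= p
    return [words[i] for i in order[:cutoff]]

print(nucleus(["went", "had", "is"], [0.6, 0.2, 0.1]))  # ['went', 'had']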


✅ Step 4: Score Candidates

Score each extended sentence under the encoding model (the scores are log-likelihoods):

"She went"  log-likelihood = -10.2
"She had"   log-likelihood = -13.5

Keep top hypothesis:

Beam = ["She went"]
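
Where do those scores come from? The paper evaluates each candidate under a multivariate Gaussian likelihood with an estimated noise covariance; a simplified diagonal (independent-voxel) version, with made-up data, looks like this:

import numpy as np

def encoding_log_likelihood(pred_bold, obs_bold, noise_var):
    """Per-voxel Gaussian log-likelihood of the observed BOLD given the
    encoding model's prediction for a candidate (constants dropped)."""
    resid = obs_bold - pred_bold                       # (affected TRs, voxels)
    return -0.5 * np.sum(resid ** 2 / noise_var)

# Hypothetical data: the prediction for "She went" fits the observation better
obs = np.random.randn(6, 100)                          # 6 affected TRs x 100 voxels
pred_went = obs + 0.1 * np.random.randn(6, 100)
pred_had = obs + 0.5 * np.random.randn(6, 100)
var = np.ones(100)
print(encoding_log_likelihood(pred_went, obs, var) >
      encoding_log_likelihood(pred_had, obs, var))     # True -> keep "She went"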

🕐 Step 5: Repeat for Word 2

Context = “She went”

  • LM suggests: to, back, away
  • Extensions scored: "She went to", "She went back"
  • Encoding model ranks and updates beam

Repeat until all 1671 words are decoded.
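
Putting the five steps together, the decoder is essentially a beam-search loop of the following shape. All helpers here are toy, hypothetical stand-ins for the components sketched above:

import numpy as np

def propose_next_words(context):
    return ["went", "had"] if context else ["She", "He", "They"]   # Steps 2-3

def encoding_log_likelihood(text, trs):
    return -float(len(text))        # placeholder; the real score compares BOLD signals

def affected_trs(i):
    return np.arange(i + 1, i + 7)  # Step 1: TR window influenced by word i

def decode(n_words=2, beam_width=3):
    beam = [("", 0.0)]                                 # (hypothesis, score) pairs
    for sample_index in range(n_words):                # one iteration per word time
        trs = affected_trs(sample_index)
        candidates = []
        for text, score in beam:
            for word in propose_next_words(text):
                new_text = (text + " " + word).strip()
                candidates.append((new_text,
                                   score + encoding_log_likelihood(new_text, trs)))  # Step 4
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam[0][0]                                  # best full hypothesis

print(decode())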


📊 Evaluation Metrics

  • Perplexity (LM prior quality): lower is better
  • Encoding model R² score: how well the model predicts fMRI signals
  • Decoding accuracy: BLEU/METEOR scores for reconstructed vs. true sentences
  • Human evaluation: qualitative rating of perceived vs. generated story coherence

🧪 Perceived vs. Imagined Speech Examples

| Type | Input Condition | Decoded Output |
|------|-----------------|----------------|
| Perceived | Listened to “The prince smiled.” | “The prince was smiling.” |
| Imagined | Silently imagined “She walked away.” | “She left quietly.” |

These show that even without actual speech, the decoder retrieves plausible paraphrases.


🔎 Did the Decoder Use Brain Data for Training?

| Component | Trained on Brain Data? | Purpose |
|-----------|------------------------|---------|
| Encoding Model | ✅ Yes | Predicts fMRI responses from word embeddings |
| Word Rate Model | ✅ Yes | Predicts the number of words per scan |
| Language Model (GPT) | ❌ No | Pretrained and fine-tuned on stories only |
| Decoder (Beam/Nucleus) | ❌ No | Combines the LM prior with the encoding-model likelihood |

✨ Final Thoughts

This study is a major step toward non-invasive brain-computer interfaces. It shows that, by pairing semantic language models with careful modeling of the BOLD response, continuous language can be reconstructed from fMRI recordings alone.

Key takeaways:

  • Brain decoding is possible without surgery
  • Semantic embeddings capture high-level meaning well
  • Timing alignment (via word rate model) is crucial
  • Meaningful language can be decoded from imagined speech and silent movies
  • Multiple decoding strategies (beam, nucleus, Bayesian) make the system robust and flexible

📄 Original Study: Tang et al., 2022 – bioRxiv Preprint