🧠 From Brain Signals to Story: Decoding Continuous Language from fMRI Using Language Models
Understanding what someone is thinking just by looking at their brain activity might sound like science fiction. But in a groundbreaking study, researchers have shown it’s possible to reconstruct entire sentences from non-invasive fMRI recordings. This blog explains how they did it, step by step, with clear examples and visuals.
📄 Citation: Tang et al., 2022 – Semantic reconstruction of continuous language from non-invasive brain recordings
🔍 1. Why is Decoding Continuous Language from fMRI So Challenging?
Imagine someone listening to the sentence:
“The cat jumped over the fence.”
With fMRI, the brain’s response to this sentence isn’t mapped word-by-word. Why?
- fMRI is slow: it measures the blood-oxygen-level-dependent (BOLD) signal, and a single burst of neural activity produces a BOLD response that unfolds over roughly 10 seconds.
- Natural English speech runs at about 2 words per second, so each brain image (fMRI scan) can be influenced by 20 or more overlapping words.
- This temporal smearing makes it extremely difficult to isolate the brain’s response to individual words, posing a major challenge for sentence-level decoding.
So the challenge becomes:
Given a noisy, temporally smeared measurement of brain activity (low SNR), can we recover the exact sentence the person heard or imagined?
This is an ill-posed inverse problem with many possible solutions.
📊 2. Overview of the Brain Decoding Pipeline
The authors built a three-part system:
- Encoding Model: Learns how word meanings (stimulus representations) map to brain activity
- Word Rate Model: Takes brain activity and predicts the number of words per TR (word rate)
- Language Decoder: Uses methods like beam search, nucleus sampling, and a Bayesian decoder to reconstruct word sequences using a language model and the encoding model
🧮 Bayesian Formulation of Decoding
The decoding framework is grounded in Bayes’ Theorem:
\[P(S \mid B) \propto P(B \mid S) \cdot P(S)\]

Where:
- \( S \): candidate sentence
- \( B \): observed brain activity
- \( P(S) \): prior probability from a language model (LM)
- \( P(B \mid S) \): likelihood estimated from the encoding model (how well the sentence explains the observed brain response)
This allows combining top-down language expectations with bottom-up neural evidence.
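As a minimal sketch of how this combination works in practice (with made-up log-probabilities), the decoder works in log space, adds the LM’s log-prior to the encoding model’s log-likelihood, and keeps the best-scoring candidate:

```python
# Minimal sketch of the Bayesian scoring rule in log space (hypothetical numbers).
# log P(S|B) = log P(B|S) + log P(S) + const.
candidates = {
    "She went to the store": {"log_prior": -12.3, "log_likelihood": -10.2},
    "She had a dog":         {"log_prior": -11.8, "log_likelihood": -13.5},
}

def posterior_score(scores):
    # Bottom-up neural evidence plus top-down language-model expectation.
    return scores["log_likelihood"] + scores["log_prior"]

best = max(candidates, key=lambda s: posterior_score(candidates[s]))
print(best)  # "She went to the store": -22.5 beats -25.3
```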
🤖 3. Encoding Model: Mapping Language to Brain
🔹 Feature Extraction
- Uses a GPT-style language model to extract 768-dimensional semantic embeddings
- Example: For the phrase “The cat jumped over the fence”, extract embedding for “fence”
⬇️ Downsampling & Delays
- fMRI samples every 2 s (one TR), and the BOLD response lags the speech that drives it
- Word embeddings are resampled onto the TR grid using a Lanczos filter
- Delayed copies of the features at 2, 4, 6, and 8 seconds are concatenated to model the lag
- Final feature vector per TR: 768 × 4 = 3,072 dimensions
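A sketch of the delay-stacking step, assuming the Lanczos resampling onto the TR grid has already been done (random placeholder data stands in for the resampled embeddings):

```python
import numpy as np

# Sketch of delay stacking. resampled: (n_trs, 768) embeddings already
# aligned to the 2 s TR grid (placeholder random data here).
rng = np.random.default_rng(0)
resampled = rng.standard_normal((291, 768))

delays_tr = [1, 2, 3, 4]                      # 2, 4, 6, 8 s at TR = 2 s
delayed = []
for d in delays_tr:
    shifted = np.zeros_like(resampled)
    shifted[d:] = resampled[:-d]              # stimulus at t influences BOLD at t + d
    delayed.append(shifted)

features = np.hstack(delayed)                 # 768 × 4 delays = 3,072 features
print(features.shape)                         # (291, 3072)
```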
📊 Predicting Brain Activity
- For each voxel, fit a ridge regression to predict fMRI response from 3,072 features
- Use top 10,000 best-predicting voxels
- Encoding model weight matrix:
[10,000 voxels × 3,072 features]
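A minimal sketch of the regression step using scikit-learn’s `Ridge` on synthetic data; the actual pipeline tunes the penalty per voxel with cross-validation and then keeps the best-predicted voxels:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Sketch of the encoding model fit (synthetic data, one shared alpha).
rng = np.random.default_rng(0)
X = rng.standard_normal((291, 3072))      # delayed stimulus features per TR
Y = rng.standard_normal((291, 10_000))    # BOLD responses for 10,000 voxels

model = Ridge(alpha=100.0)
model.fit(X, Y)                           # one linear map per voxel
print(model.coef_.shape)                  # (10000, 3072): voxels × features
```

Ranking voxels by held-out R² and keeping the top 10,000 would be a simple stand-in for the study’s voxel selection.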
🔀 4. Word Rate Model: Estimating When Words Happen
🔹 Purpose
- fMRI tells you “something happened” every 2 seconds
- This model predicts how many words occurred per scan
💡 How It Works
- Input: fMRI signals from language areas
- Uses future timepoints (t+2s to t+8s) to account for delay
- Predicts number of words (e.g., 2.4 words)
- Uses rounding to schedule when to decode next words
📊 Model Size
- For V voxels and 4 delays: input feature size = 4 × V
- Weight matrix: [1 × 4V]
📈 Example
- Predicted word count in 2s scan = 2
- Words scheduled at: 0.8s and 1.6s in that window
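One simple way to implement this scheduling (an illustrative helper, not the paper’s exact code) is to place words at uniform intervals of 1/rate inside the scan window, here using the 2.4-word prediction from above:

```python
import numpy as np

# Hypothetical scheduler: place n words at uniform 1/rate intervals in the TR.
def schedule_word_times(tr_start, tr_len, predicted_count):
    n_words = int(round(predicted_count))        # e.g., 2.4 -> 2 words
    if n_words == 0:
        return np.array([])
    interval = tr_len / predicted_count          # e.g., 2 s / 2.4 ≈ 0.83 s
    return tr_start + interval * np.arange(1, n_words + 1)

print(schedule_word_times(0.0, 2.0, 2.4).round(2))  # [0.83 1.67] ≈ 0.8 s and 1.6 s
```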
🔄 5. Language Decoding Methods: Beam Search, Nucleus Sampling, and Bayesian Approach
🎓 Goal
Find the most likely word sequence that explains the observed brain data
The authors use a combination of decoding strategies:
- Beam Search: Keeps top-k most likely word sequences at each time step
- Nucleus Sampling: Selects from the top-p most probable words to introduce diversity
- Bayesian Decoder: Combines prior from the language model and likelihood from brain encoding
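A stripped-down sketch of how these pieces fit together in a beam loop; `lm_propose` and `encoding_score` are hypothetical stand-ins for the language model and the encoding-model scorer:

```python
# Stripped-down beam search over word sequences.
def beam_search(n_words, beam_width, lm_propose, encoding_score):
    beam = [("", 0.0)]                            # (sequence, score)
    for _ in range(n_words):
        candidates = []
        for seq, _ in beam:
            for word in lm_propose(seq):          # nucleus-filtered continuations
                ext = (seq + " " + word).strip()
                candidates.append((ext, encoding_score(ext)))
        # Keep the beam_width best-scoring hypotheses.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam[0][0]

# Toy usage with a fixed vocabulary and a dummy scorer.
print(beam_search(3, 2,
                  lm_propose=lambda s: ["she", "went", "home"],
                  encoding_score=lambda s: len(s)))
```

In the real system, `lm_propose` applies nucleus sampling to GPT’s next-word distribution and `encoding_score` is the Bayesian score described above.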
🔢 Step-by-Step Decoding Example
Let’s walk through how the model decodes one word at a time using brain data.
🔹 Setup
- `word_times.shape = (1671,)` → the model will decode 1671 words
- `lanczos_mat.shape = (291, 1671)` → maps 1671 predicted word times onto 291 TRs
- The loop runs once per word: `for sample_index in range(1671)`
✅ Step 1: Affected TRs
Each word affects multiple fMRI timepoints due to the slow BOLD signal:
```python
import numpy as np

# Step-by-step example of the affected-TRs calculation.
# Delays are in TR units: 2, 4, 6, 8 s at TR = 2 s -> 1, 2, 3, 4 TRs.
DELAYS_TR = [1, 2, 3, 4]

def affected_trs(start_index, end_index, lanczos_mat):
    lanczos_slice = lanczos_mat[:, start_index:end_index + 1]  # shape: (291, n_words)
    nonzero_trs = np.where(lanczos_slice.any(axis=1))[0]       # e.g., [0, 1, 2]
    start_tr = nonzero_trs[0] + min(DELAYS_TR)                 # e.g., 0 + 1 = 1
    end_tr = nonzero_trs[-1] + max(DELAYS_TR)                  # e.g., 2 + 4 = 6
    return np.arange(start_tr, end_tr + 1)                     # final affected TRs
```
Example output:
affected_trs(0, 0, lanczos_mat) → [1, 2, 3, 4, 5, 6]
✅ Step 2: Propose Initial Words
If the beam is empty:
INIT = ["She", "He", "They"]
Beam = ["She"]
✅ Step 3: Propose Next Words from LM
Context = “She”. The language model predicts:

| Word | Prob |
|------|------|
| went | 0.6  |
| had  | 0.2  |
| is   | 0.1  |
Nucleus sampling with top-p = 0.8 keeps the smallest set of words covering 80% of the probability mass → nucleus = ["went", "had"]
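A minimal sketch of that nucleus filter, using the toy probabilities from the table:

```python
import numpy as np

# Nucleus (top-p) filtering over the LM's next-word distribution.
words = ["went", "had", "is"]
probs = np.array([0.6, 0.2, 0.1])          # remaining 0.1 spread over other words

def nucleus(words, probs, top_p):
    order = np.argsort(probs)[::-1]        # sort by probability, descending
    cum = np.cumsum(probs[order])
    keep = cum <= top_p                    # smallest set covering top_p mass
    keep[0] = True                         # always keep the single best word
    return [words[i] for i in order[keep]]

print(nucleus(words, probs, top_p=0.8))    # ['went', 'had']
```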
✅ Step 4: Score Candidates
Score each extended sentence using the encoding model:
"She went" → likelihood = -10.2
"She had" → likelihood = -13.5
Keep top hypothesis:
Beam = ["She went"]
🕐 Step 5: Repeat for Word 2
Context = “She went”
- LM suggests: “to”, “back”, “away”
- Extensions scored: “She went to”, “She went back”
- The encoding model ranks the extensions and updates the beam
Repeat until all 1671 words are decoded.
📊 Evaluation Metrics
- Perplexity (LM prior quality): lower is better
- Encoding model R² score: how well the model predicts fMRI signals
- Decoding accuracy: BLEU/METEOR scores for reconstructed vs. true sentences
- Human evaluation: qualitative rating of perceived vs. generated story coherence
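For example, a quick BLEU check between a true and a decoded sentence could look like this with NLTK (a sketch only; the study’s evaluation uses additional similarity measures and statistical testing):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Compare a reference sentence against a decoded paraphrase with BLEU-2.
reference = "the prince smiled".split()
decoded = "the prince was smiling".split()

smooth = SmoothingFunction().method1   # avoid zero scores on short texts
score = sentence_bleu([reference], decoded,
                      weights=(0.5, 0.5), smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```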
🧪 Perceived vs. Imagined Speech Examples
| Type | Input Condition | Decoded Output |
|---|---|---|
| Perceived | Listened to “The prince smiled.” | “The prince was smiling.” |
| Imagined | Silently imagined “She walked away.” | “She left quietly.” |
These show that even without actual speech, the decoder retrieves plausible paraphrases.
🔎 Did the Decoder Use Brain Data for Training?
| Component | Trained on Brain Data? | Purpose |
|---|---|---|
| Encoding Model | ✅ Yes | Predicts fMRI responses from word embeddings |
| Word Rate Model | ✅ Yes | Predicts the number of words per scan |
| Language Model (GPT) | ❌ No | Pretrained and fine-tuned on stories only |
| Decoder (Beam/Nucleus) | ❌ No | Combines the LM prior with encoding-model likelihoods |
✨ Final Thoughts
This study is a major step toward non-invasive brain-computer interfaces. It shows that, with rich semantic representations and careful modeling, continuous language can be reconstructed from fMRI measurements of brain activity.
Key takeaways:
- Brain decoding is possible without surgery
- Semantic embeddings capture high-level meaning well
- Timing alignment (via word rate model) is crucial
- Meaningful language can be decoded from imagined speech and silent movies
- Multiple decoding strategies (beam, nucleus, Bayesian) make the system robust and flexible
📄 Original Study: Tang et al., 2022 – bioRxiv Preprint