Brain Treebank: Large-scale intracranial recordings from naturalistic language stimuli
Main Authors | , , , , , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 13.11.2024 |
DOI | 10.48550/arxiv.2411.08343 |
Summary: | We present the Brain Treebank, a large-scale dataset of electrophysiological neural responses, recorded from intracranial probes while 10 subjects watched one or more Hollywood movies. Subjects watched on average 2.6 Hollywood movies, for an average viewing time of 4.3 hours, and a total of 43 hours. The audio track for each movie was transcribed with manual corrections. Word onsets were manually annotated on spectrograms of the audio track for each movie. Each transcript was automatically parsed and manually corrected into the universal dependencies (UD) formalism, assigning a part of speech to every word and a dependency parse to every sentence. In total, subjects heard over 38,000 sentences (223,000 words), while they had on average 168 electrodes implanted. This is the largest dataset of intracranial recordings featuring grounded naturalistic language, one of the largest English UD treebanks in general, and one of only a few UD treebanks aligned to multimodal features. We hope that this dataset serves as a bridge between linguistic concepts, perception, and their neural representations. To that end, we present an analysis of which electrodes are sensitive to language features while also mapping out a rough time course of language processing across these electrodes. The Brain Treebank is available at https://BrainTreebank.dev/ |
---|---|