Text Generation of Speech Imagery Based on an Enhanced CTA-BiLSTM Model Utilizing EEG Signals

Bibliographic Details
Published in	IEEE Transactions on Consumer Electronics, Vol. 71, no. 2, pp. 3442-3453
Main Authors	Pan, Hongguang, Chu, Xin, Miao, Rui, Wang, Mei, Wang, Yiran, Li, Zhuoyi
Format	Journal Article
Language	English
Published	IEEE 01.05.2025
Subjects	Accuracy; attention mechanism; BCI; BiLSTM; Brain modeling; decode; Decoding; EEG; Electrodes; Electroencephalography; Feature extraction; Hidden Markov models; Labeling; Signal processing algorithms; Speech enhancement; speech imagery; text generation
Summary:	Recent studies have demonstrated the potential of speech imagery neural signals in brain-computer interface (BCI) technology. Text generation based on speech imagery offers a natural means of communication for individuals with speech disabilities. However, the limited range of imagined content and the immaturity of text generation technology currently hinder its application. This study therefore proposes an enhanced CTA-BiLSTM model for text generation from speech imagery electroencephalography (EEG) signals, improving the accuracy and fluency of the generated text. First, in contrast to the prevailing paradigms of imagining characters or words, the study assembles a sentence-level EEG dataset from ten subjects to facilitate communication. Second, to address the temporal dynamics and sequence dependencies of sentence signals, dynamic time warping (DTW) and hidden Markov models (HMM) are used for temporal alignment and signal annotation, producing fine-grained sentence labels. Finally, the proposed CTA-BiLSTM model uses a channel-time attention mechanism to dynamically adjust weights across channels and time, emphasizing critical features, while the bidirectional long short-term memory (BiLSTM) network captures long-term dependencies in the EEG signals, improving the model's accuracy in decoding complex temporal patterns. Experimental results show that the average sentence decoding accuracy reaches 67.50% on the self-built dataset, supporting the model's potential for application.
ISSN:	0098-3063, 1558-4127
DOI:	10.1109/TCE.2025.3557912
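
The summary mentions dynamic time warping (DTW) for temporal alignment. As a rough illustration only, the sketch below shows textbook DTW on two short one-dimensional sequences in pure Python. It is not the authors' pipeline: the function name `dtw`, the absolute-difference cost, and the toy inputs are assumptions made for this example, and real EEG alignment would operate on multichannel signals.

```python
from math import inf


def dtw(a, b):
    """Textbook DTW: return (total cost, warping path) for sequences a and b.

    Hypothetical helper for illustration; the local cost is |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )

    # Backtrack from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy example: the second sequence repeats its middle sample.
total, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

In this toy case the repeated sample in the second sequence is matched to the same sample of the first, which is the kind of elastic alignment that makes DTW useful when two recordings of the same content unfold at different speeds.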