A simplified adversarial architecture for cross-subject silent speech recognition using electromyography
Published in | Journal of neural engineering Vol. 21; no. 5; pp. 56001 - 56018 |
---|---|
Main Authors | , , , , , , , |
Format | Journal Article |
Language | English |
Published | England, IOP Publishing, 01.10.2024 |
Subjects | |
Summary: | Objective. The decline in the performance of electromyography (EMG)-based silent speech recognition across speakers is widely attributed to differences in speech patterns, articulation habits, and individual physiology. Feature alignment, in which a discriminative network learns to resolve domain offsets across speakers, is an effective way to address this problem. However, the prevailing adversarial network relies on a separate discriminator branch dedicated to domain discrimination, which contributes only indirectly to the categorical predictions of the classifier.
Approach. To address this, we propose a simplified discrepancy-based adversarial network with a streamlined end-to-end structure for EMG-based cross-subject silent speech recognition. Highly aligned features across subjects are obtained by introducing a Nuclear-norm Wasserstein discrepancy metric on the back end of the classification network, which can then serve both classification and domain discrimination. Given the low-level and implicitly noisy nature of myoelectric signals, we devise a cascaded adaptive rectification network as the front-end feature extraction network, which adaptively reshapes the intermediate feature map with automatically learnable channel-wise thresholds. The resulting features suppress domain-specific information that differs between subjects while retaining the domain-invariant features critical for cross-subject recognition.
Main results. Sentence-level classification experiments on 100 Chinese sentences demonstrate the efficacy of our method, which achieves an average accuracy of 89.46% on 40 new subjects when trained on data from 60 subjects. Notably, when trained on 20 subjects and tested on 10 new subjects, our method improves on the state-of-the-art model by 10.07%, surpassing the result that model attains even with three times as many training subjects.
Significance. Our study demonstrates that the proposed adversarial architecture improves classification performance on cross-subject myoelectric signals, which is promising for EMG-based interactive speech applications. |
---|---|
Bibliography: | JNE-107522.R2 |
ISSN: | 1741-2560, 1741-2552 |
DOI: | 10.1088/1741-2552/ad7321 |
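
The Approach paragraph of the summary names two mechanisms: channel-wise thresholds applied to intermediate feature maps, and a nuclear-norm Wasserstein discrepancy computed on the classifier output. The sketch below illustrates both in NumPy under stated assumptions. The record does not give the paper's equations, so the soft-thresholding form of the rectification, the difference-of-nuclear-norms form of the discrepancy, its sign convention, and all function names (`soft_threshold`, `nuclear_wasserstein_discrepancy`) are illustrative choices rather than the authors' implementation.

```python
import numpy as np


def soft_threshold(features, thresholds):
    """Channel-wise soft thresholding (ASSUMED form of the 'adaptive rectification').

    features:   array of shape (batch, channels, time)
    thresholds: array of shape (channels,), non-negative; in a real model these
                would be learned, here they are plain inputs.
    """
    tau = thresholds[None, :, None]  # broadcast one threshold per channel
    # Shrink magnitudes by tau and zero out anything below the threshold.
    return np.sign(features) * np.maximum(np.abs(features) - tau, 0.0)


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def nuclear_wasserstein_discrepancy(source_logits, target_logits):
    """ASSUMED formulation: difference of batch-normalized nuclear norms.

    Each input is a (batch, classes) matrix of classifier logits. The nuclear
    norm (sum of singular values) of the softmax prediction matrix is larger
    when predictions are confident and spread over many classes.
    """
    p_src = softmax(source_logits)
    p_tgt = softmax(target_logits)
    nuc_src = np.linalg.norm(p_src, ord="nuc") / p_src.shape[0]
    nuc_tgt = np.linalg.norm(p_tgt, ord="nuc") / p_tgt.shape[0]
    return nuc_src - nuc_tgt
```

In an adversarial setup of this kind, the classifier would typically be trained to enlarge the discrepancy while the feature extractor is trained to shrink it, so no separate discriminator branch is needed. With identical source and target batches the function returns zero, which makes a convenient sanity check.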