Application of intelligent speech analysis based on BiLSTM and CNN dual attention model in power dispatching

Bibliographic Details
Published in: Nanotechnology for Environmental Engineering, Vol. 6, No. 3
Main Authors: Zeng, Shibo; Hong, Danke; Hu, Feifei; Liu, Li; Xie, Fei
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.12.2021 (Springer Nature B.V.)

Summary: As the most natural carrier of language and emotion, speech is widely used in speech recognition applications such as smart home devices and vehicle navigation. With the continuous growth of China's comprehensive national strength, the power industry has entered a new stage of vigorous development, and as electricity underpins production and daily life, adopting speech processing technology in this sector is a general trend. To better meet the practical needs of power grid dispatching, this paper applies speech processing technology to the field of smart grid dispatching. After testing and evaluating the recognition rate of an existing speech recognition system, a speech emotion recognition technology based on a BiLSTM and CNN dual attention model is proposed, suited to human–machine interaction systems in intelligent dispatching. First, the mel spectrogram sequence of the speech signal is extracted as the input to the BiLSTM network, which extracts the temporal context features of the signal. On this basis, a CNN network extracts high-level emotional features from these low-level features and completes the emotional classification of the speech signal. Emotion recognition tests were conducted on three emotional databases: eNTERFACE'05, RML and AFEW 6.0. The experimental results show that the average recognition rates of this technology on the three databases are 55.82%, 88.23% and 43.70%, respectively. In addition, comparisons with traditional speech emotion recognition techniques and with models based on BiLSTM or CNN alone verify the effectiveness of the proposed technology.
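The abstract describes attention applied to two feature streams: BiLSTM outputs over time steps and CNN feature maps over channels, whose pooled summaries are fused before classification. The paper's exact layer sizes and attention formulation are not given here, so the following is only a minimal NumPy sketch of that dual-attention pooling idea; the dimensions, the dot-product scoring, and the query vectors `w` and `u` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax used to normalise attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(h, w):
    # h: (T, d) BiLSTM outputs per time step; w: (d,) hypothetical learned query.
    scores = softmax(h @ w)      # (T,) weights over time steps
    return scores @ h            # (d,) attention-pooled temporal summary

def channel_attention(f, u):
    # f: (C, d) per-channel CNN feature descriptors; u: (d,) hypothetical query.
    scores = softmax(f @ u)      # (C,) weights over channels
    return scores @ f            # (d,) attention-pooled channel summary

rng = np.random.default_rng(0)
T, C, d = 120, 64, 256           # assumed time steps, channels, feature width
h = rng.standard_normal((T, d))  # stand-in for BiLSTM outputs on a mel sequence
f = rng.standard_normal((C, d))  # stand-in for CNN feature maps
w = rng.standard_normal(d)
u = rng.standard_normal(d)

# Fuse both attention-pooled summaries into one emotion feature vector,
# which a final classifier layer would map to emotion classes.
fused = np.concatenate([temporal_attention(h, w), channel_attention(f, u)])
print(fused.shape)               # (2 * d,) fused feature
```

In a trained model `w` and `u` would be learned parameters and the pooled vector would feed a softmax classifier over the emotion labels; the sketch only shows how the two attention branches produce and combine their summaries.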
ISSN: 2365-6379
EISSN: 2365-6387
DOI: 10.1007/s41204-021-00148-7