Application of intelligent speech analysis based on BiLSTM and CNN dual attention model in power dispatching

Bibliographic Details
Published in: Nanotechnology for Environmental Engineering, Vol. 6, No. 3
Main Authors: Zeng, Shibo; Hong, Danke; Hu, Feifei; Liu, Li; Xie, Fei
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.12.2021 (Springer Nature B.V.)

Summary: As the most natural carrier of language and emotion, speech is widely used in speech recognition applications such as smart home devices and vehicle navigation. With the continuous growth of China's comprehensive national strength, the power industry has entered a new stage of vigorous development, and as electricity underpins production and daily life, adopting speech processing technology in this sector is a general trend. To better meet the practical needs of power grid dispatching, this paper applies speech processing technology to the field of smart grid dispatching. After testing and evaluating the recognition rate of an existing speech recognition system, a speech emotion recognition technology based on a BiLSTM and CNN dual attention model is proposed, suited to human–machine interaction systems in intelligent dispatching. First, the mel spectrogram sequence of the speech signal is extracted as the input to the BiLSTM network, which extracts the temporal context features of the signal. On this basis, a CNN network extracts high-level emotional features from these low-level features and completes the emotional classification of the speech signal. Emotion recognition tests were conducted on three emotional databases: eNTERFACE'05, RML and AFEW 6.0. The experimental results show that the average recognition rates of this technology on the three databases are 55.82%, 88.23% and 43.70%, respectively. In addition, comparisons with traditional speech emotion recognition techniques and with models based on BiLSTM or CNN alone verify the effectiveness of the proposed technology.
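The abstract describes attention applied to two feature streams: BiLSTM outputs over time steps and CNN feature maps over channels, whose pooled summaries are fused before classification. The paper's exact layer sizes and attention formulation are not given here, so the following is only a minimal NumPy sketch of that dual-attention pooling idea; the dimensions, the dot-product scoring, and the query vectors `w` and `u` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax used to normalise attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(h, w):
    # h: (T, d) BiLSTM outputs per time step; w: (d,) hypothetical learned query.
    scores = softmax(h @ w)      # (T,) weights over time steps
    return scores @ h            # (d,) attention-pooled temporal summary

def channel_attention(f, u):
    # f: (C, d) per-channel CNN feature descriptors; u: (d,) hypothetical query.
    scores = softmax(f @ u)      # (C,) weights over channels
    return scores @ f            # (d,) attention-pooled channel summary

rng = np.random.default_rng(0)
T, C, d = 120, 64, 256           # assumed time steps, channels, feature width
h = rng.standard_normal((T, d))  # stand-in for BiLSTM outputs on a mel sequence
f = rng.standard_normal((C, d))  # stand-in for CNN feature maps
w = rng.standard_normal(d)
u = rng.standard_normal(d)

# Fuse both attention-pooled summaries into one emotion feature vector,
# which a final classifier layer would map to emotion classes.
fused = np.concatenate([temporal_attention(h, w), channel_attention(f, u)])
print(fused.shape)               # (2 * d,) fused feature
```

In a trained model `w` and `u` would be learned parameters and the pooled vector would feed a softmax classifier over the emotion labels; the sketch only shows how the two attention branches produce and combine their summaries.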
ISSN: 2365-6379
EISSN: 2365-6387
DOI: 10.1007/s41204-021-00148-7