Multi-modal Sentiment Analysis of Audio and Visual Context of the Data using Machine Learning
| Published in | 2022 3rd International Conference on Smart Electronics and Communication (ICOSEC), pp. 1198-1205 |
|---|---|
| Main Author | |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 20.10.2022 |
| Subjects | |
| DOI | 10.1109/ICOSEC54921.2022.9951988 |
| Summary | Sentiment analysis on video streams in real time involves using visual and/or aural data from the stream to identify a subject's emotional expressions over time. Sentiment can be assessed through a variety of modalities, including speech, lip movements, and facial expression. This paper presents a multi-modal deep learning strategy for sentiment classification that fuses features derived from an audiovisual input stream in real time. The proposed system consists of four small deep neural network models that analyse visual and auditory data simultaneously. To produce a final prediction, the visual and audio emotion features are merged into a single stream, and an exponentially weighted moving average is used to aggregate evidence over time. The paper also introduces a method for multimodal sentiment analysis based on feature extraction and emotion recognition from text and visual modalities using convolutional neural networks. By merging visual, text, and audio features, a 12% performance gain is achieved. Using RNN-COVAREP, several critical factors that are frequently overlooked in multimodal analysis research have also been examined. |
|---|---|
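
The temporal aggregation step named in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the fused audio-visual model emits one class-probability vector per frame, and the function name `ewma_sentiment`, the smoothing factor `alpha`, and the three-class label set are all hypothetical.

```python
import numpy as np

ALPHA = 0.3  # hypothetical smoothing factor; the paper does not state its value


def ewma_sentiment(frame_probs, alpha=ALPHA):
    """Aggregate per-frame emotion probabilities over time with an
    exponentially weighted moving average, as the summary describes.

    frame_probs: iterable of 1-D arrays, one probability vector per frame
    (assumed to be the fused audio-visual softmax outputs).
    Returns the index of the predicted sentiment class after smoothing.
    """
    smoothed = None
    for p in frame_probs:
        p = np.asarray(p, dtype=float)
        if smoothed is None:
            smoothed = p
        else:
            # Blend new evidence with the running estimate: recent frames
            # dominate, while older frames decay exponentially.
            smoothed = alpha * p + (1.0 - alpha) * smoothed
    return int(np.argmax(smoothed))


# Usage: three frames of (negative, neutral, positive) probabilities.
frames = [np.array([0.2, 0.5, 0.3]),
          np.array([0.1, 0.3, 0.6]),
          np.array([0.1, 0.2, 0.7])]
print(ewma_sentiment(frames))  # -> 2 (positive)
```

The smoothing factor trades responsiveness against stability: a larger `alpha` tracks rapid changes in expression, while a smaller one suppresses single-frame misclassifications in the stream.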