A Survey on Deep Learning-Based Approaches for Automated Lip Reading


Bibliographic Details
Published in	2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV), pp. 1-6
Main Authors	Kode, Harshita; Kotipalli, Bhavya Sri Sai; Cheruku, Hasritha Reddy; Gadde, Sai Sudha
Format	Conference Proceeding
Language	English
Published	IEEE 06.06.2025
Subjects	Accuracy; Attention Mechanism; Attention mechanisms; Batch normalization; Computational modeling; Convolutional Neural Network; Deep Learning; Lighting; Lips; Real-time systems; Recurrent Neural Network; Spatiotemporal Features; Spatiotemporal phenomena; Surveys; Training
DOI	10.1109/DLCV65218.2025.11088852

Summary:	Lip reading has garnered attention for its ability to support the hearing impaired and boost the performance of speech recognition systems. An overview of advancements in lip reading is presented, highlighting both conventional techniques and deep learning-based approaches such as LipSync, among others. The surveyed lip-reading systems report accuracies ranging from 52% to 89%; these studies use different datasets and different met. The analysis identifies key gaps, such as limited datasets, lack of diversity in training samples, and challenges in real-time application. It highlights the need for models that generalize better across different accents, lighting conditions, and speaking paces. Despite these advances, challenges like speaker variability, noise, and dataset limitations remain. The proposed model addresses these gaps by incorporating batch normalization and attention mechanisms, improving robustness and real-time applicability. We also propose methodologies that focus on spatiotemporal features to further enhance performance, as indicated by initial results in similar studies. Our method obtains approximately 91% accuracy, which is state-of-the-art, as it exceeds existing models by 1-3