A Survey on Deep Learning-Based Approaches for Automated Lip Reading
Published in | 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV) pp. 1 - 6 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 06.06.2025 |
Subjects | |
DOI | 10.1109/DLCV65218.2025.11088852 |
Summary: | Lip reading has garnered attention for its ability to support the hearing impaired and boost the performance of speech recognition systems. An overview of advancements in lip reading is presented, highlighting both conventional techniques and deep learning-based approaches such as LipSync, among others. The presented lip-reading systems have obtained accuracies ranging from 52% to 89%; these studies use different datasets and different met. The analysis identifies key gaps, such as limited datasets, lack of diversity in training samples, and challenges in real-time application. It highlights the need for models that generalize better across different accents, lighting conditions, and speaking paces. Despite these advances, challenges like speaker variability, noise, and dataset limitations remain. The proposed 4 model addresses these gaps by incorporating batch normalization and attention mechanisms, improving robustness and real-time applicability. We also propose methodologies that focus on spatiotemporal features to further enhance performance, as indicated by initial results in similar studies. Our method obtains approximately 91% accuracy, which is state-of-the-art, as it exceeds existing models by 1-3 |