Improving Accessibility for the Visually Impaired with Enhanced Multi-class Object Detection Using YOLOv8L, YOLO11x, and Faster R-CNN with Diverse Backbones

Bibliographic Details
Published in: Journal of Disability Research, Vol. 4, No. 4
Main Authors: Algaraady, Jeehaan; Albuhairy, Mohammad Mahyoob; Khan, Mohammad Zubair
Format: Journal Article
Language: English
Published: 01.09.2025

Summary: Advances in deep learning and computer vision have revolutionized object detection, enabling real-time, accurate object recognition. These technologies can transform accessibility solutions, especially for individuals with visual impairments. This study aims to enhance accessibility and effective interaction with the environment for individuals with visual disabilities by detecting and naming objects in real-world settings. It examines and optimizes a set of deep learning models, including YOLOv8L, YOLO11x, and Faster region-based convolutional neural network (R-CNN) with seven backbone architectures, for multi-class object detection to improve object recognition and provide auditory feedback; these models aim to bridge the gap between the visually impaired and their surroundings. In addition, we propose a system that integrates object detection with text-to-speech (TTS) technology to translate detections into audible descriptions, empowering individuals to navigate and interact with the world independently. The models leverage the Arabic-translated PASCAL VOC 2007 and 2012 datasets, with performance evaluated through precision, recall, and mean average precision (mAP). The results show that YOLO11x achieves the highest mAP of 0.86, followed by YOLOv8L with an mAP of 0.83. Among the Faster R-CNN backbones, EfficientNet-B3, HRNet-w32, and MobileNetV3-Large achieved the highest accuracy, at 79%, 78%, and 75%, respectively. The study demonstrates the efficacy of deep learning models as assistive technologies for individuals with visual impairments and highlights opportunities for future development.
ISSN: 1658-9912; 2676-2633
DOI: 10.57197/JDR-2025-0642
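
To make the detection-to-speech pipeline described in the summary concrete, the following is a minimal sketch in Python. It is illustrative only and not the authors' implementation: it assumes the ultralytics package for YOLO inference and pyttsx3 for offline text-to-speech, and the weights file name, image path, and confidence threshold are placeholder assumptions. The Arabic-translated class labels used in the study are not reproduced here.

    # Minimal detection-to-speech sketch (illustrative; not the authors' implementation).
    # Assumes: pip install ultralytics pyttsx3
    # "yolov8l.pt", "street.jpg", and the 0.5 confidence threshold are placeholders.
    from collections import Counter

    import pyttsx3
    from ultralytics import YOLO

    def describe_image(image_path, model_path="yolov8l.pt", conf_threshold=0.5):
        """Run YOLO detection and build a short spoken-style description."""
        model = YOLO(model_path)                          # load pretrained detector
        results = model(image_path, conf=conf_threshold)  # run inference

        # Count detected class names above the confidence threshold.
        counts = Counter()
        for result in results:
            for box in result.boxes:
                counts[model.names[int(box.cls[0])]] += 1

        if not counts:
            return "No objects detected."
        return "Detected " + ", ".join(f"{n} {name}" for name, n in counts.items()) + "."

    def speak(text):
        """Convert the description to audio with an offline TTS engine."""
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()

    if __name__ == "__main__":
        description = describe_image("street.jpg")        # placeholder image path
        print(description)
        speak(description)

In a full assistive system of the kind the study describes, the same loop would run on a live camera stream, and the spoken output would use the Arabic-translated class labels rather than the English names shown here.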