Improving Accessibility for the Visually Impaired with Enhanced Multi-class Object Detection Using YOLOv8L, YOLO11x, and Faster R-CNN with Diverse Backbones

Bibliographic Details
Published in Journal of Disability Research, Vol. 4, no. 4
Main Authors Algaraady, Jeehaan; Albuhairy, Mohammad Mahyoob; Khan, Mohammad Zubair
Format Journal Article
Language English
Published 01.09.2025
Abstract Advances in deep learning and computer vision have made real-time, accurate object detection practical. These technologies could transform accessibility solutions, especially for individuals with visual impairments. This study aims to improve accessibility and interaction with the environment for individuals with visual disabilities by detecting and naming objects in real-world settings. It examines and optimizes a set of deep learning models for multi-class object detection: YOLOv8L, YOLO11x, and Faster region-based convolutional neural network (Faster R-CNN) with seven backbone models. The goal is to improve object recognition and provide auditory feedback, bridging the gap between visually impaired people and their surroundings. In addition, the study proposes a system that integrates object detection with text-to-speech (TTS) technology to translate detections into audible descriptions, helping individuals navigate and interact with the world independently. The models use Arabic-translated PASCAL VOC 2007 and 2012 datasets, and performance is evaluated with precision, recall, and mean average precision (mAP). YOLO11x achieves the highest mAP, 0.86, followed by YOLOv8L with an mAP of 0.83. Among the Faster R-CNN backbones, EfficientNet-B3, HRNet-w32, and MobileNetV3-Large show the highest accuracy, at 79%, 78%, and 75%, respectively. The results support the efficacy of deep learning models as assistive technologies for individuals with visual impairments and highlight opportunities for future development.
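The abstract names precision and recall as evaluation metrics without giving the matching procedure. As an illustration only, the sketch below shows one common way to compute them for a single class on a single image: predictions are matched greedily to ground-truth boxes by intersection over union (IoU). The function names, the box format, and the 0.5 IoU threshold (the conventional PASCAL VOC setting) are assumptions made for this example, not details taken from the study.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]],  # (box, confidence score) pairs
    truths: List[Box],
    iou_thr: float = 0.5,  # assumed threshold, not stated in the abstract
) -> Tuple[float, float]:
    """Greedy matching: highest-confidence predictions claim boxes first."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were never matched
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((21, 21, 30, 30), 0.7)]
    print(precision_recall(preds, truths))
```

Average precision for a class is then typically obtained by sweeping the confidence threshold and integrating the resulting precision-recall curve, and mAP is the mean of that value over all classes.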
Authors
  1. Algaraady, Jeehaan (ORCID 0000-0003-3901-4648)
  2. Albuhairy, Mohammad Mahyoob (ORCID 0000-0002-6664-1017)
  3. Khan, Mohammad Zubair (ORCID 0000-0002-2409-7172)
ContentType Journal Article
DOI 10.57197/JDR-2025-0642
EISSN 2676-2633
ISSN 1658-9912
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License https://creativecommons.org/licenses/by/4.0
OpenAccessLink https://www.scienceopen.com/hosted-document?doi=10.57197/JDR-2025-0642
PublicationDate 2025-09
PublicationTitle Journal of Disability Research
PublicationYear 2025