Improving Accessibility for the Visually Impaired with Enhanced Multi-class Object Detection Using YOLOv8L, YOLO11x, and Faster R-CNN with Diverse Backbones

Bibliographic Details
Published in	Journal of Disability Research Vol. 4; no. 4
Main Authors	Algaraady, Jeehaan, Albuhairy, Mohammad Mahyoob, Khan, Mohammad Zubair
Format	Journal Article
Language	English
Published	01.09.2025

Abstract	Advances in deep learning and computer vision have made real-time, accurate object detection practical, and these technologies could transform accessibility solutions, especially for individuals with visual impairments. This study aims to improve accessibility and interaction with the environment for individuals with visual impairments by detecting and naming objects in real-world settings. It examines and optimizes a set of deep learning models for multi-class object detection: YOLOv8L, YOLO11x, and Faster region-based convolutional neural network (Faster R-CNN) with seven backbone models. The study also proposes a system that integrates object detection with text-to-speech (TTS) technology to translate detections into audible descriptions, so that users can navigate and interact with their surroundings more independently. The models use Arabic-translated PASCAL VOC 2007 and 2012 datasets, with performance measured by precision, recall, and mean average precision (mAP). YOLO11x achieves the highest mAP of 0.86, followed by YOLOv8L with an mAP of 0.83. Among the Faster R-CNN backbones, EfficientNet-B3, HRNet-w32, and MobileNetV3-Large showed the highest accuracy, at 79%, 78%, and 75%, respectively. The results indicate that deep learning models are effective as assistive technologies for individuals with visual impairments and highlight opportunities for future development.
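The evaluation and feedback steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 thresholds, the greedy matching rule, and the three-entry Arabic label map are all assumptions made for illustration, and the final string would still have to be handed to a TTS engine of the reader's choice.

```python
# Hypothetical English-to-Arabic map for three PASCAL VOC classes; the
# translations actually used in the study are not given in the abstract.
ARABIC_LABELS = {"person": "شخص", "car": "سيارة", "chair": "كرسي"}


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, truths, iou_thr=0.5):
    """Precision and recall for one class under simplified greedy matching.

    detections: list of (box, confidence); truths: list of ground-truth boxes.
    Detections are visited from most to least confident, and each
    ground-truth box can be claimed by at most one detection.
    """
    matched = set()
    tp = 0
    for box, _conf in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall


def spoken_labels(detections, min_conf=0.5):
    """Build the text to be spoken from (english_label, confidence) pairs.

    Keeps confident detections, lists each class once in order of
    confidence, and falls back to the English label when no translation
    is available.
    """
    seen = []
    for label, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if conf >= min_conf and label not in seen:
            seen.append(label)
    return "، ".join(ARABIC_LABELS.get(label, label) for label in seen)
```

Averaging precision over recall levels per class, and then over classes, would give mAP; that step is omitted here because the abstract does not say which interpolation scheme was used.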
Author ORCID	Algaraady, Jeehaan: 0000-0003-3901-4648; Albuhairy, Mohammad Mahyoob: 0000-0002-6664-1017; Khan, Mohammad Zubair: 0000-0002-2409-7172
DOI	10.57197/JDR-2025-0642
EISSN	2676-2633
ISSN	1658-9912
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	https://creativecommons.org/licenses/by/4.0
OpenAccessLink	https://www.scienceopen.com/hosted-document?doi=10.57197/JDR-2025-0642