Automated Generation of Accessibility Test Reports from Recorded User Transcripts

Testing for accessibility is a significant step when developing software, as it ensures that all users, including those with disabilities, can effectively engage with web and mobile applications. While automated tools exist to detect accessibility issues in software, none are as comprehensive and ef...

Full description

Saved in:
Bibliographic Details
Published inProceedings / International Conference on Software Engineering pp. 204 - 216
Main Authors Huq, Syed Fatiul, Tafreshipour, Mahan, Kalcevich, Kate, Malek, Sam
Format Conference Proceeding
LanguageEnglish
Published IEEE 26.04.2025
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Testing for accessibility is a significant step when developing software, as it ensures that all users, including those with disabilities, can effectively engage with web and mobile applications. While automated tools exist to detect accessibility issues in software, none are as comprehensive and effective as the process of user testing, where testers with various disabilities evaluate the application for accessibility and usability issues. However, user testing is not popular with software developers as it requires conducting lengthy interviews with users and later parsing through large recordings to derive the issues to fix. In this paper, we explore how large language models (LLMs) like GPT 4.0, which have shown promising results in context comprehension and semantic text generation, can mitigate this issue and streamline the user testing process. Our solution, called Reca11, takes in auto-generated transcripts from user testing video recordings and extracts the accessibility and usability issues mentioned by the tester. Our systematic prompt engineering determines the optimal configuration of input, instruction, context and demonstrations for best results. We evaluate Reca11's effectiveness on 36 user testing sessions across three applications. Based on the findings, we investigate the strengths and weaknesses of using LLMs in this space.
ISSN:1558-1225
DOI:10.1109/ICSE55347.2025.00043