Accuracy of Artificial Intelligence-Based Virtual Assistants in Responding to Frequently Asked Questions Related to Orthognathic Surgery

Bibliographic Details
Published in: Journal of Oral and Maxillofacial Surgery, Vol. 82, No. 8, pp. 916-921
Main Authors: Fatima, Kaleem; Singh, Pinky; Amipara, Hetal; Chaudhary, Ganesh
Format: Journal Article
Language: English
Published: United States: Elsevier Inc., 01.08.2024

Summary:
Background: Despite increasing interest in how conversational agents might improve health care delivery and information dissemination, there is limited research assessing the quality of health information provided by these technologies, especially in orthognathic surgery (OGS).
Purpose: This study aimed to measure and compare the quality of four virtual assistants (VAs) in addressing frequently asked questions about OGS.
Study Design, Setting, Sample: This in-silico cross-sectional study assessed the responses of four VAs to a standardized set of 10 questions related to OGS.
Independent Variable: The independent variable was the VA. The four VAs tested were VA1: Alexa (Seattle, Washington), VA2: Google Assistant (Mountain View, California), VA3: Siri (Cupertino, California), and VA4: Bing (San Diego, California).
Main Outcome Variables: The primary outcome variable was the quality of the answers generated by the four VAs. Four investigators (two orthodontists and two oral surgeons) rated each response to the standardized set of 10 questions on a five-point modified Likert scale, with the lowest score (1) signifying the highest quality. The main outcome measured was the combined mean score of the responses from each VA; the secondary outcome was the variability in responses among the different investigators.
Covariates: None.
Analyses: One-way analysis of variance was used to compare the average scores per question. One-way analysis of variance followed by Tukey's post hoc tests was used to compare the combined mean scores among the VAs, and the combined mean scores of all questions were evaluated to determine variability, if any, in the VAs' responses across investigators.
Results: Among the four VAs, VA4 had the lowest (best) score (1.32 ± 0.57), followed by VA2 (1.55 ± 0.78), VA1 (2.67 ± 1.49), and VA3 (3.52 ± 0.50) (P < .001). There were no significant differences in how VA3 (P = .46), VA4 (P = .45), and VA2 (P = .44) responded across investigators, whereas VA1 did show significant variability (P = .003).
Conclusion: The VAs responded to queries related to OGS, with VA4 displaying the best-quality responses, followed by VA2, VA1, and VA3. Technology companies and clinical organizations should partner to develop an intelligent VA with evidence-based responses specifically curated to educate patients.
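Analysis sketch: The abstract describes comparing combined mean Likert scores among the VAs with one-way analysis of variance and Tukey's post hoc tests, plus a check of rating variability across investigators. The Python sketch below illustrates how such an analysis could be reproduced from a long-format ratings table; it is not the authors' code, and the file name, column names, and per-rater layout are assumptions for illustration only.

    # Minimal sketch, assuming hypothetical long-format data with one row per
    # (investigator, question, VA) rating on the 5-point modified Likert scale
    # (1 = highest quality). File and column names are illustrative assumptions.
    import pandas as pd
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    ratings = pd.read_csv("va_ogs_ratings.csv")  # columns: investigator, question, va, score

    # Combined mean +/- SD per virtual assistant (the form reported in the abstract).
    print(ratings.groupby("va")["score"].agg(["mean", "std"]))

    # One-way ANOVA comparing scores among the four VAs.
    groups = [g["score"].values for _, g in ratings.groupby("va")]
    f_stat, p_value = stats.f_oneway(*groups)
    print(f"ANOVA across VAs: F = {f_stat:.2f}, P = {p_value:.4f}")

    # Tukey's post hoc test for pairwise differences between VAs.
    print(pairwise_tukeyhsd(endog=ratings["score"], groups=ratings["va"], alpha=0.05))

    # Variability across investigators: one-way ANOVA per VA over the four raters.
    for va, sub in ratings.groupby("va"):
        rater_groups = [g["score"].values for _, g in sub.groupby("investigator")]
        _, p_rater = stats.f_oneway(*rater_groups)
        print(f"{va}: variability across investigators P = {p_rater:.3f}")

This mirrors the reported workflow (group means, ANOVA, Tukey post hoc, per-VA rater variability) under the stated assumptions; the study's actual data handling and software are not specified in the abstract.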
ISSN: 0278-2391
EISSN: 1531-5053
DOI: 10.1016/j.joms.2024.04.013