Accuracy of Artificial Intelligence-Based Virtual Assistants in Responding to Frequently Asked Questions Related to Orthognathic Surgery
Published in: Journal of Oral and Maxillofacial Surgery, Vol. 82, No. 8, pp. 916-921
Main Authors:
Format: Journal Article
Language: English
Published: United States: Elsevier Inc., 01.08.2024
Summary: Despite increasing interest in how conversational agents might improve health care delivery and information dissemination, there is limited research assessing the quality of health information provided by these technologies, especially in orthognathic surgery (OGS).
This study aimed to measure and compare the quality of four virtual assistants (VAs) in addressing frequently asked questions about OGS.
This in silico cross-sectional study assessed the responses of four VAs to a standardized set of 10 questions related to OGS.
The independent variable was the VA. The four VAs tested were VA1: Alexa (Amazon, Seattle, Washington), VA2: Google Assistant (Google, Mountain View, California), VA3: Siri (Apple, Cupertino, California), and VA4: Bing (Microsoft, Redmond, Washington).
The primary outcome variable was the quality of the answers generated by the four VAs. Four investigators (two orthodontists and two oral surgeons) rated each VA's responses to the standardized set of 10 questions on a five-point modified Likert scale, with the lowest score (1) signifying the highest quality. The main outcome measure was the combined mean score of the responses from each VA; the secondary outcome was the variability in responses among the investigators.
No covariates were evaluated.
One-way analysis of variance (ANOVA) was used to compare the mean scores for each question. One-way ANOVA followed by Tukey's post hoc tests was used to compare the combined mean scores among the VAs, and the combined mean scores across all questions were examined for variability in each VA's responses among the investigators (an illustrative analysis sketch follows the summary).
Among the four VAs, VA4 had the lowest (best) mean score (1.32 ± 0.57), followed by VA2 (1.55 ± 0.78), VA1 (2.67 ± 1.49), and VA3 (3.52 ± 0.50) (P < .001). There were no significant differences in how VA3 (P = .46), VA4 (P = .45), and VA2 (P = .44) responded across the investigators; only VA1 showed significant variability (P = .003).
The VAs responded to the queries related to OGS, with VA4 providing the best-quality responses, followed by VA2, VA1, and VA3. Technology companies and clinical organizations should partner to develop an intelligent VA with evidence-based responses specifically curated to educate patients.
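The ANOVA and Tukey post hoc comparisons described in the summary can be reproduced with standard statistical tooling. The following Python sketch is illustrative only: the score arrays are randomly generated placeholders rather than the study's actual ratings, and it assumes SciPy and statsmodels are available.

```python
# Illustrative sketch of the described analysis: one-way ANOVA across the four
# virtual assistants, followed by Tukey's HSD post hoc pairwise comparisons.
# The ratings below are placeholder Likert scores (1 = best quality), NOT study data.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Hypothetical combined ratings (4 investigators x 10 questions = 40 per VA).
scores = {
    "VA1": rng.normal(2.7, 1.0, 40).clip(1, 5),
    "VA2": rng.normal(1.6, 0.5, 40).clip(1, 5),
    "VA3": rng.normal(3.5, 0.5, 40).clip(1, 5),
    "VA4": rng.normal(1.3, 0.4, 40).clip(1, 5),
}

# One-way ANOVA: do the mean scores differ among the four VAs?
f_stat, p_value = f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, P = {p_value:.4g}")

# Tukey's post hoc test: which pairs of VAs differ significantly?
all_scores = np.concatenate(list(scores.values()))
labels = np.repeat(list(scores.keys()), [len(v) for v in scores.values()])
print(pairwise_tukeyhsd(all_scores, labels, alpha=0.05))
```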
ISSN: 0278-2391; eISSN: 1531-5053
DOI: 10.1016/j.joms.2024.04.013