Identification of Online Health Information Using Large Pretrained Language Models: Mixed Methods Study

Bibliographic Details
Published in: Journal of Medical Internet Research, Vol. 27, No. 5, p. e70733
Main Authors: Tan, Dongmei; Huang, Yi; Liu, Ming; Li, Ziyu; Wu, Xiaoqian; Huang, Cheng
Format: Journal Article
Language: English
Published: Journal of Medical Internet Research, Canada, 14.05.2025
Publisher: JMIR Publications (Gunther Eysenbach MD MPH, Associate Professor)

Summary: Online health information is widely available, but a substantial portion of it is inaccurate or misleading, including exaggerated, incomplete, or unverified claims. Such misinformation can significantly influence public health decisions and pose serious challenges to health care systems. With advances in artificial intelligence and natural language processing, pretrained large language models (LLMs) have shown promise in identifying and distinguishing misleading health information, although their effectiveness in this area remains underexplored. This study aimed to evaluate the performance of 4 mainstream LLMs (ChatGPT-3.5, ChatGPT-4, Ernie Bot, and iFLYTEK Spark) in identifying online health information, providing empirical evidence for their practical application in this field.

Web scraping was used to collect data from rumor-refuting websites, yielding 2708 samples of online health information comprising both true and false claims. The 4 LLMs' application programming interfaces (APIs) were used for authenticity verification, with expert judgments as benchmarks. Model performance was evaluated using semantic similarity, accuracy, recall, F1-score, content analysis, and credibility.

All 4 models performed well in identifying online health information. ChatGPT-4 achieved the highest accuracy at 87.27%, followed by Ernie Bot at 87.25%, iFLYTEK Spark at 87%, and ChatGPT-3.5 at 81.82%. Text length and semantic similarity analysis showed that Ernie Bot's output was the most similar to the expert texts, whereas ChatGPT-4 showed good overall consistency in its explanations. The credibility assessment indicated that ChatGPT-4 provided the most reliable evaluations. Further analysis suggested that the LLMs' highest misjudgment rates occurred in the topics of food and maternal-infant nutrition management and of nutritional science and food controversies.

Overall, the research suggests that LLMs have potential for online health information identification, but their understanding of certain specialized health topics may require further improvement. While these models show promise as assistive tools, their performance varies significantly in accuracy, semantic understanding, and cultural adaptability. The principal findings highlight the models' ability to generate accessible, context-aware explanations; however, they fall short in areas requiring specialized medical knowledge or up-to-date data, particularly for emerging health issues and context-sensitive scenarios. Significant discrepancies were observed in the models' ability to distinguish scientifically verified knowledge from popular misconceptions and in their stability when processing complex linguistic and cultural contexts. These challenges underscore the importance of refining training methodologies to improve the models' reliability and adaptability. Future research should focus on enhancing the models' capability to handle nuanced health topics and diverse cultural and linguistic contexts, thereby facilitating their broader adoption as reliable tools for online health information identification.
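As an illustration of the evaluation pipeline the summary describes, the sketch below scores hypothetical LLM verdicts against expert labels using the named metrics. It is not the study's code: scikit-learn is an assumed dependency, all sample data are invented, and TF-IDF cosine similarity is a simple stand-in for whatever semantic similarity measure the authors actually used.

    # Minimal sketch (not the study's code): score LLM verdicts against
    # expert benchmarks using the metrics named in the summary.
    # Assumes scikit-learn; all sample data are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import accuracy_score, f1_score, recall_score
    from sklearn.metrics.pairwise import cosine_similarity

    # 1 = claim judged true, 0 = claim judged false (invented labels)
    expert_labels = [1, 0, 0, 1, 1, 0, 1, 0]
    llm_verdicts = [1, 0, 1, 1, 1, 0, 0, 0]  # e.g., parsed from an LLM API reply

    print(f"Accuracy: {accuracy_score(expert_labels, llm_verdicts):.4f}")
    print(f"Recall:   {recall_score(expert_labels, llm_verdicts):.4f}")
    print(f"F1-score: {f1_score(expert_labels, llm_verdicts):.4f}")

    # TF-IDF cosine similarity as a stand-in for the semantic similarity
    # between a model's explanation and the expert reference text.
    expert_text = "Drinking water alone does not cure influenza; antivirals may help."
    model_text = "Water intake by itself cannot cure the flu; consult a doctor."
    tfidf = TfidfVectorizer().fit_transform([expert_text, model_text])
    similarity = cosine_similarity(tfidf[0:1], tfidf[1:2])[0, 0]
    print(f"Semantic similarity (TF-IDF cosine): {similarity:.4f}")

In the study itself, the verdicts would come from the 4 models' APIs and the benchmarks from expert review; an embedding-based similarity measure would likely track the reported semantic similarity results more closely than TF-IDF.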
ISSN:1439-4456
EISSN:1438-8871
DOI:10.2196/70733