Large Language Models in Biochemistry Education: Comparative Evaluation of Performance
Published in | JMIR Medical Education Vol. 11; p. e67244 |
Main Authors | Bolgova, Olena; Shypilova, Inna; Mavrych, Volodymyr |
Format | Journal Article |
Language | English |
Published | Canada: JMIR Publications, 10.04.2025 |
ISSN | 2369-3762 |
DOI | 10.2196/67244 |
Abstract | Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs), have ushered in a new era of innovation across various fields, with medicine at the forefront of this technological revolution. Many studies have indicated that, at their current level of development, LLMs can pass various medical board exams. However, their ability to answer specific subject-related questions requires validation.
The objective of this study was to conduct a comprehensive analysis comparing the performance of advanced LLM chatbots, namely Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), and Copilot (Microsoft), against the academic results of medical students in a medical biochemistry course.
We used 200 USMLE (United States Medical Licensing Examination)-style multiple-choice questions (MCQs) selected from the course exam database. They encompassed various complexity levels and were distributed across 23 distinct topics. Questions containing tables or images were excluded from the study. The results of 5 successive attempts by Claude 3.5 Sonnet, GPT-4-1106, Gemini 1.5 Flash, and Copilot to answer this question set were evaluated for accuracy in August 2024. Statistica 13.5.0.17 (TIBCO Software Inc) was used to compute basic descriptive statistics. Given the binary nature of the data, the chi-square test was used to compare results among the different chatbots, with a statistical significance level of P<.05.
On average, the selected chatbots correctly answered 81.1% (SD 12.8%) of the questions, surpassing the students' performance by 8.3% (P=.02). Claude showed the best performance on the biochemistry MCQs, correctly answering 92.5% (185/200) of questions, followed by GPT-4 (170/200, 85%), Gemini (157/200, 78.5%), and Copilot (128/200, 64%). The chatbots performed best in the following 4 topics: eicosanoids (mean 100%, SD 0%), bioenergetics and electron transport chain (mean 96.4%, SD 7.2%), hexose monophosphate pathway (mean 91.7%, SD 16.7%), and ketone bodies (mean 93.8%, SD 12.5%). The Pearson chi-square test indicated a statistically significant association among the answers of all 4 chatbots (P<.001 to P<.04).
Our study suggests that different AI models may have unique strengths in specific medical fields, which could be leveraged for targeted support in biochemistry courses. These results highlight the potential of AI in medical education and assessment. |
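The Methods compare binary (correct/incorrect) outcomes across chatbots with a chi-square test at P<.05. The authors ran their analysis in Statistica; purely as an illustrative sketch, a comparable pairwise comparison could be reproduced in Python from the aggregate correct-answer counts reported in the abstract (the per-question response data are not part of this record, and scipy's chi2_contingency stands in for the original software):

```python
# Illustrative sketch only: pairwise chi-square comparison of chatbot
# accuracy on the 200-question MCQ set, built from the aggregate counts
# reported in the Results. Not the authors' actual analysis code.
from itertools import combinations

from scipy.stats import chi2_contingency

N_QUESTIONS = 200

# Correct-answer counts per chatbot, as reported in the abstract.
correct = {"Claude": 185, "GPT-4": 170, "Gemini": 157, "Copilot": 128}

def compare(bot_a: str, bot_b: str, alpha: float = 0.05) -> None:
    """Chi-square test on a 2x2 table of correct vs incorrect answers."""
    table = [
        [correct[bot_a], N_QUESTIONS - correct[bot_a]],
        [correct[bot_b], N_QUESTIONS - correct[bot_b]],
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    verdict = "significant" if p < alpha else "not significant"
    print(f"{bot_a} vs {bot_b}: chi2={chi2:.2f}, P={p:.4f} ({verdict})")

# Compare every pair of chatbots.
for a, b in combinations(correct, 2):
    compare(a, b)
```

Note that this marginal-count comparison treats the two answer sets as independent samples; because all chatbots answered the same 200 questions, a paired test such as McNemar's would also be defensible, but the abstract specifies the chi-square test.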
Author | Bolgova, Olena; Shypilova, Inna; Mavrych, Volodymyr |
ContentType | Journal Article |
Copyright | © Olena Bolgova, Inna Shypilova, Volodymyr Mavrych. Originally published in JMIR Medical Education (https://mededu.jmir.org), 2025. |
DOI | 10.2196/67244 |
EISSN | 2369-3762 |
EndPage | e67244 |
ExternalDocumentID | oai_doaj_org_article_ee3d6e43f2ef41aab5b21a8e28edd6e4; PMC12005600; 40209205; 10_2196_67244 |
Genre | Journal Article; Comparative Study |
GeographicLocations | United States |
ISSN | 2369-3762 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | questionnaire; medical education; natural language processing; medical students; AI; Claude; Copilot; LLM; large language model; machine learning; artificial intelligence; NLP; bioenergetics; ChatGPT; biochemistry; comprehensive analysis; medical course; GPT-4; Gemini; ML |
Language | English |
License | Olena Bolgova, Inna Shypilova, Volodymyr Mavrych. Originally published in JMIR Medical Education (https://mededu.jmir.org). This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included. |
Notes | None declared. |
ORCID | Bolgova: 0009-0002-9496-9754; Shypilova: 0009-0000-0707-6997; Mavrych: 0009-0009-1159-4573 |
OpenAccessLink | https://doaj.org/article/ee3d6e43f2ef41aab5b21a8e28edd6e4 |
PMID | 40209205 |
PublicationDate | 2025-04-10 |
PublicationPlace | Toronto, Canada |
PublicationTitle | JMIR Medical Education |
PublicationTitleAlternate | JMIR Med Educ |
PublicationYear | 2025 |
Publisher | JMIR Publications |
StartPage | e67244 |
SubjectTerms | Artificial Intelligence; Biochemistry - education; Chatbots and Conversational Agents; e-Learning and Digital Medical Education; Educational Measurement - methods; Humans; Large Language Models; Machine Learning; New Methods and Approaches in Medical Education; New Resources for Medical Education; Original Paper; Students, Medical - statistics & numerical data; Surveys and Questionnaires; Testing and Assessment in Medical Education; Theme Issue: ChatGPT and Generative Language Models in Medical Education; United States |
Title | Large Language Models in Biochemistry Education: Comparative Evaluation of Performance |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40209205 https://www.proquest.com/docview/3188815458 https://pubmed.ncbi.nlm.nih.gov/PMC12005600 https://doaj.org/article/ee3d6e43f2ef41aab5b21a8e28edd6e4 |
Volume | 11 |