Large Language Models in Biochemistry Education: Comparative Evaluation of Performance
Published in | JMIR Medical Education Vol. 11; p. e67244 |
Main Authors | Bolgova, Olena; Shypilova, Inna; Mavrych, Volodymyr |
Format | Journal Article |
Language | English |
Published | Canada: JMIR Publications, 10.04.2025 |
ISSN | 2369-3762 |
DOI | 10.2196/67244 |
Abstract | Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs), have ushered in a new era of innovation across various fields, with medicine at the forefront of this technological revolution. Many studies have indicated that, at their current level of development, LLMs can pass various medical board exams. However, their ability to answer specific subject-related questions requires validation.
The objective of this study was to conduct a comprehensive analysis comparing the performance of advanced LLM chatbots, namely Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), and Copilot (Microsoft), against the academic results of medical students in a medical biochemistry course.
We used 200 USMLE (United States Medical Licensing Examination)-style multiple-choice questions (MCQs) selected from the course exam database. They encompassed various complexity levels and were distributed across 23 distinct topics. Questions containing tables or images were excluded from the study. The results of 5 successive attempts by Claude 3.5 Sonnet, GPT-4-1106, Gemini 1.5 Flash, and Copilot to answer this question set were evaluated for accuracy in August 2024. Statistica 13.5.0.17 (TIBCO Software Inc) was used to compute basic descriptive statistics. Given the binary nature of the data, the chi-square test was used to compare results among the different chatbots, with a statistical significance level of P<.05.
On average, the selected chatbots correctly answered 81.1% (SD 12.8%) of the questions, surpassing the students' performance by 8.3% (P=.02). Claude showed the best performance on the biochemistry MCQs, correctly answering 92.5% (185/200) of questions, followed by GPT-4 (170/200, 85%), Gemini (157/200, 78.5%), and Copilot (128/200, 64%). The chatbots performed best in the following 4 topics: eicosanoids (mean 100%, SD 0%), bioenergetics and electron transport chain (mean 96.4%, SD 7.2%), hexose monophosphate pathway (mean 91.7%, SD 16.7%), and ketone bodies (mean 93.8%, SD 12.5%). The Pearson chi-square test indicated a statistically significant association among the answers of all 4 chatbots (P<.001 to P<.04).
Our study suggests that different AI models may have unique strengths in specific medical fields, which could be leveraged for targeted support in biochemistry courses. These results highlight the potential of AI in medical education and assessment. |
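The Methods compare binary (correct/incorrect) outcomes across chatbots with a chi-square test at P<.05. The authors ran their analysis in Statistica; purely as an illustrative sketch, a comparable pairwise comparison could be reproduced in Python from the aggregate correct-answer counts reported in the abstract (the per-question response data are not part of this record, and scipy's chi2_contingency stands in for the original software):

```python
# Illustrative sketch only: pairwise chi-square comparison of chatbot
# accuracy on the 200-question MCQ set, built from the aggregate counts
# reported in the Results. Not the authors' actual analysis code.
from itertools import combinations

from scipy.stats import chi2_contingency

N_QUESTIONS = 200

# Correct-answer counts per chatbot, as reported in the abstract.
correct = {"Claude": 185, "GPT-4": 170, "Gemini": 157, "Copilot": 128}

def compare(bot_a: str, bot_b: str, alpha: float = 0.05) -> None:
    """Chi-square test on a 2x2 table of correct vs incorrect answers."""
    table = [
        [correct[bot_a], N_QUESTIONS - correct[bot_a]],
        [correct[bot_b], N_QUESTIONS - correct[bot_b]],
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    verdict = "significant" if p < alpha else "not significant"
    print(f"{bot_a} vs {bot_b}: chi2={chi2:.2f}, P={p:.4f} ({verdict})")

# Compare every pair of chatbots.
for a, b in combinations(correct, 2):
    compare(a, b)
```

Note that this marginal-count comparison treats the two answer sets as independent samples; because all chatbots answered the same 200 questions, a paired test such as McNemar's would also be defensible, but the abstract specifies the chi-square test.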
Author | Bolgova, Olena; Shypilova, Inna; Mavrych, Volodymyr |
ContentType | Journal Article |
Copyright | © Olena Bolgova, Inna Shypilova, Volodymyr Mavrych. Originally published in JMIR Medical Education (https://mededu.jmir.org), 2025. |
DOI | 10.2196/67244 |
EISSN | 2369-3762 |
EndPage | e67244 |
ExternalDocumentID | oai_doaj_org_article_ee3d6e43f2ef41aab5b21a8e28edd6e4; PMC12005600; 40209205; 10_2196_67244 |
Genre | Journal Article; Comparative Study |
GeographicLocations | United States |
ISSN | 2369-3762 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | questionnaire; medical education; natural language processing; medical students; AI; Claude; Copilot; LLM; large language model; machine learning; artificial intelligence; NLP; bioenergetics; ChatGPT; biochemistry; comprehensive analysis; medical course; GPT-4; Gemini; ML |
Language | English |
License | Olena Bolgova, Inna Shypilova, Volodymyr Mavrych. Originally published in JMIR Medical Education (https://mededu.jmir.org). This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included. |
Notes | None declared. |
ORCID | Bolgova: 0009-0002-9496-9754; Shypilova: 0009-0000-0707-6997; Mavrych: 0009-0009-1159-4573 |
OpenAccessLink | https://doaj.org/article/ee3d6e43f2ef41aab5b21a8e28edd6e4 |
PMID | 40209205 |
PublicationDate | 2025-04-10 |
PublicationPlace | Toronto, Canada |
PublicationTitle | JMIR Medical Education |
PublicationTitleAlternate | JMIR Med Educ |
PublicationYear | 2025 |
Publisher | JMIR Publications |
StartPage | e67244 |
SubjectTerms | Artificial Intelligence; Biochemistry - education; Chatbots and Conversational Agents; e-Learning and Digital Medical Education; Educational Measurement - methods; Humans; Large Language Models; Machine Learning; New Methods and Approaches in Medical Education; New Resources for Medical Education; Original Paper; Students, Medical - statistics & numerical data; Surveys and Questionnaires; Testing and Assessment in Medical Education; Theme Issue: ChatGPT and Generative Language Models in Medical Education; United States |
Title | Large Language Models in Biochemistry Education: Comparative Evaluation of Performance |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40209205 https://www.proquest.com/docview/3188815458 https://pubmed.ncbi.nlm.nih.gov/PMC12005600 https://doaj.org/article/ee3d6e43f2ef41aab5b21a8e28edd6e4 |
Volume | 11 |