Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics
Published in | Journal of Medical Internet Research Vol. 27; no. 7956; p. e70901 |
---|---|
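Main Authors | Sun, Luning; Gibbons, Christopher; Hernández-Orallo, José; Wang, Xiting; Jiang, Liming; Stillwell, David; Luo, Fang; Xie, Xing |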
Format | Journal Article |
Language | English |
Published | Canada: Journal of Medical Internet Research, 26.05.2025; Gunther Eysenbach MD MPH, Associate Professor; JMIR Publications |
ISSN | 1438-8871 1439-4456 |
DOI | 10.2196/70901 |
Abstract | Rigorous evaluation of generalist medical artificial intelligence (GMAI) is imperative to ensure their utility and safety before implementation in health care. Current evaluation strategies rely heavily on benchmarks, which can suffer from issues with data contamination and cannot explain how GMAI might fail (lacking explanatory power) or in what circumstances (lacking predictive power). To address these limitations, we propose a new methodology to improve the quality of GMAI evaluation using construct-oriented processes. Drawing on modern psychometric techniques, we introduce approaches to construct identification and present alternative assessment formats for different domains of professional skills, knowledge, and behaviors that are essential for safe practice. We also discuss the need for human oversight in future GMAI adoption. |
---|---|
Audience | Academic |
Author | Jiang, Liming; Sun, Luning; Wang, Xiting; Hernández-Orallo, José; Luo, Fang; Gibbons, Christopher; Xie, Xing; Stillwell, David |
Author_xml | 1. Sun, Luning (ORCID 0000-0002-2470-4278); 2. Gibbons, Christopher (ORCID 0000-0002-4732-7305); 3. Hernández-Orallo, José (ORCID 0000-0001-9746-7632); 4. Wang, Xiting (ORCID 0000-0001-5768-1095); 5. Jiang, Liming (ORCID 0000-0001-6464-2326); 6. Stillwell, David (ORCID 0000-0003-0174-3212); 7. Luo, Fang (ORCID 0000-0003-3281-9574); 8. Xie, Xing (ORCID 0009-0009-3257-3077) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40418851 (View this record in MEDLINE/PubMed) |
Cites_doi | 10.1111/1475-6773.14016 10.1038/s41586-023-05881-4 10.1146/annurev.clinpsy.032408.153553 10.4324/9780203803912 10.1038/s41746-024-01344-w 10.1017/9781316594179 10.2196/52597 10.1016/S2589-7500(23)00048-1 10.4324/9781315787527 10.1111/nyas.15007 10.1038/s41746-024-01258-7 10.2196/53616 10.1038/s41746-024-01208-3 10.1037/1040-3590.4.1.26 10.1038/s41598-021-02481-y 10.2196/48633 10.4324/9781410605269 10.1038/s41586-023-06291-2 10.1038/s41746-024-01083-y 10.1016/j.artint.2018.09.004 |
ContentType | Journal Article |
Copyright | Luning Sun, Christopher Gibbons, José Hernández-Orallo, Xiting Wang, Liming Jiang, David Stillwell, Fang Luo, Xing Xie. Originally published in the Journal of Medical Internet Research (https://www.jmir.org); COPYRIGHT 2025 Journal of Medical Internet Research; 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License; Copyright © Luning Sun, Christopher Gibbons, José Hernández-Orallo, Xiting Wang, Liming Jiang, David Stillwell, Fang Luo, Xing Xie. Originally published in the Journal of Medical Internet Research (https://www.jmir.org) 2025 |
DBID | AAYXX CITATION CGR CUY CVF ECM EIF NPM 3V. 7QJ 7RV 7X7 7XB 8FI 8FJ 8FK ABUWG AFKRA ALSLI AZQEC BENPR CCPQU CNYFK COVID DWQXO E3H F2A FYUFA GHDGH K9. KB0 M0S M1O NAPCQ PHGZM PHGZT PIMPY PKEHL PPXIY PQEST PQQKQ PQUKI PRINS PRQQA 7X8 5PM DOA |
DatabaseName | CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed ProQuest Central (Corporate) Applied Social Sciences Index & Abstracts (ASSIA) Nursing & Allied Health Database ProQuest Health & Medical Collection ProQuest Central (purchase pre-March 2016) ProQuest Hospital Collection Hospital Premium Collection (Alumni Edition) ProQuest Central (Alumni) (purchase pre-March 2016) ProQuest Central (Alumni) ProQuest Central UK/Ireland Social Science Premium Collection ProQuest Central Essentials ProQuest Central ProQuest One Library & Information Science Collection Coronavirus Research Database ProQuest Central Library & Information Sciences Abstracts (LISA) Library & Information Science Abstracts (LISA) Health Research Premium Collection Health Research Premium Collection (Alumni) ProQuest Health & Medical Complete (Alumni) Nursing & Allied Health Database (Alumni Edition) ProQuest Health & Medical Collection Library Science Database Nursing & Allied Health Premium ProQuest Central Premium ProQuest One Academic Publicly Available Content Database ProQuest One Academic Middle East (New) ProQuest One Health & Nursing ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Academic ProQuest One Academic UKI Edition ProQuest Central China ProQuest One Social Sciences MEDLINE - Academic PubMed Central (Full Participant titles) DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Publicly Available Content Database ProQuest One Academic Middle East (New) Library and Information Science Abstracts (LISA) ProQuest Central Essentials ProQuest Health & Medical Complete (Alumni) ProQuest Central (Alumni Edition) ProQuest One Community College ProQuest One Health & Nursing Applied Social Sciences Index and Abstracts (ASSIA) ProQuest Central China ProQuest Central ProQuest Library Science Health Research Premium Collection Health and Medicine Complete (Alumni Edition) ProQuest Central Korea Library & Information Science Collection ProQuest Central (New) Social Science Premium Collection ProQuest One Social Sciences ProQuest One Academic Eastern Edition Coronavirus Research Database ProQuest Nursing & Allied Health Source ProQuest Hospital Collection Health Research Premium Collection (Alumni) ProQuest Hospital Collection (Alumni) Nursing & Allied Health Premium ProQuest Health & Medical Complete ProQuest One Academic UKI Edition ProQuest Nursing & Allied Health Source (Alumni) ProQuest One Academic ProQuest One Academic (New) ProQuest Central (Alumni) MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic; Publicly Available Content Database; CrossRef; MEDLINE |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Medicine; Library & Information Science |
EISSN | 1438-8871 |
EndPage | e70901 |
ExternalDocumentID | oai_doaj_org_article_bf5d76370de843bc80cb9272ba5b1341 PMC12129431 A841614425 40418851 10_2196_70901 |
Genre | Journal Article |
GeographicLocations | United States; United Kingdom--UK |
GroupedDBID | --- .4I .DC 29L 2WC 36B 53G 5GY 5VS 77K 7RV 7X7 8FI 8FJ AAFWJ AAKPC AAWTL AAYXX ABDBF ABIVO ABUWG ACGFO ADBBV AEGXH AENEX AFKRA AFPKN AIAGR ALIPV ALMA_UNASSIGNED_HOLDINGS ALSLI AOIJS BAWUL BCNDV BENPR CCPQU CITATION CNYFK CS3 DIK DU5 DWQXO E3Z EAP EBD EBS EJD ELW EMB EMOBN ESX F5P FRP FYUFA GROUPED_DOAJ GX1 HMCUK HYE IAO ICO IEA IHR INH ISN ITC KQ8 M1O M48 NAPCQ OK1 OVT P2P PGMZT PHGZM PHGZT PIMPY PQQKQ RNS RPM SJN SV3 TR2 UKHRP XSB ACUHS CGR CUY CVF ECM EIF NPM PPXIY PRQQA PMFND 3V. 7QJ 7XB 8FK AZQEC COVID E3H F2A K9. PKEHL PQEST PQUKI PRINS 7X8 5PM PUEGO |
ID | FETCH-LOGICAL-c491t-4d565bd04a6a1c1b8e788a7d2f1cadb7a089439837dd0764de628c7a8e722efe3 |
IEDL.DBID | DOA |
ISSN | 1438-8871 1439-4456 |
IngestDate | Wed Aug 27 01:20:59 EDT 2025 Thu Aug 21 18:24:37 EDT 2025 Fri Jul 11 17:11:40 EDT 2025 Fri Jul 25 09:19:20 EDT 2025 Tue Jun 17 21:55:23 EDT 2025 Tue Jun 10 21:07:54 EDT 2025 Mon Jul 21 05:31:04 EDT 2025 Sun Jul 06 05:05:11 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 7956 |
Keywords | generalist medical artificial intelligence; human oversight; predictive power; construct-oriented evaluation; data contamination; explanatory power; psychometrics; benchmark; health care |
License | Luning Sun, Christopher Gibbons, José Hernández-Orallo, Xiting Wang, Liming Jiang, David Stillwell, Fang Luo, Xing Xie. Originally published in the Journal of Medical Internet Research (https://www.jmir.org). This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c491t-4d565bd04a6a1c1b8e788a7d2f1cadb7a089439837dd0764de628c7a8e722efe3 |
Notes | CG is an employee of Oracle Health Inc., serves on the Board of Directors at the International Society for Quality of Life Research, and holds stock in Oracle Corporation. XW has previously been employed at Microsoft Research and holds stock in Microsoft. LJ has previously served as an intern at Microsoft Research. XX is an employee of Microsoft Research and holds stock in Microsoft. All other authors declare no conflicts of interest. These authors contributed equally. |
ORCID | 0000-0002-2470-4278 0009-0009-3257-3077 0000-0002-4732-7305 0000-0001-5768-1095 0000-0003-3281-9574 0000-0003-0174-3212 0000-0001-6464-2326 0000-0001-9746-7632 |
OpenAccessLink | https://doaj.org/article/bf5d76370de843bc80cb9272ba5b1341 |
PMID | 40418851 |
PQID | 3222369523 |
PQPubID | 2033121 |
ParticipantIDs | doaj_primary_oai_doaj_org_article_bf5d76370de843bc80cb9272ba5b1341 pubmedcentral_primary_oai_pubmedcentral_nih_gov_12129431 proquest_miscellaneous_3212120126 proquest_journals_3222369523 gale_infotracmisc_A841614425 gale_infotracacademiconefile_A841614425 pubmed_primary_40418851 crossref_primary_10_2196_70901 |
PublicationCentury | 2000 |
PublicationDate | 2025-05-26 |
PublicationDateYYYYMMDD | 2025-05-26 |
PublicationDecade | 2020 |
PublicationPlace | Canada |
PublicationPlace_xml | Canada; Toronto; Toronto, Canada |
PublicationTitle | Journal of Medical Internet Research |
PublicationTitleAlternate | J Med Internet Res |
PublicationYear | 2025 |
Publisher | Journal of Medical Internet Research; Gunther Eysenbach MD MPH, Associate Professor; JMIR Publications |
References | Moor (R1), vol 616; Tam (R4), vol 7; Chustecki (R29), vol 13; Duckworth (R8), vol 11; Bommasani (R19), vol 1525; Ali (R2), vol 5; R23; Singhal (R5), vol 620; R24; R27; Reise (R25), vol 5; Goldberg (R17), vol 4; Sorin (R21), vol 26; Nembhard (R20), vol 58; R3; R6; R7; Mehandru (R22), vol 7; R9; R10; R12; R14; R13; R16; R18; Riedemann (R15), vol 7; Hassan (R28), vol 11; Kaczmarczyk (R11); Martínez-Plumed (R26), vol 271 |
References_xml | R1: Foundation models for generalist medical artificial intelligence. Nature. 616(7956):259. doi:10.1038/s41586-023-05881-4; R2: Using ChatGPT to write patient clinic letters. Lancet Digit Health. 5(4):e179. doi:10.1016/S2589-7500(23)00048-1; R4: A framework for human evaluation of large language models in healthcare derived from literature review. NPJ Digit Med. 7(1). doi:10.1038/s41746-024-01258-7; R5: Large language models encode clinical knowledge. Nature. 620(7972):172. doi:10.1038/s41586-023-06291-2; R7: doi:10.1017/9781316594179; R8: Using explainable machine learning to characterise data drift and detect emergent health risks for emergency department admissions during COVID-19. Sci Rep. 11(1). doi:10.1038/s41598-021-02481-y; R11: Evaluating multimodal AI in medical diagnostics. NPJ Digit Med. doi:10.1038/s41746-024-01208-3; R15: The path forward for large language models in medicine is open. NPJ Digit Med. 7(1). doi:10.1038/s41746-024-01344-w; R16: doi:10.4324/9781315787527; R17: The development of markers for the big-five factor structure. Psychol Assess. 4(1):26. doi:10.1037/1040-3590.4.1.26; R19: Holistic evaluation of language models. Ann N Y Acad Sci. 1525(1):140. doi:10.1111/nyas.15007; R20: A systematic review of research on empathy in health care. Health Serv Res. 58(2):250. doi:10.1111/1475-6773.14016; R21: Large language models and empathy: systematic review. J Med Internet Res. 26. doi:10.2196/52597; R22: Evaluating large language models as agents in the clinic. NPJ Digit Med. 7(1). doi:10.1038/s41746-024-01083-y; R24: doi:10.4324/9781410605269; R25: Item response theory and clinical measurement. Annu Rev Clin Psychol. 5. doi:10.1146/annurev.clinpsy.032408.153553; R26: Item response theory in AI: Analysing machine learning classifiers at the instance level. Artif Intell. 271. doi:10.1016/j.artint.2018.09.004; R27: doi:10.4324/9780203803912; R28: Barriers to and facilitators of artificial intelligence adoption in health care: scoping review. JMIR Hum Factors. 11. doi:10.2196/48633; R29: Benefits and risks of AI in health care: narrative review. Interact J Med Res. 13. doi:10.2196/53616; R3, R6, R9, R10, R12, R13, R14, R18, R23: no details recorded |
SSID | ssj0020491 |
Score | 2.4393966 |
SecondaryResourceType | review_article |
SourceID | doaj pubmedcentral proquest gale pubmed crossref |
SourceType | Open Website; Open Access Repository; Aggregation Database; Index Database |
StartPage | e70901 |
SubjectTerms | Accuracy; Applications of AI; Artificial Intelligence; Benchmarking; Benchmarks; Chatbots; Cognition & reasoning; Cognitive ability; Development and Evaluation of Research Methods, Instruments and Tools; Empathy; Health care; Humans; Large language models; Licenses; Licensing examinations; Medical practices; Methods; Personality; Physicians; Power; Program Evaluation and Review Technique; Psychometrics; Quantitative psychology; Subject specialists; Verbal communication; Viewpoint; Viewpoints and s; Work skills |
Title | Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40418851 https://www.proquest.com/docview/3222369523 https://www.proquest.com/docview/3212120126 https://pubmed.ncbi.nlm.nih.gov/PMC12129431 https://doaj.org/article/bf5d76370de843bc80cb9272ba5b1341 |
Volume | 27 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Directory of Open Access Journals |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Beyond+Benchmarks%3A+Evaluating+Generalist+Medical+Artificial+Intelligence+With+Psychometrics&rft.jtitle=Journal+of+medical+Internet+research&rft.au=Sun%2C+Luning&rft.au=Gibbons%2C+Christopher&rft.au=Hern%C3%A1ndez-Orallo%2C+Jos%C3%A9&rft.au=Wang%2C+Xiting&rft.date=2025-05-26&rft.pub=Journal+of+Medical+Internet+Research&rft.issn=1439-4456&rft.volume=27&rft.issue=7956&rft_id=info:doi/10.2196%2F70901&rft.externalDocID=A841614425 |