Enhancing systematic literature reviews with generative artificial intelligence: development, applications, and performance evaluation
Published in | Journal of the American Medical Informatics Association : JAMIA Vol. 32; no. 4; pp. 616 - 625 |
---|---|
Main Authors | Li, Ying; Datta, Surabhi; Rastegar-Mojarad, Majid; Lee, Kyeryoung; Paek, Hunki; Glasgow, Julie; Liston, Chris; He, Long; Wang, Xiaoyan; Xu, Yingxin |
Format | Journal Article |
Language | English |
Published | England: Oxford University Press, 01.04.2025 |
ISSN | 1067-5027 (print); 1527-974X (electronic) |
DOI | 10.1093/jamia/ocaf030 |
Abstract | We developed and validated a large language model (LLM)-assisted system for conducting systematic literature reviews in health technology assessment (HTA) submissions.
We developed a five-module system using abstracts acquired from PubMed: (1) literature search query setup; (2) study protocol setup using population, intervention/comparison, outcome, and study type (PICOs) criteria; (3) LLM-assisted abstract screening; (4) LLM-assisted data extraction; and (5) data summarization. The system incorporates a human-in-the-loop design, allowing real-time PICOs criteria adjustment. This is achieved by collecting information on disagreements between the LLM and human reviewers regarding inclusion/exclusion decisions and their rationales, enabling informed PICOs refinement. We generated four evaluation sets including relapsed and refractory multiple myeloma (RRMM) and advanced melanoma to evaluate the LLM's performance in three key areas: (1) recommending inclusion/exclusion decisions during abstract screening, (2) providing valid rationales for abstract exclusion, and (3) extracting relevant information from included abstracts.
The system demonstrated relatively high performance across all evaluation sets. For abstract screening, it achieved an average sensitivity of 90%, F1 score of 82, accuracy of 89%, and Cohen's κ of 0.71, indicating substantial agreement between human reviewers and LLM-based results. In identifying specific exclusion rationales, the system attained accuracies of 97% and 84%, and F1 scores of 98 and 89 for RRMM and advanced melanoma, respectively. For data extraction, the system achieved an F1 score of 93.
Results showed high sensitivity, Cohen's κ, and prevalence-adjusted bias-adjusted kappa (PABAK) for abstract screening, and high F1 scores for data extraction. This human-in-the-loop AI-assisted systematic literature review (SLR) system demonstrates the potential of GPT-4's in-context learning capabilities by eliminating the need for manually annotated training data. In addition, this LLM-based system offers subject matter experts greater control through prompt adjustment and real-time feedback, enabling iterative refinement of PICOs criteria based on performance metrics.
The system has the potential to streamline systematic literature reviews, which may reduce time, cost, and human error while enhancing evidence generation for HTA submissions. |
---|---|
Author | Glasgow, Julie; Datta, Surabhi; Li, Ying; Liston, Chris; Paek, Hunki; He, Long; Wang, Xiaoyan; Xu, Yingxin; Rastegar-Mojarad, Majid; Lee, Kyeryoung |
Author_xml | – sequence: 1 givenname: Ying surname: Li fullname: Li, Ying – sequence: 2 givenname: Surabhi surname: Datta fullname: Datta, Surabhi – sequence: 3 givenname: Majid surname: Rastegar-Mojarad fullname: Rastegar-Mojarad, Majid – sequence: 4 givenname: Kyeryoung surname: Lee fullname: Lee, Kyeryoung – sequence: 5 givenname: Hunki orcidid: 0009-0000-9916-5654 surname: Paek fullname: Paek, Hunki – sequence: 6 givenname: Julie surname: Glasgow fullname: Glasgow, Julie – sequence: 7 givenname: Chris surname: Liston fullname: Liston, Chris – sequence: 8 givenname: Long surname: He fullname: He, Long – sequence: 9 givenname: Xiaoyan surname: Wang fullname: Wang, Xiaoyan – sequence: 10 givenname: Yingxin surname: Xu fullname: Xu, Yingxin |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40036547 (View this record in MEDLINE/PubMed) |
ContentType | Journal Article |
Copyright | The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association. |
DOI | 10.1093/jamia/ocaf030 |
Discipline | Medicine |
EISSN | 1527-974X |
EndPage | 625 |
ExternalDocumentID | PMC12005633 40036547 10_1093_jamia_ocaf030 |
Genre | Journal Article |
ISSN | 1067-5027 1527-974X |
IngestDate | Thu Aug 21 18:28:38 EDT 2025 Fri Jul 11 07:21:10 EDT 2025 Thu Jul 10 06:32:50 EDT 2025 Sun Jul 06 05:04:32 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Keywords | human-in-the loop AI; information extraction; GPT-4; large language model; systematic literature review |
Language | English |
License | https://creativecommons.org/licenses/by-nc/4.0 The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com. |
ORCID | 0009-0000-9916-5654 |
OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC12005633 |
PMID | 40036547 |
PQID | 3174096296 |
PQPubID | 23479 |
PageCount | 10 |
ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_12005633 proquest_miscellaneous_3174096296 pubmed_primary_40036547 crossref_primary_10_1093_jamia_ocaf030 |
PublicationCentury | 2000 |
PublicationDate | 2025-04-01 |
PublicationDateYYYYMMDD | 2025-04-01 |
PublicationDate_xml | – month: 04 year: 2025 text: 2025-04-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | England |
PublicationPlace_xml | – name: England |
PublicationTitle | Journal of the American Medical Informatics Association : JAMIA |
PublicationTitleAlternate | J Am Med Inform Assoc |
PublicationYear | 2025 |
Publisher | Oxford University Press |
Publisher_xml | – name: Oxford University Press |
StartPage | 616 |
SubjectTerms | Artificial Intelligence; Generative Artificial Intelligence; Humans; Information Storage and Retrieval - methods; Research and Applications; Systematic Reviews as Topic; Technology Assessment, Biomedical |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40036547 https://www.proquest.com/docview/3174096296 https://pubmed.ncbi.nlm.nih.gov/PMC12005633 |
Volume | 32 |
thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1067-5027&client=summon |
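
The abstract reports sensitivity, F1 score, accuracy, Cohen's κ, and PABAK for abstract screening. As a reading aid, the sketch below shows how these five statistics are conventionally derived from a 2×2 table that compares LLM include/exclude recommendations with human reviewer decisions. It is a minimal illustration, not the authors' evaluation code: the names `Confusion` and `screening_metrics` are hypothetical, the counts in the usage example are invented for arithmetic convenience and are not taken from the study, and the human decision is assumed to be the reference standard.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 screening table; human decision is treated as the reference (assumption)."""
    tp: int  # LLM include, human include
    fp: int  # LLM include, human exclude
    fn: int  # LLM exclude, human include
    tn: int  # LLM exclude, human exclude


def screening_metrics(c: Confusion) -> dict:
    """Return sensitivity, F1, accuracy, Cohen's kappa, and PABAK as fractions.

    Assumes every marginal total is non-zero, so no division-by-zero guard is included.
    """
    n = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the "include" class
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (c.tp + c.tn) / n         # observed agreement p_o

    # Chance agreement p_e from the marginal include/exclude rates of each rater.
    p_include = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_exclude = ((c.fn + c.tn) / n) * ((c.fp + c.tn) / n)
    p_e = p_include + p_exclude

    kappa = (accuracy - p_e) / (1 - p_e)  # Cohen's kappa
    pabak = 2 * accuracy - 1              # PABAK for two categories
    return {
        "sensitivity": sensitivity,
        "f1": f1,
        "accuracy": accuracy,
        "kappa": kappa,
        "pabak": pabak,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    example = Confusion(tp=45, fp=15, fn=5, tn=135)
    for name, value in screening_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With an imbalanced include/exclude split, κ can sit well below accuracy because chance agreement is high. PABAK depends only on observed agreement, which is presumably why it is reported alongside κ.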