Enhancing systematic literature reviews with generative artificial intelligence: development, applications, and performance evaluation

Bibliographic Details
Published in Journal of the American Medical Informatics Association : JAMIA Vol. 32; no. 4; pp. 616 - 625
Main Authors Li, Ying; Datta, Surabhi; Rastegar-Mojarad, Majid; Lee, Kyeryoung; Paek, Hunki; Glasgow, Julie; Liston, Chris; He, Long; Wang, Xiaoyan; Xu, Yingxin
Format Journal Article
Language English
Published England Oxford University Press 01.04.2025
Subjects
ISSN 1067-5027
1527-974X
DOI 10.1093/jamia/ocaf030

Abstract We developed and validated a large language model (LLM)-assisted system for conducting systematic literature reviews in health technology assessment (HTA) submissions. We developed a five-module system using abstracts acquired from PubMed: (1) literature search query setup; (2) study protocol setup using population, intervention/comparison, outcome, and study type (PICOs) criteria; (3) LLM-assisted abstract screening; (4) LLM-assisted data extraction; and (5) data summarization. The system incorporates a human-in-the-loop design, allowing real-time PICOs criteria adjustment. This is achieved by collecting information on disagreements between the LLM and human reviewers regarding inclusion/exclusion decisions and their rationales, enabling informed PICOs refinement. We generated four evaluation sets including relapsed and refractory multiple myeloma (RRMM) and advanced melanoma to evaluate the LLM's performance in three key areas: (1) recommending inclusion/exclusion decisions during abstract screening, (2) providing valid rationales for abstract exclusion, and (3) extracting relevant information from included abstracts. The system demonstrated relatively high performance across all evaluation sets. For abstract screening, it achieved an average sensitivity of 90%, F1 score of 82, accuracy of 89%, and Cohen's κ of 0.71, indicating substantial agreement between human reviewers and LLM-based results. In identifying specific exclusion rationales, the system attained accuracies of 97% and 84%, and F1 scores of 98 and 89 for RRMM and advanced melanoma, respectively. For data extraction, the system achieved an F1 score of 93. Results showed high sensitivity, Cohen's κ, and prevalence-adjusted bias-adjusted kappa (PABAK) for abstract screening, and high F1 scores for data extraction. This human-in-the-loop AI-assisted SLR system demonstrates the potential of GPT-4's in-context learning capabilities by eliminating the need for manually annotated training data. In addition, this LLM-based system offers subject matter experts greater control through prompt adjustment and real-time feedback, enabling iterative refinement of PICOs criteria based on performance metrics. The system demonstrates potential to streamline systematic literature reviews, potentially reducing time, cost, and human errors while enhancing evidence generation for HTA submissions.
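The screening metrics named in the abstract can all be derived from a 2x2 table of LLM decisions against human decisions. The sketch below is illustrative only: the class and function names are hypothetical, the counts in the usage note are invented for the example and are not taken from the article, and the human reviewer is assumed to be the reference standard. The abstract reports F1 on a 0 to 100 scale; the sketch returns fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Hypothetical 2x2 table; the human decision is treated as the reference."""
    tp: int  # LLM includes, human includes
    fp: int  # LLM includes, human excludes
    fn: int  # LLM excludes, human includes
    tn: int  # LLM excludes, human excludes


def screening_metrics(c: ScreeningCounts) -> dict[str, float]:
    """Return sensitivity, F1, accuracy, Cohen's kappa, and PABAK as fractions.

    Assumes every denominator is non-zero (at least one human-included
    abstract, and raters that do not agree purely by construction).
    """
    n = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    f1 = 2 * c.tp / (2 * c.tp + c.fp + c.fn)
    accuracy = (c.tp + c.tn) / n  # observed agreement

    # Agreement expected by chance, from each rater's marginal rates.
    p_include = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_exclude = ((c.fn + c.tn) / n) * ((c.fp + c.tn) / n)
    p_chance = p_include + p_exclude

    kappa = (accuracy - p_chance) / (1 - p_chance)
    pabak = 2 * accuracy - 1  # two-category prevalence- and bias-adjusted kappa
    return {
        "sensitivity": sensitivity,
        "f1": f1,
        "accuracy": accuracy,
        "kappa": kappa,
        "pabak": pabak,
    }
```

For example, `screening_metrics(ScreeningCounts(tp=40, fp=10, fn=5, tn=45))` evaluates the metrics for an invented set of 100 screened abstracts. Because PABAK depends only on observed agreement, it can differ from Cohen's κ when included and excluded abstracts are strongly imbalanced.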
AbstractList
OBJECTIVES: We developed and validated a large language model (LLM)-assisted system for conducting systematic literature reviews in health technology assessment (HTA) submissions.
MATERIALS AND METHODS: We developed a five-module system using abstracts acquired from PubMed: (1) literature search query setup; (2) study protocol setup using population, intervention/comparison, outcome, and study type (PICOs) criteria; (3) LLM-assisted abstract screening; (4) LLM-assisted data extraction; and (5) data summarization. The system incorporates a human-in-the-loop design, allowing real-time PICOs criteria adjustment. This is achieved by collecting information on disagreements between the LLM and human reviewers regarding inclusion/exclusion decisions and their rationales, enabling informed PICOs refinement. We generated four evaluation sets including relapsed and refractory multiple myeloma (RRMM) and advanced melanoma to evaluate the LLM's performance in three key areas: (1) recommending inclusion/exclusion decisions during abstract screening, (2) providing valid rationales for abstract exclusion, and (3) extracting relevant information from included abstracts.
RESULTS: The system demonstrated relatively high performance across all evaluation sets. For abstract screening, it achieved an average sensitivity of 90%, F1 score of 82, accuracy of 89%, and Cohen's κ of 0.71, indicating substantial agreement between human reviewers and LLM-based results. In identifying specific exclusion rationales, the system attained accuracies of 97% and 84%, and F1 scores of 98 and 89 for RRMM and advanced melanoma, respectively. For data extraction, the system achieved an F1 score of 93.
DISCUSSION: Results showed high sensitivity, Cohen's κ, and PABAK for abstract screening, and high F1 scores for data extraction. This human-in-the-loop AI-assisted SLR system demonstrates the potential of GPT-4's in-context learning capabilities by eliminating the need for manually annotated training data. In addition, this LLM-based system offers subject matter experts greater control through prompt adjustment and real-time feedback, enabling iterative refinement of PICOs criteria based on performance metrics.
CONCLUSION: The system demonstrates potential to streamline systematic literature reviews, potentially reducing time, cost, and human errors while enhancing evidence generation for HTA submissions.
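The human-in-the-loop step described above, collecting disagreements between the LLM and human reviewers together with their rationales, could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Decision` record, the function names, the abstract identifiers, and the rationale strings are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical screening decision for one abstract."""
    abstract_id: str
    include: bool
    rationale: str  # exclusion reason; empty string when included


def collect_disagreements(human, llm):
    """Pair up human and LLM decisions that differ on include/exclude."""
    llm_by_id = {d.abstract_id: d for d in llm}
    pairs = []
    for h in human:
        m = llm_by_id.get(h.abstract_id)
        if m is not None and m.include != h.include:
            pairs.append((h, m))
    return pairs


def tally_exclusion_rationales(pairs):
    """Count rationales given by whichever side excluded the abstract.

    In a disagreement exactly one side excluded, so each pair adds one count
    keyed by (who excluded, rationale).
    """
    counts = Counter()
    for h, m in pairs:
        if h.include:
            counts[("llm", m.rationale)] += 1
        else:
            counts[("human", h.rationale)] += 1
    return counts
```

A reviewer could inspect the most frequent `(who excluded, rationale)` pairs to decide which PICOs criterion wording to adjust before re-running screening.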
Author Glasgow, Julie
Datta, Surabhi
Li, Ying
Liston, Chris
Paek, Hunki
He, Long
Wang, Xiaoyan
Xu, Yingxin
Rastegar-Mojarad, Majid
Lee, Kyeryoung
Author_xml – sequence: 1
  givenname: Ying
  surname: Li
  fullname: Li, Ying
– sequence: 2
  givenname: Surabhi
  surname: Datta
  fullname: Datta, Surabhi
– sequence: 3
  givenname: Majid
  surname: Rastegar-Mojarad
  fullname: Rastegar-Mojarad, Majid
– sequence: 4
  givenname: Kyeryoung
  surname: Lee
  fullname: Lee, Kyeryoung
– sequence: 5
  givenname: Hunki
  orcidid: 0009-0000-9916-5654
  surname: Paek
  fullname: Paek, Hunki
– sequence: 6
  givenname: Julie
  surname: Glasgow
  fullname: Glasgow, Julie
– sequence: 7
  givenname: Chris
  surname: Liston
  fullname: Liston, Chris
– sequence: 8
  givenname: Long
  surname: He
  fullname: He, Long
– sequence: 9
  givenname: Xiaoyan
  surname: Wang
  fullname: Wang, Xiaoyan
– sequence: 10
  givenname: Yingxin
  surname: Xu
  fullname: Xu, Yingxin
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40036547$$D View this record in MEDLINE/PubMed
Cites_doi 10.1136/bmjopen-2016-012545
10.1016/j.jval.2019.09.2142
10.1016/j.jval.2024.03.1395
10.1002/jrsm.1715
10.1080/19439342.2012.711342
10.3390/systems11070351
10.1016/j.zefq.2023.06.007
10.1371/journal.pbio.2005343
10.1016/j.jval.2020.08.041
10.1002/jrsm.1589
10.1111/all.16100
10.1002/jrsm.1553
10.11613/BM.2012.031
10.1371/journal.pone.0227742
10.1016/0895-4356(93)90018-V
10.1016/j.jval.2022.04.1277
10.7599/hmr.2015.35.1.44
10.1016/j.jclinepi.2020.01.005
10.21203/rs.3.rs-4426541/v1
10.1136/bmjopen-2023-076912
10.1007/s40273-022-01229-4
10.1016/j.dajour.2023.100162
10.1016/j.cola.2024.101287
10.1016/j.envint.2020.105623
10.2196/48996
10.1016/j.jclinepi.2021.12.005
10.1016/j.jval.2023.09.2044
10.1016/j.conctc.2019.100443
10.1186/s12874-024-02224-3
10.1186/2046-4053-4-6
10.1186/s41182-019-0165-6
10.1002/jrsm.1354
10.1016/j.jval.2023.03.1596
10.1186/2046-4053-4-5
ContentType Journal Article
Copyright The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.
The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association. 2025
Copyright_xml – notice: The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.
– notice: The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association. 2025
DBID AAYXX
CITATION
CGR
CUY
CVF
ECM
EIF
NPM
7X8
5PM
DOI 10.1093/jamia/ocaf030
DatabaseName CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
PubMed Central (Full Participant titles)
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: EIF
  name: MEDLINE
  url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search
  sourceTypes: Index Database
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
EISSN 1527-974X
EndPage 625
ExternalDocumentID PMC12005633
40036547
10_1093_jamia_ocaf030
Genre Journal Article
GroupedDBID ---
.DC
0R~
18M
2WC
4.4
48X
53G
5GY
5RE
5WD
6PF
7~T
AABZA
AACZT
AAMVS
AAOGV
AAPQZ
AAPXW
AARHZ
AAUAY
AAVAP
AAWTL
AAYXX
ABDFA
ABEJV
ABEUO
ABGNP
ABIXL
ABJNI
ABNHQ
ABOCM
ABPQP
ABPTD
ABQLI
ABQNK
ABVGC
ABWST
ABXVV
ACGFO
ACGFS
ACGOD
ACHQT
ACUFI
ACYHN
ADBBV
ADGZP
ADHKW
ADHZD
ADIPN
ADNBA
ADQBN
ADRTK
ADVEK
ADYVW
AEGPL
AEJOX
AEKSI
AEMDU
AEMQT
AENEX
AENZO
AEPUE
AETBJ
AEWNT
AFFZL
AFIYH
AFOFC
AFXAL
AGINJ
AGORE
AGQXC
AGSYK
AGUTN
AHGBF
AHMBA
AHMMS
AJBYB
AJEEA
AJNCP
ALIPV
ALMA_UNASSIGNED_HOLDINGS
ALUQC
ALXQX
APIBT
ATGXG
AVWKF
AXUDD
AYCSE
BAWUL
BAYMD
BCRHZ
BEYMZ
BHONS
BTRTY
BVRKM
C45
CDBKE
CITATION
CS3
DAKXR
DIK
DILTD
DU5
E3Z
EBS
ENERS
F5P
FDB
FECEO
FLUFQ
FOEOM
FOTVD
FQBLK
G-Q
GAUVT
GJXCC
GX1
H13
HAR
IH2
IHE
J21
JXSIZ
KOP
KSI
KSN
LSO
MHKGH
NOMLY
NOYVH
NQ-
O9-
OAUYM
OAWHX
OCZFY
ODMLO
OJQWA
OJZSN
OK1
OPAEJ
OVD
OWPYF
P2P
PAFKI
PEELM
Q5Y
ROX
ROZ
RPM
RPZ
RUSNO
RWL
RXO
TAE
TEORI
TJX
YAYTL
YKOAZ
YXANX
~S-
CGR
CUY
CVF
ECM
EIF
NPM
7X8
5PM
ID FETCH-LOGICAL-c346t-c310a9c9229266f34d05bd6de0d530c313ad6d6d706df3d836372499290fe5fb3
ISSN 1067-5027
1527-974X
IngestDate Thu Aug 21 18:28:38 EDT 2025
Fri Jul 11 07:21:10 EDT 2025
Thu Jul 10 06:32:50 EDT 2025
Sun Jul 06 05:04:32 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords human-in-the loop AI
information extraction
GPT-4
large language model
systematic literature review
Language English
License https://creativecommons.org/licenses/by-nc/4.0
The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com.
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c346t-c310a9c9229266f34d05bd6de0d530c313ad6d6d706df3d836372499290fe5fb3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ORCID 0009-0000-9916-5654
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC12005633
PMID 40036547
PQID 3174096296
PQPubID 23479
PageCount 10
ParticipantIDs pubmedcentral_primary_oai_pubmedcentral_nih_gov_12005633
proquest_miscellaneous_3174096296
pubmed_primary_40036547
crossref_primary_10_1093_jamia_ocaf030
PublicationCentury 2000
PublicationDate 2025-04-01
PublicationDateYYYYMMDD 2025-04-01
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-04-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Journal of the American Medical Informatics Association : JAMIA
PublicationTitleAlternate J Am Med Inform Assoc
PublicationYear 2025
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
References Du (2025041716422131700_ocaf030-B15) 2024; 24
Du (2025041716422131700_ocaf030-B21) 2024
2025041716422131700_ocaf030-B43
Syriani (2025041716422131700_ocaf030-B24) 2024; 80
Wright (2025041716422131700_ocaf030-B6) 2024; 27
Michelson (2025041716422131700_ocaf030-B3) 2019; 16
Polanin (2025041716422131700_ocaf030-B12) 2019; 10
Byrt (2025041716422131700_ocaf030-B29) 1993; 46
Chandler (2025041716422131700_ocaf030-B1) 2019
Guo (2025041716422131700_ocaf030-B22) 2024; 26
Brozek (2025041716422131700_ocaf030-B41) 2024; 79
Borah (2025041716422131700_ocaf030-B2) 2017; 7
Abogunrin (2025041716422131700_ocaf030-B8) 2020; 23
Made.AI (2025041716422131700_ocaf030-B36)
Kamra (2025041716422131700_ocaf030-B33) 2022; 25
Kebede (2025041716422131700_ocaf030-B18) 2023; 14
Thokala (2025041716422131700_ocaf030-B7) 2023; 41
Tawfik (2025041716422131700_ocaf030-B9) 2019; 47
Schmidt (2025041716422131700_ocaf030-B30) 2023; 181
LaserAI (2025041716422131700_ocaf030-B35)
O'Mara-Eves (2025041716422131700_ocaf030-B16) 2015; 4
Alshami (2025041716422131700_ocaf030-B23) 2023; 11
Gartlehner (2025041716422131700_ocaf030-B13) 2020; 121
EasySLR (2025041716422131700_ocaf030-B37)
Rathbone (2025041716422131700_ocaf030-B11) 2015; 4
Blaizot (2025041716422131700_ocaf030-B17) 2022; 13
Sauca (2025041716422131700_ocaf030-B31) 2023; 26
Ostawal (2025041716422131700_ocaf030-B42) 2019; 22
2025041716422131700_ocaf030-B5
2025041716422131700_ocaf030-B39
Howard (2025041716422131700_ocaf030-B32) 2020; 138
McHugh (2025041716422131700_ocaf030-B27) 2012; 22
Fiorini (2025041716422131700_ocaf030-B4) 2018; 16
RobertReviewer (2025041716422131700_ocaf030-B38)
Thomas (2025041716422131700_ocaf030-B34) 2022
Khalil (2025041716422131700_ocaf030-B19) 2022; 144
Park (2025041716422131700_ocaf030-B28) 2015; 35
Mallett (2025041716422131700_ocaf030-B14) 2012; 4
Wang (2025041716422131700_ocaf030-B10) 2020; 15
Moreno-Garcia (2025041716422131700_ocaf030-B20) 2023; 6
Hanegraaf (2025041716422131700_ocaf030-B26) 2024; 14
Borowiack (2025041716422131700_ocaf030-B40) 2023; 26
Khraisha (2025041716422131700_ocaf030-B25) 2024; 15
References_xml – volume: 7
  start-page: e012545
  year: 2017
  ident: 2025041716422131700_ocaf030-B2
  article-title: Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry
  publication-title: BMJ Open.
  doi: 10.1136/bmjopen-2016-012545
– volume: 22
  start-page: S802
  year: 2019
  ident: 2025041716422131700_ocaf030-B42
  article-title: PNS242 balancing global HTA requirements for literature reviews across Europe, North America, and Asia
  publication-title: Value Health
  doi: 10.1016/j.jval.2019.09.2142
– volume: 27
  start-page: S253
  year: 2024
  ident: 2025041716422131700_ocaf030-B6
  article-title: HTA44 systematic literature review requirements for health technology assessment in European markets
  publication-title: Value in Health
  doi: 10.1016/j.jval.2024.03.1395
– ident: 2025041716422131700_ocaf030-B35
– volume: 15
  start-page: 616
  year: 2024
  ident: 2025041716422131700_ocaf030-B25
  article-title: Can large language models replace humans in systematic reviews? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages
  publication-title: Res Synth Methods.
  doi: 10.1002/jrsm.1715
– volume: 4
  start-page: 445
  year: 2012
  ident: 2025041716422131700_ocaf030-B14
  article-title: The benefits and challenges of using systematic reviews in international development research
  publication-title: J Develop Effect
  doi: 10.1080/19439342.2012.711342
– volume: 11
  start-page: 351
  year: 2023
  ident: 2025041716422131700_ocaf030-B23
  article-title: Harnessing the power of ChatGPT for automating systematic review process: methodology, case study, limitations, and future directions
  publication-title: Systems
  doi: 10.3390/systems11070351
– ident: 2025041716422131700_ocaf030-B36
– year: 2022
  ident: 2025041716422131700_ocaf030-B34
– ident: 2025041716422131700_ocaf030-B37
– volume: 181
  start-page: 65
  year: 2023
  ident: 2025041716422131700_ocaf030-B30
  article-title: A narrative review of recent tools and innovations toward automating living systematic reviews and evidence syntheses
  publication-title: Z Evid Fortbild Qual Gesundhwes
  doi: 10.1016/j.zefq.2023.06.007
– volume: 16
  start-page: e2005343
  year: 2018
  ident: 2025041716422131700_ocaf030-B4
  article-title: Best match: new relevance search for PubMed
  publication-title: PLoS Biol.
  doi: 10.1371/journal.pbio.2005343
– ident: 2025041716422131700_ocaf030-B38
– volume: 23
  start-page: S404
  year: 2020
  ident: 2025041716422131700_ocaf030-B8
  article-title: ML1 do machines perform better than humans at systematic review of published literature? a case study of prostate cancer clinical evidence
  publication-title: Value Health
  doi: 10.1016/j.jval.2020.08.041
– volume: 14
  start-page: 156
  year: 2023
  ident: 2025041716422131700_ocaf030-B18
  article-title: In-depth evaluation of machine learning methods for semi-automating article screening in a systematic review of mechanistic literature
  publication-title: Res Synth Methods.
  doi: 10.1002/jrsm.1589
– volume: 79
  start-page: 1812
  year: 2024
  ident: 2025041716422131700_ocaf030-B41
  article-title: Patients' values and preferences for health states in allergic rhinitis—an artificial intelligence supported systematic review
  publication-title: Allergy
  doi: 10.1111/all.16100
– volume: 13
  start-page: 353
  year: 2022
  ident: 2025041716422131700_ocaf030-B17
  article-title: Using artificial intelligence methods for systematic review in health sciences: a systematic review
  publication-title: Res Synth Methods.
  doi: 10.1002/jrsm.1553
– volume: 22
  start-page: 276
  year: 2012
  ident: 2025041716422131700_ocaf030-B27
  article-title: Interrater reliability: the kappa statistic
  publication-title: Biochem Med (Zagreb).
  doi: 10.11613/BM.2012.031
– volume: 15
  start-page: e0227742
  year: 2020
  ident: 2025041716422131700_ocaf030-B10
  article-title: Error rates of human reviewers during abstract screening in systematic reviews
  publication-title: PLoS One.
  doi: 10.1371/journal.pone.0227742
– volume: 46
  start-page: 423
  year: 1993
  ident: 2025041716422131700_ocaf030-B29
  article-title: Bias, prevalence and kappa
  publication-title: J Clin Epidemiol.
  doi: 10.1016/0895-4356(93)90018-V
– volume: 25
  start-page: S532
  year: 2022
  ident: 2025041716422131700_ocaf030-B33
  article-title: MSR70 pilot study to evaluate efficiency of DISTILLERSR®'S artificial intelligence (AI) tool over manual screening process in literature review
  publication-title: Value Health
  doi: 10.1016/j.jval.2022.04.1277
– ident: 2025041716422131700_ocaf030-B43
– volume: 35
  start-page: 44
  year: 2015
  ident: 2025041716422131700_ocaf030-B28
  article-title: Measurement of inter-rater reliability in systematic review
  publication-title: Hanyang Med Rev.
  doi: 10.7599/hmr.2015.35.1.44
– volume: 121
  start-page: 20
  year: 2020
  ident: 2025041716422131700_ocaf030-B13
  article-title: Single-reviewer abstract screening missed 13 percent of relevant studies: a crowd-based, randomized controlled trial
  publication-title: J Clin Epidemiol.
  doi: 10.1016/j.jclinepi.2020.01.005
– year: 2024
  ident: 2025041716422131700_ocaf030-B21
  doi: 10.21203/rs.3.rs-4426541/v1
– volume: 14
  start-page: e076912
  year: 2024
  ident: 2025041716422131700_ocaf030-B26
  article-title: Inter-reviewer reliability of human literature reviewing and implications for the introduction of machine-assisted systematic reviews: a mixed-methods review
  publication-title: BMJ Open.
  doi: 10.1136/bmjopen-2023-076912
– volume: 41
  start-page: 227
  year: 2023
  ident: 2025041716422131700_ocaf030-B7
  article-title: Living health technology assessment: issues, challenges and opportunities
  publication-title: Pharmacoeconomics.
  doi: 10.1007/s40273-022-01229-4
– volume: 6
  start-page: 100162
  year: 2023
  ident: 2025041716422131700_ocaf030-B20
  article-title: A novel application of machine learning and zero-shot classification methods for automated abstract screening in systematic reviews
  publication-title: Decision Anal J
  doi: 10.1016/j.dajour.2023.100162
– ident: 2025041716422131700_ocaf030-B39
– volume: 80
  start-page: 101287
  year: 2024
  ident: 2025041716422131700_ocaf030-B24
  article-title: Screening articles for systematic reviews with ChatGPT
  publication-title: J Comput Languages
  doi: 10.1016/j.cola.2024.101287
– volume: 138
  start-page: 105623
  year: 2020
  ident: 2025041716422131700_ocaf030-B32
  article-title: SWIFT-active screener: accelerated document screening through active learning and integrated recall estimation
  publication-title: Environ Int.
  doi: 10.1016/j.envint.2020.105623
– volume: 26
  start-page: e48996
  year: 2024
  ident: 2025041716422131700_ocaf030-B22
  article-title: Automated paper screening for clinical reviews using large language models: data analysis study
  publication-title: J Med Internet Res.
  doi: 10.2196/48996
– ident: 2025041716422131700_ocaf030-B5
– volume: 144
  start-page: 22
  year: 2022
  ident: 2025041716422131700_ocaf030-B19
  article-title: Tools to support the automation of systematic reviews: a scoping review
  publication-title: J Clin Epidemiol.
  doi: 10.1016/j.jclinepi.2021.12.005
– volume: 26
  start-page: S390
  year: 2023
  ident: 2025041716422131700_ocaf030-B31
  article-title: HTA361 living systematic review (LSR) in health technology assessment (HTA): current guidance, methods, and challenges
  publication-title: Value Health
  doi: 10.1016/j.jval.2023.09.2044
– volume-title: Cochrane Handbook for Systematic Reviews of Interventions
  year: 2019
  ident: 2025041716422131700_ocaf030-B1
– volume: 16
  start-page: 100443
  year: 2019
  ident: 2025041716422131700_ocaf030-B3
  article-title: The significant cost of systematic reviews and meta-analyses: a call for greater involvement of machine learning to assess the promise of clinical trials
  publication-title: Contemp Clin Trials Commun.
  doi: 10.1016/j.conctc.2019.100443
– volume: 24
  start-page: 108
  year: 2024
  ident: 2025041716422131700_ocaf030-B15
  article-title: Machine learning models for abstract screening task—a systematic literature review application for health economics and outcome research
  publication-title: BMC Med Res Methodol.
  doi: 10.1186/s12874-024-02224-3
– volume: 4
  start-page: 6
  year: 2015
  ident: 2025041716422131700_ocaf030-B11
  article-title: Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module
  publication-title: Syst Rev.
  doi: 10.1186/2046-4053-4-6
– volume: 47
  start-page: 46
  year: 2019
  ident: 2025041716422131700_ocaf030-B9
  article-title: A step by step guide for conducting a systematic review and meta-analysis with simulation data
  publication-title: Trop Med Health.
  doi: 10.1186/s41182-019-0165-6
– volume: 10
  start-page: 330
  year: 2019
  ident: 2025041716422131700_ocaf030-B12
  article-title: Best practice guidelines for abstract screening large-evidence systematic reviews and meta-analyses
  publication-title: Res Synthesis Methods
  doi: 10.1002/jrsm.1354
– volume: 26
  start-page: S288
  year: 2023
  ident: 2025041716422131700_ocaf030-B40
  article-title: MSR61 AI support reduced screening burden in a systematic review with costs and cost-effectiveness outcomes (SR-CCEO) for cost-effectiveness modeling
  publication-title: Value Health
  doi: 10.1016/j.jval.2023.03.1596
– volume: 4
  start-page: 5
  year: 2015
  ident: 2025041716422131700_ocaf030-B16
  article-title: Using text mining for study identification in systematic reviews: a systematic review of current approaches
  publication-title: Syst Rev.
  doi: 10.1186/2046-4053-4-5
SSID ssj0016235
Score 2.4694104
Snippet We developed and validated a large language model (LLM)-assisted system for conducting systematic literature reviews in health technology assessment (HTA)...
SourceID pubmedcentral
proquest
pubmed
crossref
SourceType Open Access Repository
Aggregation Database
Index Database
StartPage 616
SubjectTerms Artificial Intelligence
Generative Artificial Intelligence
Humans
Information Storage and Retrieval - methods
Research and Applications
Systematic Reviews as Topic
Technology Assessment, Biomedical
Title Enhancing systematic literature reviews with generative artificial intelligence: development, applications, and performance evaluation
URI https://www.ncbi.nlm.nih.gov/pubmed/40036547
https://www.proquest.com/docview/3174096296
https://pubmed.ncbi.nlm.nih.gov/PMC12005633
Volume 32
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV1Nj9MwELXKIiEuiG_Kl4yEOBE2tWOn4YZQywLtcqCVyilyYnvbCtJVSQ_wAzjymxk7TuxSkBYuVuJEtpR5smfGb14QeioywpUYsCjTBAIUTXU01MLk4HUhC1aWtDQFztNTfjJP3i3Yotf7GbCWdnXxovz-x7qS_7Eq9IFdTZXsP1i2GxQ64BrsCy1YGNoL2XhULY1chskIeEHmz51QsqtLcQVsZ1Zg2jKFzFBOOWIVSHKa7ID0JCJL7AzOt1ui53lQa-DFwv_i5QaVK_5UyNVAWYXoAB82PQG7wNsOVRPLNfjUbq82o143_u7H3VYUy5U_p4IPcCa20XSzFtsGt1Oxbvn6HePoPYQY38wKF6Y7CAtYMm6FJmkEQdCi2cAO-w72hEYvay2-GPbxGDwEHbujoD317dMP-Xg-meSz0WJ2CV0mEHaYhf7NoqMMDcBVZFZ_103nNFthgmM7_LEbfN_HOQhcfuffBg7N7Dq65myEXzWwuoF6qrqJrkwd1-IW-tGhC3t0YY8u7NCFDbqwRxf26MIhul7iAFvPcYgsuKskDnCFPa5uo_l4NHt9Erm_dkQlTXgN7SAWWZkRAssA1zSRMSsklyqWjMbwlAq44zKNudRUDimnKYG4m2SxVkwX9A46qjaVuoewZkUqhxB0MF0mpYqLWCll1KQINEWc9tGz9jvn5404S96QKmhuDZI7g_TRk9YKOSyf5kxMVGqz-5qD-5xAFE8y3kd3G6t0QyVGrIklMM1wz17dC0aaff9JtVpaifaBSdZySu9fYOIH6KoH-kN0VG936hF4unXx2MLvFwK2uAQ
linkProvider Geneva Foundation for Medical Education and Research
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Enhancing+systematic+literature+reviews+with+generative+artificial+intelligence%3A+development%2C+applications%2C+and+performance+evaluation&rft.jtitle=Journal+of+the+American+Medical+Informatics+Association+%3A+JAMIA&rft.au=Li%2C+Ying&rft.au=Datta%2C+Surabhi&rft.au=Rastegar-Mojarad%2C+Majid&rft.au=Lee%2C+Kyeryoung&rft.date=2025-04-01&rft.issn=1527-974X&rft.eissn=1527-974X&rft_id=info:doi/10.1093%2Fjamia%2Focaf030&rft.externalDBID=NO_FULL_TEXT