Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness

Bibliographic Details
Published in Gates open research Vol. 7; p. 56
Main Authors Wood, Thomas A; McNair, Douglas
Format Journal Article
Language English
Published 2023
ISSN 2572-4754
DOI 10.12688/gatesopenres.14416.1

Abstract Background: A large proportion of clinical trials end without delivering results that are useful for clinical, policy, or research decisions. This problem is called “uninformativeness”. Some high-risk indicators of uninformativeness can be identified when the protocol is being drafted; however, the necessary information can be hard to find in unstructured text documents. Methods: We have developed a browser-based tool which uses natural language processing to identify and quantify the risk of uninformativeness. The tool reads and parses the text of trial protocols and identifies key features of the trial design, which are fed into a risk model. The application runs in a browser and features a graphical user interface that allows a user to drag and drop the PDF of the trial protocol and visualize the risk indicators and their locations in the text. The user can correct inaccuracies in the tool’s parsing of the text. The tool outputs a PDF report listing the key features extracted. The tool is focused on HIV and tuberculosis trials but could be extended to more pathologies in future. Results: On a manually tagged dataset of 300 protocols, the tool was able to identify the condition of a trial with 100% area under the curve (AUC), the presence or absence of a statistical analysis plan with 87% AUC, the presence or absence of an effect estimate with 95% AUC, the number of subjects with 69% accuracy, and simulation with 98% AUC. On a dataset of 11,925 protocols downloaded from ClinicalTrials.gov, the tool was able to identify trial phase with 75% accuracy, number of arms with 58% accuracy, and the countries of investigation with 87% AUC. Conclusion: We have developed and validated a natural language processing tool for identifying and quantifying risks of uninformativeness in clinical trial protocols. The software is open-source and can be accessed at the following link: https://app.clinicaltrialrisk.org
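
The results above are reported partly as AUC and partly as accuracy. As a reading aid, the following is a minimal sketch of how an AUC figure can be computed for a binary feature such as the presence or absence of a statistical analysis plan. It uses the pairwise-ranking definition of AUC. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not the tool's code or data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking definition.

    Returns the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = protocol contains the feature, 0 = it does not.
labels = [1, 0, 1, 0, 1]
# Hypothetical classifier scores for the same five protocols.
scores = [0.9, 0.2, 0.7, 0.8, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every positive protocol is scored above every negative one, and 0.5 corresponds to random ranking. Unlike accuracy, AUC does not depend on choosing a decision threshold.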
Author Wood, Thomas A
McNair, Douglas
Author_xml – sequence: 1
  givenname: Thomas A
  orcidid: 0000-0001-8962-8571
  surname: Wood
  fullname: Wood, Thomas A
– sequence: 2
  givenname: Douglas
  surname: McNair
  fullname: McNair, Douglas
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
ORCID 0000-0001-8962-8571
OpenAccessLink https://gatesopenresearch.org/articles/7-56/pdf