Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness

Bibliographic Details
Published in	Gates open research Vol. 7; p. 56
Main Authors	Wood, Thomas A; McNair, Douglas
Format	Journal Article
Language	English
Published	2023
ISSN	2572-4754
DOI	10.12688/gatesopenres.14416.1

Abstract	Background: A large proportion of clinical trials end without delivering results that are useful for clinical, policy, or research decisions. This problem is called “uninformativeness”. Some high-risk indicators of uninformativeness can be identified while the protocol is being drafted; however, the necessary information can be hard to find in unstructured text documents. Methods: We have developed a browser-based tool which uses natural language processing to identify and quantify the risk of uninformativeness. The tool reads and parses the text of trial protocols and identifies key features of the trial design, which are fed into a risk model. The application runs in a browser and features a graphical user interface that allows a user to drag and drop the PDF of the trial protocol and visualize the risk indicators and their locations in the text. The user can correct inaccuracies in the tool’s parsing of the text. The tool outputs a PDF report listing the key features extracted. The tool is focused on HIV and tuberculosis trials but could be extended to more pathologies in future. Results: On a manually tagged dataset of 300 protocols, the tool was able to identify the condition of a trial with 100% area under the curve (AUC), presence or absence of a statistical analysis plan with 87% AUC, presence or absence of an effect estimate with 95% AUC, number of subjects with 69% accuracy, and simulation with 98% AUC. On a dataset of 11,925 protocols downloaded from ClinicalTrials.gov, the tool was able to identify trial phase with 75% accuracy, number of arms with 58% accuracy, and the countries of investigation with 87% AUC. Conclusion: We have developed and validated a natural language processing tool for identifying and quantifying risks of uninformativeness in clinical trial protocols. The software is open-source and can be accessed at the following link: https://app.clinicaltrialrisk.org
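The flow described in the abstract (parse protocol text, extract design features, feed them into a risk model) can be illustrated with a minimal sketch. This is not the published tool's implementation: the regular expressions, the feature names (has_sap, has_simulation, num_subjects), the 100-subject threshold, and the scoring weights are all hypothetical stand-ins chosen only to show the shape of such a pipeline.

```python
import re

# Hypothetical keyword patterns; the real tool's extraction models are not shown here.
SAP_PATTERN = re.compile(r"statistical analysis plan", re.IGNORECASE)
SIMULATION_PATTERN = re.compile(r"\bsimulat(?:ion|ions|ed)\b", re.IGNORECASE)
SAMPLE_SIZE_PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(?:subjects|participants|patients)\b",
    re.IGNORECASE,
)


def extract_features(protocol_text):
    """Pull a few design features out of raw protocol text."""
    sizes = [
        int(match.group(1).replace(",", ""))
        for match in SAMPLE_SIZE_PATTERN.finditer(protocol_text)
    ]
    return {
        "has_sap": bool(SAP_PATTERN.search(protocol_text)),
        "has_simulation": bool(SIMULATION_PATTERN.search(protocol_text)),
        # Take the largest stated count as the planned sample size (an assumption).
        "num_subjects": max(sizes) if sizes else None,
    }


def risk_score(features):
    """Toy additive risk model; higher means more risk indicators. Weights are illustrative."""
    score = 0
    if not features["has_sap"]:
        score += 2
    if not features["has_simulation"]:
        score += 1
    if features["num_subjects"] is None or features["num_subjects"] < 100:
        score += 2
    return score


example = (
    "A total of 1,200 participants will be enrolled. "
    "The statistical analysis plan is described in Section 9."
)
features = extract_features(example)
print(features, risk_score(features))
```

In this sketch a protocol that mentions a statistical analysis plan and a large sample but no simulation gets a score of 1, while empty text gets the maximum of 5. A real system would replace the keyword patterns with trained classifiers and let the user correct the extracted values, as the abstract describes.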
Author	Wood, Thomas A; McNair, Douglas
Author_xml	– sequence: 1 givenname: Thomas A orcidid: 0000-0001-8962-8571 surname: Wood fullname: Wood, Thomas A – sequence: 2 givenname: Douglas surname: McNair fullname: McNair, Douglas
ContentType	Journal Article
DBID	AAYXX CITATION
DOI	10.12688/gatesopenres.14416.1
DatabaseName	CrossRef
DatabaseTitle	CrossRef
DatabaseTitleList	CrossRef
DeliveryMethod	fulltext_linktorsrc
EISSN	2572-4754
ExternalDocumentID	10_12688_gatesopenres_14416_1
GroupedDBID	7X7 8FI 8FJ AAFWJ AAYXX ABUWG ADBBV AFKRA AFPKN ALIPV ALMA_UNASSIGNED_HOLDINGS AOIJS BCNDV BENPR CCPQU CITATION FYUFA GROUPED_DOAJ HMCUK HYE M~E OK1 PGMZT PHGZM PHGZT PIMPY RPM UKHRP
ID	FETCH-LOGICAL-c931-61b48734a57155b3f8df36e7a29e7c762cc2dec184f2f683c6e7e0e1f200dc673
ISSN	2572-4754
IngestDate	Tue Jul 01 04:16:15 EDT 2025
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Language	English
LinkModel	OpenURL
MergedId	FETCHMERGED-LOGICAL-c931-61b48734a57155b3f8df36e7a29e7c762cc2dec184f2f683c6e7e0e1f200dc673
ORCID	0000-0001-8962-8571
OpenAccessLink	https://gatesopenresearch.org/articles/7-56/pdf
ParticipantIDs	crossref_primary_10_12688_gatesopenres_14416_1
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2023-00-00
PublicationDateYYYYMMDD	2023-01-01
PublicationDate_xml	– year: 2023 text: 2023-00-00
PublicationDecade	2020
PublicationTitle	Gates open research
PublicationYear	2023
SSID	ssj0002039674
Score	2.2055614
SourceID	crossref
SourceType	Index Database
StartPage	56
Title	Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness
Volume	7
hasFullText	1
inHoldings	1
linkProvider	Directory of Open Access Journals
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Clinical+Trial+Risk+Tool%3A+software+application+using+natural+language+processing+to+identify+the+risk+of+trial+uninformativeness&rft.jtitle=Gates+open+research&rft.au=Wood%2C+Thomas+A&rft.au=McNair%2C+Douglas&rft.date=2023&rft.issn=2572-4754&rft.eissn=2572-4754&rft.volume=7&rft.spage=56&rft_id=info:doi/10.12688%2Fgatesopenres.14416.1&rft.externalDBID=n%2Fa&rft.externalDocID=10_12688_gatesopenres_14416_1