Measuring semantic similarity of clinical trial outcomes using deep pre-trained language representations


Bibliographic Details
Published in: Journal of Biomedical Informatics Vol. 100; p. 100058
Main Authors: Koroleva, Anna; Kamath, Sanjay; Paroubek, Patrick
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.01.2019

Summary:
Highlights:
• Assessing the semantic similarity of clinical trial outcomes is required to detect outcome switching.
• A corpus for semantic similarity of pairs of trial outcomes is released.
• Existing methods of similarity assessment rely on domain-specific resources.
• Deep learning is used for similarity assessment without requiring lexical resources.
• Fine-tuned BioBERT and SciBERT models outperform other approaches.

Outcomes are variables monitored during a clinical trial to assess the impact of an intervention on human health. Automatic assessment of the semantic similarity of trial outcomes is required for a number of tasks, such as the detection of outcome switching (unjustified changes to the pre-defined outcomes of a trial) and the implementation of Core Outcome Sets (minimal sets of outcomes that should be reported in a particular medical domain).

We aimed to build an algorithm for assessing the semantic similarity of pairs of primary and reported outcomes. We focused on approaches that do not require manually curated domain-specific resources such as ontologies and thesauri. We tested several approaches, including single measures of similarity (based on strings, stems and lemmas, paths and distances in an ontology, and vector representations of phrases), classifiers using a combination of single measures as features, and a deep learning approach that consists in fine-tuning pre-trained deep language representations. We tested the language models provided by BERT (trained on general-domain texts) and by BioBERT and SciBERT (trained on biomedical and scientific texts, respectively). We explored the possibility of improving the results by taking into account variant ways of referring to an outcome (e.g. the use of a measurement tool name instead of the outcome name, or the use of abbreviations). We release an open corpus annotated for the similarity of pairs of outcomes.

Classifiers using a combination of single measures as features outperformed the single measures, while deep learning algorithms using the BioBERT and SciBERT models outperformed the classifiers. BioBERT reached the best F-measure of 89.75%. The addition of outcome variants did not improve the results for the best-performing single measures or for the classifiers, but it did improve the performance of the deep learning algorithms: BioBERT achieved an F-measure of 93.38%.

Deep learning approaches using pre-trained language representations outperformed the other approaches for similarity assessment of trial outcomes, without relying on any manually curated domain-specific resources (ontologies and other lexical resources). The addition of outcome variants further improved the performance of the deep learning algorithms.
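The baseline family of approaches can be illustrated with a minimal sketch: a few single similarity measures computed over an outcome pair, then combined as features for a classifier. The specific measures, feature set, and example pairs below are assumptions for illustration, not the authors' exact implementation.

```python
# Sketch: single similarity measures combined as classifier features.
# Measures and example data are illustrative, not the paper's exact setup.
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def pair_features(a: str, b: str) -> list[float]:
    """Two simple single measures: token Jaccard and character-level ratio."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    jaccard = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0
    char_ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return [jaccard, char_ratio]

# Toy labelled pairs (1 = same outcome, 0 = different outcomes).
pairs = [
    ("overall survival at 5 years", "5-year overall survival", 1),
    ("quality of life", "incidence of adverse events", 0),
]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for *_, label in pairs]

# A classifier over the combined measures, as in the baseline family.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([pair_features("overall survival", "survival rate")]))
```

The best-performing approach, fine-tuning a pre-trained BERT-family encoder as a sentence-pair classifier (similar / not similar), can be sketched with the Hugging Face transformers API. The checkpoint name, hyperparameters, and example pairs are assumptions; the paper fine-tuned BERT, BioBERT, and SciBERT on its released corpus.

```python
# Sketch: fine-tuning a BERT-family model for outcome-pair classification.
# Checkpoint and hyperparameters are assumed for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "dmis-lab/biobert-base-cased-v1.1"  # assumed BioBERT checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Each example is a (primary outcome, reported outcome, label) triple.
pairs = [
    ("overall survival at 5 years", "5-year overall survival", 1),
    ("quality of life", "incidence of adverse events", 0),
]
enc = tok([a for a, _, _ in pairs], [b for _, b, _ in pairs],
          padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([label for *_, label in pairs])

# One illustrative fine-tuning step; a real run iterates over the full corpus.
model.train()
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**enc, labels=labels).loss
loss.backward()
optim.step()

# Inference: softmax over the two logits gives a similarity decision.
model.eval()
with torch.no_grad():
    probs = model(**enc).logits.softmax(dim=-1)
```

In this setup the two outcome phrases are fed to the encoder as a single segment-pair input, so the model learns the similarity decision end to end, without ontologies or other lexical resources, which is the property the paper highlights.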
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.yjbinx.2019.100058