Intrinsic Evaluation of Contextual and Non-contextual Word Embeddings using Radiology Reports
Published in | AMIA ... Annual Symposium proceedings, Vol. 2021, pp. 631-640
---|---
Format | Journal Article
Language | English
Published | United States: American Medical Informatics Association, 2021
Summary: | Many clinical natural language processing methods rely on non-contextual word embedding (NCWE) or contextual word embedding (CWE) models. Yet few, if any, intrinsic evaluation benchmarks exist comparing embedding representations against clinician judgment. We developed intrinsic evaluation tasks for embedding models using a corpus of radiology reports: term pair similarity for NCWEs and cloze task accuracy for CWEs. Using surveys, we quantified the agreement between clinician judgment and embedding model representations. We compared embedding models trained on a custom radiology report corpus (RRC), a general corpus, and PubMed and MIMIC-III corpora (P&MC). Cloze task accuracy was equivalent for RRC and P&MC models. For term pair similarity, P&MC-trained NCWEs outperformed all other NCWE models (ρ = 0.61 vs. 0.27-0.44). Among models trained on RRC, fastText models often outperformed other NCWE models, and spherical embeddings provided overly optimistic representations of term pair similarity. |
---|---|
ISSN: | 1559-4076 |
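The term pair similarity evaluation the abstract describes can be sketched as follows: compute cosine similarity between embedding vectors for each term pair, then rank-correlate those similarities with clinician survey ratings via Spearman's ρ. The embeddings, term pairs, and ratings below are invented for illustration; the study's actual vocabulary, vectors, and survey data are not reproduced here.

```python
# Hypothetical sketch of an intrinsic term-pair-similarity evaluation:
# rank-correlate embedding cosine similarities with clinician ratings.
# All vectors and ratings below are made-up toy data.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def ranks(xs):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank across the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy NCWE vectors (the study trained fastText-style models, among others).
emb = {
    "effusion": [0.9, 0.1, 0.3],
    "fluid":    [0.8, 0.2, 0.4],
    "fracture": [0.1, 0.9, 0.2],
    "break":    [0.2, 0.8, 0.1],
    "normal":   [0.3, 0.3, 0.9],
}
pairs = [("effusion", "fluid"), ("fracture", "break"), ("effusion", "normal")]
clinician = [4.5, 4.0, 1.0]  # hypothetical mean survey similarity ratings

model_sims = [cosine(emb[a], emb[b]) for a, b in pairs]
rho = spearman(model_sims, clinician)
print(round(rho, 2))  # → 0.5
```

A higher ρ indicates better agreement between the embedding space's geometry and clinician judgment; the abstract's reported values (0.61 vs. 0.27-0.44) are correlations of this kind, computed over the study's full set of term pairs.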