Intrinsic Evaluation of Contextual and Non-contextual Word Embeddings using Radiology Reports

Bibliographic Details
Published in: AMIA ... Annual Symposium Proceedings, Vol. 2021, pp. 631-640
Main Authors: Khan, Mirza S; Landman, Bennett A; Deppen, Stephen A; Matheny, Michael E
Format: Journal Article
Language: English
Published: United States: American Medical Informatics Association, 2021

Summary: Many clinical natural language processing methods rely on non-contextual word embedding (NCWE) or contextual word embedding (CWE) models. Yet, few, if any, intrinsic evaluation benchmarks exist that compare embedding representations against clinician judgment. We developed intrinsic evaluation tasks for embedding models using a corpus of radiology reports: term pair similarity for NCWEs and cloze task accuracy for CWEs. Using surveys, we quantified the agreement between clinician judgment and embedding model representations. We compared embedding models trained on a custom radiology report corpus (RRC), a general corpus, and PubMed and MIMIC-III corpora (P&MC). Cloze task accuracy was equivalent for RRC and P&MC models. For term pair similarity, P&MC-trained NCWEs outperformed all other NCWE models (ρ = 0.61 vs. 0.27-0.44). Among models trained on RRC, fastText models often outperformed other NCWE models, and spherical embeddings provided overly optimistic representations of term pair similarity.
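
To make the first evaluation task concrete, here is a minimal sketch of a term pair similarity evaluation of the kind the abstract describes: compare cosine similarities from an NCWE model against clinician-judged similarity using Spearman's ρ, the statistic reported above. The term pairs, clinician ratings, and model file name (radiology_ncwe.kv) are hypothetical placeholders, not the paper's survey data or trained models.

```python
from gensim.models import KeyedVectors
from scipy.stats import spearmanr

# Hypothetical survey items: (term 1, term 2, mean clinician similarity rating).
term_pairs = [
    ("pneumothorax", "effusion", 2.1),
    ("nodule", "mass", 3.4),
    ("atelectasis", "consolidation", 2.8),
]

# Hypothetical path to an NCWE model (e.g., word2vec or fastText vectors)
# trained on a radiology report corpus and saved with gensim.
wv = KeyedVectors.load("radiology_ncwe.kv")

model_sims, human_sims = [], []
for t1, t2, rating in term_pairs:
    if t1 in wv and t2 in wv:  # skip out-of-vocabulary terms
        model_sims.append(wv.similarity(t1, t2))  # cosine similarity
        human_sims.append(rating)

# Spearman rank correlation between model similarities and clinician ratings.
rho, p = spearmanr(model_sims, human_sims)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```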
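A companion sketch of the cloze task evaluation for CWEs: mask one term in a report sentence and score a hit when the held-out term appears among the model's top-k predictions. The checkpoint name and cloze items below are placeholders; the study's radiology-trained models and test items are not reproduced here.

```python
from transformers import pipeline

# Hypothetical masked-language-model checkpoint; the study trained CWE models
# on a radiology report corpus (RRC) and on PubMed/MIMIC-III (P&MC).
fill = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical cloze items: a sentence with one masked term, plus the answer.
cloze_items = [
    ("There is a small right pleural [MASK].", "effusion"),
    ("No focal [MASK] or pneumothorax is identified.", "consolidation"),
]

hits = 0
for sentence, answer in cloze_items:
    # Count a hit when the held-out term is among the top-5 predictions.
    predictions = [p["token_str"].strip() for p in fill(sentence, top_k=5)]
    hits += answer in predictions

print(f"Cloze accuracy (top-5): {hits / len(cloze_items):.2f}")
```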
ISSN: 1559-4076