Mapping of UK Biobank clinical codes: Challenges and possible solutions

Bibliographic Details
Published in PLoS ONE Vol. 17; no. 12; p. e0275816
Main Authors Stroganov, Oleg; Fedarovich, Alena; Wong, Emily; Skovpen, Yulia; Pakhomova, Elena; Grishagin, Ivan; Fedarovich, Dzmitry; Khasanova, Tania; Merberg, David; Szalma, Sándor; Bryant, Julie
Format Journal Article
Language English
Published United States Public Library of Science 16.12.2022
Public Library of Science (PLoS)
Summary: The UK Biobank provides a rich collection of longitudinal clinical data from different healthcare providers and sources in England, Wales, and Scotland. Although extremely valuable and available to a wide research community, the heterogeneous dataset contains inconsistent medical terminology that is either aligned to several ontologies within the same category or unprocessed. To make these data useful to the research community, data cleaning, curation, and standardization are needed. Any data user must invest significant effort in data reformatting, mapping to selected ontologies (such as SNOMED CT), and harmonization in order to integrate UK Biobank hospital inpatient data, self-reported data, and data from various registers with primary care (GP) data. The integrated clinical data would provide a more comprehensive picture of a participant's medical history. We evaluated several approaches to map GP clinical Read codes to International Classification of Diseases (ICD) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) terminologies. The results were compared, mapping inconsistencies were flagged, and a quality category was assigned to each mapping to evaluate overall mapping quality. We propose a curation and data integration pipeline for harmonizing diagnoses. We also report challenges identified in mapping Read codes from UK Biobank GP tables to ICD and SNOMED CT. Some of the challenges, such as the lack of a precise one-to-one mapping between ontologies or the need for an additional ontology to fully map terms, are general and reflect trade-offs to be made at different steps. Other challenges arise from automatic mapping and can be overcome by leveraging existing mappings, supplemented with automated and manual curation.
Bibliography: Competing Interests: E.W., D.M., and S.S. were employees of Takeda Development Center Americas, Inc. and are stockholders of Takeda Pharmaceuticals Company Limited. S.S. is a stockholder of Johnson & Johnson. O.S., A.F., D.F., Y.S., E.P., I.G., T.K., and J.B. were employees of Rancho BioSciences, LLC. There are no patents, products in development, or marketed products associated with this research to declare. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0275816
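
Illustration: The Summary mentions comparing the results of several mapping approaches, flagging inconsistencies, and assigning a quality category to each mapping. The Python sketch below shows one way such a comparison could look. It is not the authors' pipeline: the source names, the placeholder codes (READ_1, ICD_X, and so on), the function name compare_mappings, and the four category labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: two mapping sources, each mapping a source code to a
# set of target codes. All codes are placeholders, not real Read or ICD codes.
MAPPINGS = {
    "source_a": {
        "READ_1": {"ICD_X"},
        "READ_2": {"ICD_Y"},
        "READ_3": {"ICD_Z"},
        "READ_4": {"ICD_V"},
    },
    "source_b": {
        "READ_1": {"ICD_X"},
        "READ_2": {"ICD_Y", "ICD_W"},
        "READ_3": {"ICD_Q"},
    },
}


def compare_mappings(mappings):
    """Return an illustrative quality category for every source code.

    Categories (assumed for this sketch, not taken from the article):
      single_source - only one mapping source covers the code
      consistent    - all sources give exactly the same targets
      partial       - sources share some targets but not all
      conflict      - sources share no target at all
    """
    # Collect the target sets proposed for each code across all sources.
    targets_by_code = defaultdict(list)
    for source in mappings.values():
        for code, targets in source.items():
            targets_by_code[code].append(set(targets))

    report = {}
    for code, target_sets in targets_by_code.items():
        common = set.intersection(*target_sets)
        union = set.union(*target_sets)
        if len(target_sets) < 2:
            report[code] = "single_source"
        elif common == union:
            report[code] = "consistent"
        elif common:
            report[code] = "partial"
        else:
            report[code] = "conflict"
    return report


if __name__ == "__main__":
    for code, category in sorted(compare_mappings(MAPPINGS).items()):
        print(code, category)
```

In a sketch like this, codes labelled partial or conflict would be the natural candidates for the manual curation step that the Summary describes.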