Quality assessment of real-world data repositories across the data life cycle: A literature review

Abstract Objective Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to complete a literature review on...

Full description

Saved in:
Bibliographic Details
Published inJournal of the American Medical Informatics Association : JAMIA Vol. 28; no. 7; pp. 1591 - 1599
Main Authors Liaw, Siaw-Teng, Guo, Jason Guan Nan, Ansari, Sameera, Jonnagaddala, Jitendra, Godinho, Myron Anthony, Borelli, Alder Jose, de Lusignan, Simon, Capurro, Daniel, Liyanage, Harshana, Bhattal, Navreet, Bennett, Vicki, Chan, Jaclyn, Kahn, Michael G
Format Journal Article
LanguageEnglish
Published England Oxford University Press 14.07.2021
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Abstract Objective Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to complete a literature review on DQ assessment frameworks, indicators and tools for research, public health, service, and quality improvement across the data life cycle. Materials and Methods The review followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases from health, physical and social sciences were used: Cinahl, Embase, Scopus, ProQuest, Emcare, PsycINFO, Compendex, and Inspec. Embase was used instead of PubMed (an interface to search MEDLINE) because it includes all MeSH (Medical Subject Headings) terms used and journals in MEDLINE as well as additional unique journals and conference abstracts. A combined data life cycle and quality framework guided the search of published and gray literature for DQ frameworks, indicators, and tools. At least 2 authors independently identified articles for inclusion and extracted and categorized DQ concepts and constructs. All authors discussed findings iteratively until consensus was reached. Results The 120 included articles yielded concepts related to contextual (data source, custodian, and user) and technical (interoperability) factors across the data life cycle. Contextual DQ subcategories included relevance, usability, accessibility, timeliness, and trust. Well-tested computable DQ indicators and assessment tools were also found. Conclusions A DQ assessment framework that covers intrinsic, technical, and contextual categories across the data life cycle enables assessment and management of RWD repositories to ensure fitness for purpose. Balancing security, privacy, and FAIR principles requires trust and reciprocity, transparent governance, and organizational cultures that value good documentation.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
ObjectType-Review-3
content type line 23
ISSN:1527-974X
1067-5027
1527-974X
DOI:10.1093/jamia/ocaa340