A methodology for cohort harmonisation in multicentre clinical research

Bibliographic Details
Published in	Informatics in medicine unlocked Vol. 27; p. 100760
Main Authors	Almeida, João Rafael, Silva, Luís Bastão, Bos, Isabelle, Visser, Pieter Jelle, Oliveira, José Luís
Format	Journal Article
Language	English
Published	Elsevier Ltd, 2021
Subjects	Clinical studies; Data harmonisation; ETL; Observational studies; OMOP CDM

Summary:	Many clinical trials and scientific studies have been conducted to better understand specific medical conditions. However, these studies are often based on a small number of participants because it is difficult to find people who have similar medical characteristics and are available to participate. This is particularly critical in rare diseases, where the small number of subjects hinders reliable findings. To generate more substantial clinical evidence by increasing the power of the analyses, researchers have started to perform data harmonisation and multiple-cohort analyses. However, analysing heterogeneous data sources means dealing with different data structures, terminologies, concepts, languages and, most importantly, the knowledge behind the data. In this paper, we present a methodology to harmonise different cohorts into a standard data schema, helping the research community to generate evidence from a wider variety of data sources. Our methodology was inspired by the OHDSI Common Data Model, which aims to harmonise EHR datasets for observational studies, leveraging knowledge and open-source tools to perform multicentric disease-specific studies. This proposal was validated using Alzheimer’s Disease cohorts from several countries, ultimately combining 6,669 subjects and 172 clinical concepts. The harmonised datasets now enable multi-cohort querying and analysis, supporting new research. The methodology was implemented in Python and is available under the MIT licence at https://bioinformatics-ua.github.io/CMToolkit/.
• This work proposes a strategy for semi-automatic harmonisation of large amounts of medical concepts in clinical studies.
• It creates new opportunities for the study of rare conditions, where typically isolated cohorts do not provide enough statistical evidence.
• The methodology can augment clinical knowledge by automatically computing new patient information during the migration stage.
• This work supports the Alzheimer’s Disease research community in Europe, by enabling multicentre studies.
• The source code of this methodology is publicly available and can be used in other domains.
ISSN:	2352-9148
DOI:	10.1016/j.imu.2021.100760