A methodology for cohort harmonisation in multicentre clinical research
Published in | Informatics in medicine unlocked Vol. 27; p. 100760 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 2021 |
Summary: | Many clinical trials and scientific studies aim to improve the understanding of specific medical conditions. However, these studies often rely on a small number of participants, because it is difficult to find people who share similar medical characteristics and are available to participate. This is particularly critical in rare diseases, where the small number of subjects hinders reliable findings. To generate more substantial clinical evidence by increasing the statistical power of their analyses, researchers have started to harmonise data and analyse multiple cohorts together. However, analysing heterogeneous data sources means dealing with different data structures, terminologies, concepts, languages and, most importantly, the knowledge behind the data.
In this paper, we present a methodology to harmonise different cohorts into a standard data schema, helping the research community to generate evidence from a wider variety of data sources. Our methodology was inspired by the OHDSI Common Data Model, which aims to harmonise electronic health record (EHR) datasets for observational studies, and it leverages existing knowledge and open-source tools to support multicentre, disease-specific studies. The proposal was validated using Alzheimer’s Disease cohorts from several countries, ultimately combining 6,669 subjects and 172 clinical concepts. The harmonised datasets now enable multi-cohort querying and analysis, supporting new research. The methodology was implemented in Python and is available, under the MIT licence, at https://bioinformatics-ua.github.io/CMToolkit/.
• This work proposes a strategy for semi-automatic harmonisation of large amounts of medical concepts in clinical studies.
• It creates new opportunities for the study of rare conditions, where typically isolated cohorts do not provide enough statistical evidence.
• The methodology can augment clinical knowledge by automatically computing new patient information during the migration stage.
• This work supports the Alzheimer’s Disease research community in Europe, by enabling multicentre studies.
• The source code of this methodology is publicly available and can be used in other domains. |
---|---|
ISSN: | 2352-9148 |
DOI: | 10.1016/j.imu.2021.100760 |