A novel data-driven approach for Personas validation in healthcare using self-supervised machine learning
Published in | Journal of biomedical informatics Vol. 165; p. 104815 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | United States: Elsevier Inc, 01.05.2025 |
Subjects | |
Summary: |
Persona validation is a challenging task that often relies on costly external validation methods. This study aimed to develop a novel method for Persona validation based on data already available when the Personas are created.
A novel approach based on self-supervised machine learning (SSML) was proposed. An 80 %/20 % training-test split was performed, with the training set used for Persona development. The resulting cluster labels were used as input for a 5-fold cross-validation grid search, yielding five different optimal models. The “weak” ground truth for the test set was determined using the trained clustering model and was compared with the prediction obtained by majority voting of the optimal models. Performance was evaluated by means of weighted accuracy, precision, recall and F1 score.
The proposed method was evaluated on two very different healthcare datasets composed of questionnaires. The former comprised 1070 subjects, resulting in three unbalanced Personas (P0 n = 100; P1 n = 292; P2 n = 464). The latter included 176 subjects, resulting in three slightly unbalanced Personas (P0 n = 58; P1 n = 32; P2 n = 50). The SSML approach was able to differentiate the clusters correctly, with high values of weighted accuracy (88.27 % and 94.12 %), precision (87.11 % and 92.83 %), recall (86.92 % and 91.67 %), and F1 score (86.92 % and 91.76 %).
The proposed method generalized well beyond the training data, validating the Personas’ capability of stratifying the characteristics of target populations. Additionally, this method significantly reduced the cost of validating Personas compared with other methods in the current literature. |
---|---|
ISSN: | 1532-0464, 1532-0480 |
DOI: | 10.1016/j.jbi.2025.104815 |
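
The validation procedure described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not name the clustering algorithm, the classifier family, or the hyperparameter grid, so `KMeans`, `RandomForestClassifier`, the grid values, and the synthetic `make_blobs` data are all stand-ins chosen for illustration. "One best model per fold" is one possible reading of "5 optimal different models", and `balanced_accuracy_score` is only an assumed interpretation of "weighted accuracy".

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import ParameterGrid, StratifiedKFold, train_test_split

# Synthetic stand-in for questionnaire data (hypothetical; not the study's datasets).
X, _ = make_blobs(n_samples=500, centers=3, n_features=8, random_state=0)

# 80 % / 20 % training-test split.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Persona development on the training set; KMeans is an assumed clustering model.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
y_train = clusterer.labels_

# One grid search per fold, keeping the best classifier of each fold (5 models).
# The classifier and the grid values are illustrative assumptions.
param_grid = ParameterGrid({"n_estimators": [25, 50], "max_depth": [3, None]})
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
best_models = []
for fit_idx, val_idx in folds.split(X_train, y_train):
    scored = []
    for params in param_grid:
        model = RandomForestClassifier(random_state=0, **params)
        model.fit(X_train[fit_idx], y_train[fit_idx])
        scored.append((model.score(X_train[val_idx], y_train[val_idx]), model))
    best_models.append(max(scored, key=lambda pair: pair[0])[1])

# "Weak" ground truth: the trained clustering model labels the held-out test set.
y_weak = clusterer.predict(X_test)

# Majority vote across the five classifiers (rows: models, columns: test subjects).
votes = np.stack([m.predict(X_test) for m in best_models])
y_vote = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

# Agreement between the vote and the weak ground truth, with weighted averaging.
accuracy = balanced_accuracy_score(y_weak, y_vote)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_weak, y_vote, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

High agreement in such a setup suggests that the cluster structure learned on the training set carries over to unseen subjects, which is the sense in which the summary speaks of generalization beyond the training data.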