A novel data-driven approach for Personas validation in healthcare using self-supervised machine learning

Bibliographic Details
Published in Journal of Biomedical Informatics Vol. 165; p. 104815
Main Authors Tauro, Emanuele, Gorini, Alessandra, Bilo, Grzegorz, Caiani, Enrico Gianluca
Format Journal Article
Language English
Published United States Elsevier Inc 01.05.2025
Summary: Persona validation is a challenging task, often relying on costly external validation methods. The aim of this study was to develop a novel method for Personas validation based on data already available during their creation. A novel approach based on self-supervised machine learning (SSML) was proposed. A training-test split was performed (80 % - 20 %), with the training set used for Personas development. The resulting Persona labels were used as input for a 5-fold cross-validation grid search, yielding 5 different optimal models. The “weak” ground truth for the test set was determined using the trained clustering model and was compared with the prediction obtained by majority voting of the optimal models. Performance was evaluated by means of weighted accuracy, precision, recall and F1 score. The proposed method was evaluated on two very different healthcare datasets composed of questionnaires. The former included 1070 subjects, resulting in three unbalanced Personas (P0 n = 100; P1 n = 292; P2 n = 464). The latter included 176 subjects, resulting in three slightly unbalanced Personas (P0 n = 58; P1 n = 32; P2 n = 50). The SSML approach was able to correctly differentiate the clusters, with high values of weighted accuracy (88.27 % and 94.12 %), precision (87.11 % and 92.83 %), recall (86.92 % and 91.67 %), and F1 score (86.92 % and 91.76 %). The proposed method generalized well beyond the training data, validating the Personas’ capability of stratifying the characteristics of target populations. Additionally, this method significantly reduced the cost of validating Personas compared to other methods in the current literature.
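The final evaluation step described in the summary (majority voting across the optimal models, then comparison with the "weak" ground truth) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `weighted_scores` and the toy labels are hypothetical, and "weighted" is assumed here to mean weighting each Persona's score by its share of the weak ground-truth labels, which the summary does not specify. Weighted accuracy is left out because its definition is not given.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists (all of equal length) into one label per subject.

    Ties are broken in favour of the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def weighted_scores(y_true, y_pred):
    """Return (precision, recall, F1), each averaged over classes with
    weights proportional to class frequency in y_true (an assumption)."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1

# Hypothetical toy data: weak ground truth from the clustering model for six
# test subjects, and the predictions of five classifiers (one per fold).
weak_labels = [0, 0, 1, 1, 2, 2]
model_preds = [
    [0, 0, 1, 1, 2, 2],
    [0, 1, 1, 1, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 1, 1, 2, 0],
    [0, 1, 1, 2, 2, 2],
]

voted = majority_vote(model_preds)
precision, recall, f1 = weighted_scores(weak_labels, voted)
print(voted, round(precision, 4), round(recall, 4), round(f1, 4))
```

With an odd number of models and three Personas, a 2-2-1 split of votes is still possible, so a real pipeline would need an explicit tie-breaking rule; the one used here is only a convenience.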
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2025.104815