Integration of large-scale community-developed causal loop diagrams: a Natural Language Processing approach to merging factors based on semantic similarity

Bibliographic Details
Published in BMC public health Vol. 25; no. 1; pp. 923 - 9
Main Authors Valdivia Cabrera, Melissa; Johnstone, Michael; Hayward, Joshua; Bolton, Kristy A.; Creighton, Douglas
Format Journal Article
Language English
Published England BioMed Central Ltd 08.03.2025
BioMed Central
BMC
Summary: Complex public health problems have been addressed in communities through systems thinking and participatory methods such as Group Model Building (GMB) and Causal Loop Diagrams (CLDs), albeit with some challenges. This study explored the feasibility of using Natural Language Processing (NLP) to simplify and enhance the CLD merging process, avoiding manual merging of factors by utilizing different semantic textual similarity models. The factors of thirteen CLDs from different communities in Victoria, Australia, regarding the health and wellbeing of children and young people, were merged using NLP in four steps: (1) extracting and preprocessing unique factor names; (2) assessing factor similarity using various language models; (3) determining the optimal merging threshold by maximising the F1-score; (4) merging the factors of the 13 CLDs based on the selected threshold. Overall, sentence-transformer models performed better than word2vec, average word embeddings and Jaccard similarity. Of 161,182 comparisons, the 1,123 given a score above 0.7 by sentence-transformer models were analysed by subject matter experts. Paraphrase-multilingual-mpnet-base-v2 had the highest F1-score, 0.68, and was used to merge the factors with a threshold of 0.75. From 592 factors, 344 were merged into 66 groups. Language models facilitate the identification of similar factors and have the potential to aid researchers in constructing CLDs while reducing the time required to merge them manually. While the models accurately merge synonymous or closely related factors, manual intervention may be required for specific cases.
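The four steps in the summary can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. It uses Jaccard similarity over word tokens (one of the baselines named in the summary) as a stand-in for the sentence-transformer models, so that it runs with the standard library alone. The factor names, the expert labels, the candidate thresholds and all function names are hypothetical, and grouping by connected components (union-find) is an assumption, since the summary does not say how merged groups were formed.

```python
from itertools import combinations


def preprocess(name):
    """Step 1: lowercase a factor name and split it into a set of word tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return frozenset(cleaned.split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_scores(factors):
    """Step 2: score every unordered pair of unique factor names."""
    tokens = {f: preprocess(f) for f in factors}
    return {frozenset((a, b)): jaccard(tokens[a], tokens[b])
            for a, b in combinations(sorted(tokens), 2)}


def f1_at(scores, labelled, threshold):
    """F1 of the rule 'merge if score >= threshold' against expert labels."""
    tp = sum(1 for p, same in labelled.items() if same and scores[p] >= threshold)
    fp = sum(1 for p, same in labelled.items() if not same and scores[p] >= threshold)
    fn = sum(1 for p, same in labelled.items() if same and scores[p] < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(scores, labelled, candidates):
    """Step 3: candidate with the highest F1 (ties go to the earliest candidate)."""
    return max(candidates, key=lambda t: f1_at(scores, labelled, t))


def merge_groups(factors, scores, threshold):
    """Step 4: group factors linked by any pair scoring >= threshold."""
    parent = {f: f for f in factors}

    def find(x):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, score in scores.items():
        if score >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for f in factors:
        groups.setdefault(find(f), []).append(f)
    # Keep only groups where something was actually merged.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical factor names and expert judgements, for illustration only.
factors = ["Healthy food access", "Access to healthy food",
           "Screen time", "Time on screens", "Physical activity"]
labelled = {
    frozenset(("Healthy food access", "Access to healthy food")): True,
    frozenset(("Screen time", "Time on screens")): True,
    frozenset(("Screen time", "Physical activity")): False,
}

scores = pairwise_scores(factors)
threshold = best_threshold(scores, labelled, [0.25, 0.5, 0.75])
groups = merge_groups(factors, scores, threshold)
print(threshold, groups)
```

In this toy data, "Screen time" and "Time on screens" share only one exact token, so token overlap scores them low even though an expert would merge them. That is the kind of case where an embedding-based similarity score would be expected to help, in line with the summary's finding that sentence-transformer models outperformed Jaccard similarity.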
ISSN: 1471-2458
DOI: 10.1186/s12889-025-22142-3