Integration of large-scale community-developed causal loop diagrams: a Natural Language Processing approach to merging factors based on semantic similarity

Bibliographic Details
Published in	BMC public health Vol. 25; no. 1; pp. 923 - 9
Main Authors	Valdivia Cabrera, Melissa; Johnstone, Michael; Hayward, Joshua; Bolton, Kristy A.; Creighton, Douglas
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 08.03.2025
Subjects	Analysis; Causal inference; Causality; Child; Community; Community health services; Computational linguistics; Decision making; Health; Health problems; Humans; Language; Language processing; Local government; Mental disorders; Mental health; Methods; Natural language interfaces; Natural Language Processing; Neural networks; Obesity; Ontology; Optimization; Public health; Qualitative research; Semantic similarity; Semantics; Sentences; Sentiment analysis; Similarity; Systems thinking; Victoria; Young adults; Victoria Australia
Summary:	Communities have addressed complex public health problems through systems thinking and participatory methods such as Group Model Building (GMB) and Causal Loop Diagrams (CLDs), albeit with some challenges. This study explored the feasibility of using Natural Language Processing (NLP) to simplify and enhance CLD merging, avoiding manual merging of factors by applying different semantic textual similarity models. The factors of thirteen CLDs from different communities in Victoria, Australia, concerning the health and wellbeing of children and young people, were merged using NLP in four steps: (1) extracting and preprocessing unique factor names; (2) assessing factor similarity with various language models; (3) determining the optimal merging threshold by maximising the F1-score; (4) merging the factors of the 13 CLDs based on the selected threshold. Overall, sentence-transformer models performed better than word2vec, average word embeddings and Jaccard similarity. Of 161,182 comparisons, the 1,123 that sentence-transformer models scored above 0.7 were analysed by subject matter experts. Paraphrase-multilingual-mpnet-base-v2 had the highest F1-score (0.68) and was used to merge the factors with a threshold of 0.75. Of 592 factors, 344 were merged into 66 groups. Language models facilitate the identification of similar factors and have the potential to help researchers construct CLDs while reducing the time required to merge them manually. While the models accurately merge synonymous or closely related factors, manual intervention may be required in specific cases.
ISSN:	1471-2458
DOI:	10.1186/s12889-025-22142-3
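
The four-step process described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It scores pairs with Jaccard similarity on word tokens (one of the baselines the summary names) rather than a sentence-transformer model. The factor names and expert labels are invented for the example. Grouping factors by connected components with union-find is also an assumption, since the summary does not say how groups were formed. All function names are hypothetical.

```python
import re
from itertools import combinations


def pair(a, b):
    """Canonical key for an unordered pair of factor names."""
    return tuple(sorted((a, b)))


def preprocess(name):
    """Step 1: lowercase a factor name and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (stand-in for an embedding model)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_pairs(factors):
    """Step 2: score every unordered pair of unique factor names."""
    tokens = {f: preprocess(f) for f in factors}
    return {pair(a, b): jaccard(tokens[a], tokens[b])
            for a, b in combinations(factors, 2)}


def f1_at(threshold, scores, labels):
    """F1-score of the rule 'merge if score >= threshold' against expert labels."""
    tp = sum(1 for p, same in labels.items() if same and scores[p] >= threshold)
    fp = sum(1 for p, same in labels.items() if not same and scores[p] >= threshold)
    fn = sum(1 for p, same in labels.items() if same and scores[p] < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates):
    """Step 3: pick the candidate threshold with the highest F1-score."""
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


def merge(factors, scores, threshold):
    """Step 4: group factors linked by any pair scoring >= threshold (union-find)."""
    parent = {f: f for f in factors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in scores.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in factors:
        groups.setdefault(find(f), []).append(f)
    return [g for g in groups.values() if len(g) > 1]


# Invented factor names and expert judgements, for illustration only.
factors = ["Physical activity", "physical activity levels", "Screen time",
           "Time on screens", "Family time", "Healthy food access"]
labels = {
    pair("Physical activity", "physical activity levels"): True,
    pair("Screen time", "Time on screens"): True,
    pair("Screen time", "Family time"): False,
    pair("Physical activity", "Screen time"): False,
}

scores = score_pairs(factors)
threshold = best_threshold(scores, labels, [0.2, 0.3, 0.5, 0.8])
groups = merge(factors, scores, 0.5)
print(threshold, groups)
```

Replacing `jaccard` with cosine similarity over sentence embeddings would leave the threshold search and the merging step unchanged. The toy data also shows why a purely lexical score struggles: "Screen time" and "Family time" share a token and outscore the genuinely equivalent "Screen time" and "Time on screens".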