Comparing Causal Bayesian Networks Estimated from Data

The knowledge of the causal mechanisms underlying one single system may not be sufficient to answer certain questions. One can gain additional insights from comparing and contrasting the causal mechanisms underlying multiple systems and uncovering consistent and distinct causal relationships. For ex...

Full description

Saved in:
Bibliographic Details
Published inEntropy (Basel, Switzerland) Vol. 26; no. 3; p. 228
Main Authors Ma, Sisi, Tourani, Roshan
Format Journal Article
LanguageEnglish
Published Switzerland MDPI AG 02.03.2024
MDPI
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The knowledge of the causal mechanisms underlying one single system may not be sufficient to answer certain questions. One can gain additional insights from comparing and contrasting the causal mechanisms underlying multiple systems and uncovering consistent and distinct causal relationships. For example, discovering common molecular mechanisms among different diseases can lead to drug repurposing. The problem of comparing causal mechanisms among multiple systems is non-trivial, since the causal mechanisms are usually unknown and need to be estimated from data. If we estimate the causal mechanisms from data generated from different systems and directly compare them (the naive method), the result can be sub-optimal. This is especially true if the data generated by the different systems differ substantially with respect to their sample sizes. In this case, the quality of the estimated causal mechanisms for the different systems will differ, which can in turn affect the accuracy of the estimated similarities and differences among the systems via the naive method. To mitigate this problem, we introduced the bootstrap estimation and the equal sample size resampling estimation method for estimating the difference between causal networks. Both of these methods use resampling to assess the confidence of the estimation. We compared these methods with the naive method in a set of systematically simulated experimental conditions with a variety of network structures and sample sizes, and using different performance metrics. We also evaluated these methods on various real-world biomedical datasets covering a wide range of data designs.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1099-4300
1099-4300
DOI:10.3390/e26030228