Causal Knowledge in Data Fusion Subject to Latent Confounding and Measurement Error

Bibliographic Details
Published in	IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 1-8
Main Authors	Yu, Jingyi; Pychynski, Tim; Huber, Marco F.
Format	Conference Proceeding
Language	English
Published	IEEE, 4 September 2024
Subjects	Closed box; Data integration; Data models; Measurement errors; Noise; Optimization; Predictive models; Reliability; Sensors
Summary:	Data fusion is the process of integrating data from multiple sources to produce more accurate and reliable information. In real-world scenarios, data are often subject to latent confounding and measurement error. In this paper, we evaluate fusion strategies that incorporate different levels of causal knowledge for quality prediction under varying degrees of latent confounding and measurement error. We show that the machine learning-based fusion strategy achieves the best prediction quality when the data are independent and identically distributed (i.i.d.). In the presence of latent confounding, however, the causality-based fusion strategy makes prediction models more robust against severe distribution shifts. Moreover, measurement error in the data also affects the out-of-distribution (OOD) generalizability of prediction models. When causal knowledge must be inferred from data using causal discovery methods, we demonstrate that measurement error can severely impair causal discovery. We therefore advise caution when applying standard causal discovery methods if the conditions under which the data were generated are unknown.
ISSN:	2767-9357
DOI:	10.1109/MFI62651.2024.10705789