Causal Inference on Multivariate and Mixed-Type Data


Bibliographic Details
Published in: Machine Learning and Knowledge Discovery in Databases, Vol. 11052, pp. 655-671
Main Authors: Marx, Alexander; Vreeken, Jilles
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2019
Springer International Publishing
Series: Lecture Notes in Computer Science
ISBN: 3030109275, 9783030109271
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-030-10928-8_39

Summary: How can we discover whether X causes Y, or vice versa, that Y causes X, when we are only given a sample over their joint distribution? How can we do this such that X and Y can be univariate, multivariate, or of different cardinalities? And, how can we do so regardless of whether X and Y are of the same, or of different data type, be it discrete, numeric, or mixed? These are exactly the questions we answer. We take an information theoretic approach, based on the Minimum Description Length principle, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. Simply put, if Y can be explained more succinctly by a set of classification or regression trees conditioned on X, than in the opposite direction, we conclude that X causes Y. Empirical evaluation on a wide range of data shows that our method, Crack, infers the correct causal direction reliably and with high accuracy on a wide range of settings, outperforming the state of the art by a wide margin. Code related to this paper is available at: http://eda.mmci.uni-saarland.de/crack.
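Illustration: the decision rule in the summary (compare the cost of describing cause, then effect given cause, against the reverse) can be sketched on toy numeric data. The following is a minimal sketch under stated assumptions, not the Crack implementation: it handles only two univariate numeric variables, replaces the paper's tree models with a single-split regression stump, and uses a crude Gaussian code length at a fixed resolution. All function names (`gaussian_bits`, `stump_bits`, `infer_direction`, `make_toy_data`) and the cost terms are hypothetical choices made for this example.

```python
import math
import random


def gaussian_bits(values, resolution=0.01):
    """Approximate bits to encode values with a Gaussian fitted to them.

    Assumption: values are recorded at a fixed resolution, and each of the
    two parameters (mean, variance) costs 0.5 * log2(n) bits.
    """
    n = len(values)
    mean = sum(values) / n
    # Clamp the variance so a constant leaf does not get an infinite gain.
    var = max(sum((v - mean) ** 2 for v in values) / n, resolution ** 2)
    data_bits = n * (0.5 * math.log2(2 * math.pi * math.e * var)
                     - math.log2(resolution))
    param_bits = math.log2(n)  # 2 parameters * 0.5 * log2(n)
    return data_bits + param_bits


def stump_bits(target, given, resolution=0.01):
    """Bits to encode target given `given`: a single leaf, or the best one-split stump."""
    n = len(target)
    best = gaussian_bits(target, resolution)  # leaf only, ignores `given`
    # Sort the target by the conditioning variable; a split is a cut in this order.
    # Assumption: ties in `given` are rare (continuous data).
    order = sorted(range(n), key=lambda i: given[i])
    sorted_target = [target[i] for i in order]
    for cut in range(2, n - 1):  # keep at least two points in each leaf
        cost = (math.log2(n)  # crude cost of naming the split point
                + gaussian_bits(sorted_target[:cut], resolution)
                + gaussian_bits(sorted_target[cut:], resolution))
        best = min(best, cost)
    return best


def infer_direction(x, y):
    """Compare L(X) + L(Y | X) against L(Y) + L(X | Y) and report the shorter one."""
    x_to_y = gaussian_bits(x) + stump_bits(y, x)
    y_to_x = gaussian_bits(y) + stump_bits(x, y)
    if x_to_y < y_to_x:
        return "X -> Y", x_to_y, y_to_x
    if y_to_x < x_to_y:
        return "Y -> X", x_to_y, y_to_x
    return "undecided", x_to_y, y_to_x


def make_toy_data(n=300, seed=0):
    """Toy data in which X is the cause: Y is a noisy step function of X."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [(0.0 if xi < 0.5 else 4.0) + rng.gauss(0.0, 0.1) for xi in x]
    return x, y


x, y = make_toy_data()
direction, bits_x_to_y, bits_y_to_x = infer_direction(x, y)
print(direction, round(bits_x_to_y), round(bits_y_to_x))
```

A model cost is needed for the comparison to be informative: with plain empirical entropies, both directions sum to the same joint entropy, so the asymmetry in this sketch comes entirely from how well a simple model fits each factorization.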
Bibliography: Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-030-10928-8_39) contains supplementary material, which is available to authorized users.