A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks

Bibliographic Details
Published in	IEEE Transactions on Cybernetics, Vol. 45, no. 12, pp. 2890-2904
Main Authors	Yue, Kun; Fang, Qiyu; Wang, Xiaoling; Li, Jin; Liu, Weiyi
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2015
Subjects	Algorithm design and analysis; Algorithms; Bayesian analysis; Bayesian network learning; Cloud computing; Computation; Computational modeling; Data models; data-intensive computing; Distributed databases; Dynamic characteristics; Heuristic algorithms; incremental learning; Learning; MapReduce; Mathematical models; parallel algorithm; Parallel algorithms; Probability; Scoring; uncertain knowledge
Summary:	The Bayesian network (BN) has been adopted as the underlying model for representing and inferring uncertain knowledge. As the basis of realistic applications centered on probabilistic inference, learning a BN from data is a critical subject in machine learning, artificial intelligence, and big data paradigms. Currently, it is necessary to extend the classical methods for learning BNs to data-intensive computing or cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring and search algorithm and using MapReduce. First, we adopt the minimum description length as the scoring metric and give two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give the corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as that for storing a BN as <key, value> pairs. Further, in view of the dynamic characteristics of the changing data, we introduce the concept of influence degree to measure the coincidence of the current BN with new data, and then propose the corresponding two-pass MapReduce-based algorithms for incremental learning of BNs. Experimental results show the efficiency, scalability, and effectiveness of our methods.
ISSN:	2168-2267; 2168-2275
DOI:	10.1109/TCYB.2015.2388791
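
Illustrative sketch (not taken from the paper): the summary describes scoring a candidate structure with the minimum description length (MDL) from counts gathered in a MapReduce style. The Python below imitates that idea in a single process. It assumes one common form of the MDL score (negative log-likelihood plus a (log2 N)/2 penalty per free parameter, lower is better), which may differ from the authors' exact definition, and it uses a single counting pass rather than the two-pass scheme mentioned in the summary. All names (map_counts, reduce_counts, mdl_score, structure, arity) are hypothetical.

```python
import math
from collections import Counter
from itertools import chain


def map_counts(record, structure):
    """Map step: for each node emit ((node, value, parent_values), 1)."""
    for node, parents in structure.items():
        parent_values = tuple(record[p] for p in parents)
        yield (node, record[node], parent_values), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted 1s per key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def mdl_score(records, structure, arity):
    """Assumed MDL form: -log-likelihood + (log2 N / 2) * free parameters.

    structure maps each node to a tuple of its parents;
    arity maps each node to its number of discrete values.
    """
    n = len(records)
    emitted = chain.from_iterable(map_counts(r, structure) for r in records)
    counts = reduce_counts(emitted)

    # Totals per (node, parent configuration), needed for conditional estimates.
    parent_totals = Counter()
    for (node, _value, parent_values), c in counts.items():
        parent_totals[(node, parent_values)] += c

    # Log-likelihood under maximum-likelihood conditional probabilities.
    log_lik = sum(
        c * math.log2(c / parent_totals[(node, parent_values)])
        for (node, _value, parent_values), c in counts.items()
    )

    # Free parameters: (r_i - 1) times the number of parent configurations.
    n_params = sum(
        (arity[node] - 1) * math.prod(arity[p] for p in parents)
        for node, parents in structure.items()
    )
    return -log_lik + 0.5 * math.log2(n) * n_params


# Toy data in which B always copies A.
records = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
arity = {"A": 2, "B": 2}
no_edge = {"A": (), "B": ()}
with_edge = {"A": (), "B": ("A",)}
print(mdl_score(records, no_edge, arity), mdl_score(records, with_edge, arity))
```

On this toy data the structure with the edge A -> B receives the shorter description length, which is the kind of comparison a hill-climbing search would use when deciding whether to keep a candidate edge.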