Visual Bayesian fusion to navigate a data lake

Bibliographic Details
Published in: 2016 19th International Conference on Information Fusion (FUSION), pp. 987-994
Main Authors: Singh, Karamjit; Paneri, Kaushal; Pandey, Aditeya; Gupta, Garima; Sharma, Geetika; Agarwal, Puneet; Shroff, Gautam
Format: Conference Proceeding
Language: English
Published: ISIF 01.07.2016
Summary: The evolution from traditional business intelligence to big data analytics has witnessed the emergence of 'Data Lakes', in which data is ingested in raw form rather than into traditional data warehouses. With the increasing availability of many more pieces of information about each entity of interest, e.g., a customer, often from diverse sources (social media, mobility, internet-of-things), fusing, visualizing and deriving insights from such data pose a number of challenges. First, disparate datasets often lack a natural join key. Next, datasets may describe measures at different levels of granularity, e.g., individual vs. aggregate data. Finally, different datasets may be derived from physically distinct populations. Moreover, once data has been fused, queries are often an inefficient and inaccurate mechanism for deriving insight from high-dimensional data. In this paper we describe iFuse, a data-fusion-based visual analytics platform for navigating a data lake to derive insights. We rely on Bayesian graphical models to provide a useful rudder with which to fuse and analyze disparate islands of data in a systematic manner. Our platform allows for rich interactive visualizations, querying and keyword-based search within and across datasets or models, as well as intuitive visual interfaces for value imputation and model-based predictions. We illustrate the use of our platform in multiple scenarios, including two public data challenges as well as a real-life industry use case involving the probabilistic fusion of datasets that lack a natural join key.
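
The summary does not say how iFuse performs probabilistic fusion, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two datasets that share a common attribute (here a hypothetical `age` column) but no join key, that both are drawn from the same population, and that the two attributes of interest are conditionally independent given the shared attribute. Under those assumptions, P(b | a) can be estimated as the sum over s of P(s | a) P(b | s). All function names, column names, and toy records are illustrative.

```python
from collections import Counter, defaultdict


def conditional(rows, given, target):
    """Estimate P(target | given) from a list of dict rows by counting."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[given]][row[target]] += 1
    return {
        g: {t: n / sum(c.values()) for t, n in c.items()}
        for g, c in counts.items()
    }


def fuse(rows_a, rows_b, shared, attr_a, attr_b):
    """Estimate P(attr_b | attr_a) across two datasets with no join key.

    Assumption (not from the paper): attr_a and attr_b are conditionally
    independent given `shared`, and both datasets sample the same population.
    """
    p_shared_given_a = conditional(rows_a, attr_a, shared)  # from dataset A
    p_b_given_shared = conditional(rows_b, shared, attr_b)  # from dataset B
    fused = {}
    for a_val, shared_dist in p_shared_given_a.items():
        dist = defaultdict(float)
        for s_val, p_s in shared_dist.items():
            # Marginalize over the shared attribute.
            for b_val, p_b in p_b_given_shared.get(s_val, {}).items():
                dist[b_val] += p_s * p_b
        fused[a_val] = dict(dist)
    return fused


# Hypothetical toy records: a survey and a purchase log with no common key.
survey = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "low"},
]
purchases = [
    {"age": "young", "product": "phone"},
    {"age": "young", "product": "phone"},
    {"age": "old", "product": "tv"},
    {"age": "old", "product": "phone"},
]

result = fuse(survey, purchases, shared="age", attr_a="income", attr_b="product")
for income, dist in result.items():
    print(income, dist)
```

The conditional-independence assumption carries most of the weight here. If the two datasets come from distinct populations, or differ in granularity as the summary warns, the estimate would need a richer model than this two-table chain.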