Collaborative Data Analytics with DataHub

Bibliographic Details
Published in Proceedings of the VLDB Endowment Vol. 8; no. 12; p. 1916
Main Authors Bhardwaj, Anant, Karger, David, Subramanyam, Harihar, Deshpande, Amol, Madden, Sam, Wu, Eugene, Elmore, Aaron, Parameswaran, Aditya, Zhang, Rebecca
Format Journal Article
Language English
Published United States 01.08.2015
Summary: While many solutions have been proposed for storing and analyzing large volumes of data, all of them offer limited support for collaborative data analytics, especially given that many individuals and teams are simultaneously analyzing, modifying, and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform with the ability to load, store, query, collaboratively analyze, interactively visualize, interface with external applications, and share datasets. We will demonstrate the following aspects of the DataHub platform: (a) multiple conference attendees can concurrently update the database, browse the different versions, and inspect conflicts; (b) conference attendees will be able to effortlessly ingest, query, and visualize data using our existing apps; (c) conference attendees will be able to analyze datasets in R, Python, and Matlab, while the inputs and the results are still stored in DataHub. In particular, conference attendees will be able to use an IPython-based notebook for analyzing data and storing the results of data analysis.
ISSN:2150-8097
DOI:10.14778/2824032.2824100