Enabling Real Time Analytics over Raw XML Data

Bibliographic Details
Published in Real-Time Business Intelligence and Analytics, pp. 113-132
Main Authors Agarwal, Manoj K.; Ramamritham, Krithi; Agarwal, Prashant
Format Book Chapter
Language English
Published Cham, Springer International Publishing
Series Lecture Notes in Business Information Processing
Summary: The data generated by many applications is in a semi-structured format, such as XML. This data can be used for analytics only after it has been shredded and stored in a structured format, a process known as Extract-Transform-Load (ETL). However, the ETL process is often time-consuming, so crucial time-sensitive insights can be lost or become unactionable. Hence, this paper poses the following question: How do we expose analytical insights in the raw XML data? We address this novel problem by discovering additional information, called complementary information (CI), from the raw semi-structured data repository for a given user query. Experiments with real as well as synthetic data show that the discovered CI is relevant in the context of the given user query, nontrivial, and has high precision. The recall is also found to be high for most queries. Crowd-sourced feedback on the discovered CI corroborates these findings, showing that our system is able to discover highly relevant and potentially useful CI in real-world XML data repositories. The concepts behind our technique are generic and can be used for other semi-structured data formats as well.
ISBN: 3030241238, 9783030241230
ISSN: 1865-1348, 1865-1356
DOI: 10.1007/978-3-030-24124-7_8