Lake Data Warehouse Architecture for Big Data Solutions

Bibliographic Details
Published in: International Journal of Advanced Computer Science & Applications, Vol. 11, No. 8
Main Authors: Saddad, Emad; El-Bastawissy, Ali; M., Hoda; Hazman, Maryam
Format: Journal Article
Language: English
Published: West Yorkshire: Science and Information (SAI) Organization Limited, 2020

Summary: The traditional Data Warehouse is a multidimensional repository of nonvolatile, subject-oriented, integrated, time-variant, and non-operational data gathered from multiple heterogeneous data sources. Its architecture must be adapted to deal with the new challenges imposed by the abundance of data and by current big data characteristics, including volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to address existing drawbacks, including availability, scalability, and, consequently, query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, that gives the traditional Data Warehouse the capabilities to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies, such as the Hadoop framework and Apache Spark, to provide a hybrid solution in a complementary way. The main advantage of the proposed architecture is that it integrates the existing features of traditional Data Warehouses with big data features acquired by integrating the traditional Data Warehouse with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.
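
The following minimal PySpark sketch (not taken from the paper) illustrates the kind of hybrid access pattern the summary describes: Spark reads a dimension table from an existing relational Data Warehouse over JDBC and high-volume fact data from a Hadoop-based lake as Parquet, then joins them in a single query. All connection details, table names, columns, and paths are hypothetical assumptions for illustration only.

    # Hybrid Data Warehouse / data lake access sketch; all names and URLs are hypothetical.
    # Running it requires a Spark installation and the matching JDBC driver on the classpath.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("lake-dw-hybrid-sketch")
        .getOrCreate()
    )

    # Dimension data kept in the existing relational Data Warehouse (hypothetical connection).
    dim_customer = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://dw-host:5432/warehouse")
        .option("dbtable", "dim_customer")
        .option("user", "report_user")
        .option("password", "***")
        .load()
    )

    # Fact data landed in the Hadoop-based lake as Parquet (hypothetical path).
    fact_sales = spark.read.parquet("hdfs:///lake/raw/sales/")

    # A single Spark query spans both stores, reflecting the complementary,
    # hybrid behaviour the proposed architecture aims for.
    dim_customer.createOrReplaceTempView("dim_customer")
    fact_sales.createOrReplaceTempView("fact_sales")

    revenue_by_region = spark.sql("""
        SELECT c.region, SUM(f.amount) AS total_revenue
        FROM fact_sales f
        JOIN dim_customer c ON f.customer_id = c.customer_id
        GROUP BY c.region
    """)
    revenue_by_region.show()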
ISSN: 2158-107X, 2156-5570
DOI: 10.14569/IJACSA.2020.0110854