Model-Based Diversification for Sequential Exploratory Queries

Bibliographic Details
Published in Data Science and Engineering Vol. 2; no. 2; pp. 151-168
Main Authors Khan, Hina A.; Sharaf, Mohamed A.
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.06.2017
Springer Nature B.V
SpringerOpen
Summary: Today, data exploration platforms are widely used to help users locate interesting objects within large volumes of scientific and business data. On those platforms, users try to make sense of the underlying data space by iteratively posing numerous queries over large databases. While diversification of query results, like other data summarization techniques, gives users quick insight into the huge query answer space, it adds complexity to an already computationally expensive data exploration task. To address this challenge, we propose a diversification scheme that targets the problem of efficiently diversifying the results of multiple queries within and across different data exploration sessions. Our proposed scheme relies on a model-based diversification method and an ordered cache. In particular, we employ an adaptive regression model to estimate the diversity of a diverse subset. This estimate of the diversity value allows us to select diverse results without scanning all the query results. To further expedite the diversification process, we propose an order-based caching scheme that leverages the overlap within a sequence of data exploration queries. Our extensive experimental evaluation on both synthetic and real data sets shows the significant benefits of our scheme compared to existing methods.
ISSN: 2364-1185, 2364-1541
DOI: 10.1007/s41019-017-0038-0