LSII: An indexing structure for exact real-time search on microblogs

Bibliographic Details
Published in 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 482-493
Main Authors Lingkun Wu, Wenqing Lin, Xiaokui Xiao, Yabo Xu
Format Conference Proceeding
Language English
Published IEEE 01.04.2013
Summary: Indexing microblogs for real-time search is challenging given the efficiency issue caused by the tremendous speed at which new microblogs are created by users. Existing approaches address this efficiency issue at the cost of query accuracy, as they either (i) exclude a significant portion of microblogs from the index to reduce update cost or (ii) rank microblogs mostly by their timestamps (without sufficient consideration of their relevance to the queries) to enable append-only index insertion. As a consequence, the search results returned by the existing approaches do not satisfy users who demand timely and high-quality search results. To remedy this deficiency, we propose the Log-Structured Inverted Indices (LSII), a structure for exact real-time search on microblogs. The core of LSII is a sequence of inverted indices with exponentially increasing sizes, such that new microblogs are (i) first inserted into the smallest index and (ii) later moved into the larger indices in batches. The batch insertion mechanism leads to a small amortized update cost for each new microblog, without significantly degrading query performance. We present a comprehensive study on LSII, exploring various design options to strike a good balance between query and update performance. In addition, we propose extensions of LSII to support personalized search and to exploit multi-threading for performance improvement. Extensive experiments on real data demonstrate the efficiency of LSII.
ISBN: 9781467349093, 1467349097
ISSN: 1063-6382, 2375-026X
DOI: 10.1109/ICDE.2013.6544849
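
The summary describes the core idea only at a high level: a sequence of inverted indices with exponentially increasing sizes, where new microblogs enter the smallest index and are later moved to larger ones in batches. A minimal Python sketch of that idea follows. It is not the authors' implementation. The class name LSIISketch, the doubling capacity rule, the flush condition, and the scoring function (number of matching query terms plus a small recency bonus) are all assumptions made for illustration, and the query simply scans every level rather than using any pruning strategy the paper may describe.

```python
import heapq
from collections import defaultdict


class LSIISketch:
    """Toy log-structured sequence of inverted indices (illustrative only)."""

    def __init__(self, base_capacity=2):
        # Assumption: level i is flushed once it holds base_capacity * 2**i docs.
        self.base_capacity = base_capacity
        # levels[i] maps term -> list of (doc_id, timestamp) postings.
        self.levels = [defaultdict(list)]
        self.sizes = [0]  # number of documents currently stored per level

    def _capacity(self, level):
        return self.base_capacity * (2 ** level)

    def insert(self, doc_id, timestamp, terms):
        """New documents always go into the smallest index (level 0)."""
        for term in set(terms):
            self.levels[0][term].append((doc_id, timestamp))
        self.sizes[0] += 1
        self._cascade()

    def _cascade(self):
        """Move full levels into the next larger level as one batch."""
        level = 0
        while self.sizes[level] >= self._capacity(level):
            if level + 1 == len(self.levels):
                self.levels.append(defaultdict(list))
                self.sizes.append(0)
            for term, postings in self.levels[level].items():
                self.levels[level + 1][term].extend(postings)
            self.sizes[level + 1] += self.sizes[level]
            self.levels[level] = defaultdict(list)
            self.sizes[level] = 0
            level += 1

    def search(self, query_terms, k, recency_weight=0.001):
        """Exact top-k over all levels under a hypothetical scoring function."""
        matches = defaultdict(int)
        stamps = {}
        for index in self.levels:
            for term in set(query_terms):
                for doc_id, ts in index.get(term, ()):
                    matches[doc_id] += 1
                    stamps[doc_id] = ts
        scored = [(matches[d] + recency_weight * stamps[d], d) for d in matches]
        return [d for _, d in heapq.nlargest(k, scored)]


index = LSIISketch(base_capacity=2)
index.insert(1, 1, ["icde", "index"])
index.insert(2, 2, ["microblog", "search"])
index.insert(3, 3, ["real", "time", "search"])
index.insert(4, 4, ["microblog", "index"])
index.insert(5, 5, ["search", "microblog", "index"])
top_two = index.search(["microblog", "search"], k=2)
```

Because every document is consulted at query time regardless of which level holds it, the result is exact with respect to the chosen scoring function. Each document is copied at most once per level, so with doubling capacities the number of moves per document grows only logarithmically with the collection size, which is the intuition behind the small amortized update cost mentioned in the summary.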