SP-TSRM: A Data Grouping Strategy in Distributed Storage System



Bibliographic Details
Published in: Algorithms and Architectures for Parallel Processing, Vol. 11334, pp. 524-531
Main Authors: Zhu, Dongjie; Du, Haiwen; Cao, Ning; Qiao, Xueming; Liu, Yanyan
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2018
Series: Lecture Notes in Computer Science
Summary: With the development of smart devices and social media, massive amounts of unstructured data are uploaded to distributed storage systems. Because access to this unstructured data involves many users and high concurrency, it poses new challenges to traditional distributed storage systems, which were designed for large files. We propose a grouping strategy that identifies data correlated in their access patterns, based on disk access logs from a real distributed storage system environment. When any data in a group is accessed, the whole group is prefetched from disk into the cache. First, we conduct a statistical analysis of the access logs and propose a preliminary method that classifies files by spatiotemporal locality. Second, we propose a strength-priority tree structure relation model (SP-TSRM) to mine file groups efficiently. Finally, experiments show that the proposed model significantly improves the cache hit rate, thereby improving the read efficiency of distributed storage systems.
ISBN: 3030050505; 9783030050504
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-05051-1_36
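The summary describes group prefetching: when any file in a mined group is accessed, the whole group is loaded from disk into the cache. The abstract does not specify the SP-TSRM mining step or the cache policy. The following is therefore only a minimal sketch of the prefetch mechanism, assuming the file groups have already been mined. Several choices here are illustrative assumptions rather than the authors' implementation: LRU eviction, prefetching only on a cache miss, the hypothetical `GroupPrefetchCache` class, the `simulate` helper, and the toy trace and group map.

```python
from collections import OrderedDict


class GroupPrefetchCache:
    """Hypothetical LRU cache with group prefetching (illustrative only).

    `groups` maps a file id to the tuple of file ids in its group; it stands in
    for the output of a grouping step such as SP-TSRM, which is NOT implemented here.
    """

    def __init__(self, capacity, groups=None):
        self.capacity = capacity
        self.groups = groups or {}
        self.entries = OrderedDict()  # file id -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def _insert(self, file_id):
        # Insert or refresh a file, evicting the least recently used entry if full.
        if file_id in self.entries:
            self.entries.move_to_end(file_id)
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[file_id] = None

    def access(self, file_id):
        """Return True on a cache hit; on a miss, load the file and its group."""
        if file_id in self.entries:
            self.hits += 1
            self.entries.move_to_end(file_id)
            return True
        self.misses += 1
        # Assumption: prefetch only on a miss. Group members are loaded first so
        # the requested file ends up as the most recently used entry.
        for member in self.groups.get(file_id, ()):
            if member != file_id:
                self._insert(member)
        self._insert(file_id)
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(trace, capacity, groups=None):
    """Replay an access trace and return the resulting cache hit rate."""
    cache = GroupPrefetchCache(capacity, groups)
    for file_id in trace:
        cache.access(file_id)
    return cache.hit_rate()


# Toy data (hypothetical): files a, b, c are always read together, as are x, y, z.
GROUPS = {f: ("a", "b", "c") for f in "abc"}
GROUPS.update({f: ("x", "y", "z") for f in "xyz"})
TRACE = list("abcxyz") * 2

if __name__ == "__main__":
    print("plain LRU:      ", simulate(TRACE, 3))
    print("group prefetch: ", simulate(TRACE, 3, GROUPS))
```

In this toy trace, each group of three files is always read together. Plain LRU with a three-entry cache misses every access, while group prefetching raises the hit rate to 8/12. These numbers only illustrate the mechanism and are not results from the paper.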