An All k-Nearest Neighbor Query Algorithm for High-Dimensional Big Data (一种高维大数据全k近邻查询算法)
Published in | 电信科学 (Telecommunications Science) Vol. 31; no. 7; pp. 52-62 |
---|---|
Main Author | Wang Zhongwei |
Format | Journal Article |
Language | Chinese |
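Published | China Institute of Communications, 01.07.2015; Posts & Telecom Press Co., Ltd.; College of Information Science and Engineering, Ningbo University, Ningbo 315211 |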
ISSN | 1000-0801 |
DOI | 10.11959/j.issn.1000-0801.2015171 |
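Summary: | The all k-nearest neighbor (AkNN) query, a variant of the k-nearest neighbor query, determines the k nearest neighbors of every object in a given data set within a single query process. An AkNN query algorithm for high-dimensional big data on the Hadoop distributed platform is proposed. First, the banding technique is combined with the p-stable LSH algorithm to reduce the dimensionality of the high-dimensional data objects. Then, exploiting the favorable properties of the Z-order space-filling curve, the reduced data are embedded in one-dimensional space, where range queries are performed. The whole process runs in a distributed, parallel fashion on the MapReduce framework. Experimental results show that the proposed algorithm handles AkNN queries on high-dimensional big data efficiently. |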
---|---|
Bibliography: | Keywords: high-dimensional, AkNN, MapReduce, banding, locality sensitive hashing, Z-order. The all k-nearest neighbor (AkNN) query, a variant of the k-nearest neighbor query, searches for the k nearest neighbors of each object in a data set. An AkNN query algorithm for high-dimensional big data on the Hadoop system was proposed. Dimensionality reduction was performed using the banding technique and the p-stable LSH algorithm, and the data were then embedded in a Z-order curve. The preprocessed data were then processed on a MapReduce framework in a distributed, parallel manner. Experimental results show that the proposed algorithm can efficiently handle AkNN queries for large-scale high-dimensional data. 11-2103/TN. Wang Zhongwei, Chen Yefang, Xiao Siyou, Qian Jiangbo (Faculty of Electrical and Computer Science, Ningbo University, Ningbo 315211, China) |
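
The pipeline named in the summary (banding plus p-stable LSH for dimensionality reduction, Z-order embedding, then a range-style search) can be illustrated with a small single-machine sketch. This is not the authors' implementation. The record does not say how banding and LSH are combined, so the sketch assumes each band of contiguous coordinates is hashed by one Gaussian (p = 2) p-stable function. It also replaces the range query with a fixed window over the Z-order ranking and omits the MapReduce distribution entirely. All names (`make_band_hashes`, `reduce_point`, `z_value`, `aknn`) and parameters (`bands`, `w`, `window`) are hypothetical.

```python
import math
import random


def make_band_hashes(dim, bands, w, rng):
    """Build one p-stable hash per band of contiguous coordinates.

    Assumption: "banding" means splitting the coordinates into at most
    `bands` contiguous groups; the catalog record does not specify this.
    """
    size = math.ceil(dim / bands)
    hashes = []
    for start in range(0, dim, size):
        idx = range(start, min(start + size, dim))
        a = [rng.gauss(0.0, 1.0) for _ in idx]  # Gaussian is 2-stable
        b = rng.uniform(0.0, w)                 # random offset in [0, w)
        hashes.append((idx, a, b))
    return hashes


def reduce_point(point, hashes, w):
    """Map a point to one integer per band: floor((a . v + b) / w)."""
    return [
        math.floor((sum(ai * point[i] for ai, i in zip(a, idx)) + b) / w)
        for idx, a, b in hashes
    ]


def z_value(coords, bits):
    """Interleave the low `bits` bits of non-negative integer coordinates."""
    z = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            z |= ((c >> bit) & 1) << (bit * len(coords) + d)
    return z


def aknn(points, k, bands=4, w=4.0, window=None, seed=0):
    """Approximate all-kNN: rank by Z-value, then refine nearby candidates."""
    rng = random.Random(seed)
    n = len(points)
    if window is None:
        window = 4 * k  # hypothetical default candidate window
    hashes = make_band_hashes(len(points[0]), bands, w, rng)
    reduced = [reduce_point(p, hashes, w) for p in points]
    # Shift every reduced dimension so all coordinates are non-negative.
    mins = [min(col) for col in zip(*reduced)]
    shifted = [[c - m for c, m in zip(r, mins)] for r in reduced]
    bits = max(1, max(max(r) for r in shifted).bit_length())
    order = sorted(range(n), key=lambda i: z_value(shifted[i], bits))
    result = {}
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - window), min(n, pos + window + 1)
        # Candidates are the neighbors of i along the one-dimensional curve.
        cands = [j for j in order[lo:hi] if j != i]
        # Refine candidates with the true Euclidean distance.
        cands.sort(key=lambda j: math.dist(points[i], points[j]))
        result[i] = cands[:k]
    return result
```

Calling `aknn(points, k=3)` returns a dictionary mapping each point index to the indices of its approximate nearest neighbors. Setting `window` to `len(points)` makes every other point a candidate, so the result becomes exact at quadratic cost; smaller windows trade accuracy for speed.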