Fast Distributed PageRank Computation

Bibliographic Details
Published in: Distributed Computing and Networking, Vol. 7730, pp. 11-26
Main Authors: Das Sarma, Atish; Molla, Anisur Rahaman; Pandurangan, Gopal; Upfal, Eli
Format: Book Chapter
Language: English
Published: Germany, Springer Berlin / Heidelberg, 2012
Series: Lecture Notes in Computer Science
ISBN: 9783642356674, 3642356672
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-642-35668-1_2

Summary: Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved to be effective in determining node importance in large graphs (and was a pioneering idea behind Google’s search engine). In distributed computing alone, PageRank vectors, or more generally random-walk-based quantities, have been used for several different applications, ranging from determining important nodes to load balancing, search, and identifying connectivity structures. Surprisingly, however, there has been little work towards designing provably efficient fully distributed algorithms for computing PageRank. The difficulty is that traditional matrix-vector-multiplication-style iterative methods may not always adapt well to the distributed setting owing to communication bandwidth restrictions and convergence rates. In this paper, we present fast random-walk-based distributed algorithms for computing PageRank in general graphs and prove strong bounds on the round complexity. We first present an algorithm that takes $O(\log n/\epsilon)$ rounds with high probability on any graph (directed or undirected), where $n$ is the network size and $\epsilon$ is the reset probability used in the PageRank computation (typically $\epsilon$ is a fixed constant). We then present a faster algorithm that takes $O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both of the above algorithms are scalable, as each node processes and sends only a small (polylogarithmic in $n$, the network size) number of bits per round, and hence they work in the CONGEST distributed computing model. For directed graphs, we present an algorithm that has a running time of $O(\sqrt{\log n/\epsilon})$, but it requires a polynomial number of bits to be processed and sent per node in a round. To the best of our knowledge, these are the first fully distributed algorithms for computing PageRank vectors with provably efficient running time.
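
Illustrative note: the summary describes random-walk-based PageRank with a reset probability but gives no algorithmic detail. The sketch below is a sequential, single-machine illustration of the general Monte Carlo idea only, not the paper's distributed algorithm. It assumes that every node has at least one out-neighbor and that a walk simply stops when it "resets". The function name `monte_carlo_pagerank` and its parameters are hypothetical.

```python
import random
from collections import Counter


def monte_carlo_pagerank(out_neighbors, eps=0.15, walks_per_node=200, seed=0):
    """Estimate PageRank by simulating terminating random walks.

    out_neighbors: dict mapping each node to a non-empty list of out-neighbors
                   (assumption: no dangling nodes).
    eps:           reset probability; a walk stops at each step with probability eps.
    """
    rng = random.Random(seed)
    visits = Counter()
    nodes = list(out_neighbors)

    for start in nodes:
        for _ in range(walks_per_node):
            current = start
            while True:
                visits[current] += 1      # count every visit, including the start
                if rng.random() < eps:    # the walk terminates here (the "reset")
                    break
                current = rng.choice(out_neighbors[current])

    total_walks = len(nodes) * walks_per_node
    # A walk makes 1/eps visits in expectation, so the estimates sum to roughly 1.
    return {v: eps * visits[v] / total_walks for v in nodes}


if __name__ == "__main__":
    # Directed 4-cycle: by symmetry every node has PageRank 0.25.
    cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
    print(monte_carlo_pagerank(cycle, eps=0.15, walks_per_node=500))
```

In this sketch each walk has geometrically distributed length, so a single walk lasts more than $c \ln n/\epsilon$ steps with probability at most $n^{-c}$. How walks are routed under per-round bandwidth limits is the part the paper addresses and is not modeled here.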