A scalable graph generation algorithm to sample over a given shell distribution

Graphs are commonly used to model the relationships between various entities. These graphs can be enormously large and thus, scalable graph analysis has been the subject of many research efforts. To enable scalable analytics, many researchers have focused on generating realistic graphs that support...

Full description

Saved in:
Bibliographic Details
Published in2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) pp. 227 - 236
Main Authors Ozkaya, M. Yusuf, Balin, M. Fatih, Pinar, Ali, Catalyurek, Umit V.
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.05.2020
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Graphs are commonly used to model the relationships between various entities. These graphs can be enormously large and thus, scalable graph analysis has been the subject of many research efforts. To enable scalable analytics, many researchers have focused on generating realistic graphs that support controlled experiments for understanding how algorithms perform under changing graph features. Significant progress has been made on scalable graph generation which preserve some important graph properties (e.g., degree distribution, clustering coefficients). In this paper, we study how to sample a graph from the space of graphs with a given shell distribution. Shell distribution is related to the k-core, which is the largest subgraph where each vertex is connected to at least kother vertices. A k-shell is the subset of vertices that are in k-core but not ( k +1)-core, and the shell distribution comprises the sizes of these shells. Core decompositions are widely used to extract information from graphs and to assist other computations. We present a scalable shared and distributed memory graph generator that, given a shell decomposition, generates a random graph that conforms to it. Our extensive experimental results show the efficiency and scalability of our methods. Our algorithm generates 2 ^{33} vertices and 2 ^{37} edges in less than 50 seconds on 384 cores. 1 1 This work is funded by the Laboratory Directed Research and Development program of Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
DOI:10.1109/IPDPSW50202.2020.00051