GenCAT: Generating attributed graphs with controlled relationships between classes, attributes, and topology


Bibliographic Details
Published in: Information systems (Oxford) Vol. 115; p. 102195
Main Authors: Maekawa, Seiji; Sasaki, Yuya; Fletcher, George; Onizuka, Makoto
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.05.2023
Summary: Generating large synthetic attributed graphs with node labels is an important task to support various experimental studies for graph analytic methods. Existing graph generators fail to simultaneously simulate the core/border and homophily/heterophily phenomena that real-world graphs exhibit, i.e., the relationships between labels, attributes, and topology. Motivated by this limitation, we propose GenCAT, an attributed graph generator for controlling those relationships, which has the following advantages. (i) GenCAT generates graphs with user-specified node degrees and flexibly controls the relationship between nodes and labels by incorporating the connection proportion of each node to the classes. (ii) Generated attribute values follow user-specified distributions, and users can flexibly control the correlation between the attributes and labels. (iii) Graph generation scales linearly with the number of edges. GenCAT is the first generator to support all three of these practical features, i.e., it can capture both core/border and homophily/heterophily phenomena while remaining scalable. Through extensive experiments, we demonstrate that GenCAT can efficiently generate high-quality complex attributed graphs with user-controlled relationships between labels, attributes, and topology.

• We tackle the problem of generating synthetic attributed graphs with node labels.
• Our generator allows users to control the connection proportion for each node.
• Generated attributes follow user-specified features.
• Graph generation is scalable, and graphs with a billion edges can be generated.
• Our generator precisely reproduces graphs from a given real graph.
ISSN: 0306-4379, 1873-6076
DOI: 10.1016/j.is.2023.102195
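
Illustration: the summary describes controlling how strongly nodes connect to each class (homophily versus heterophily). The Python sketch below is a toy illustration of that general idea only. It is not GenCAT's algorithm, which this record does not describe. The names generate_labeled_graph, edge_homophily, and class_pref are hypothetical, and the sketch makes several simplifying assumptions: preferences are set per class rather than per node, attributes and core/border structure are ignored, and the requested degrees are only approximated because self-loops and duplicate edges are dropped.

```python
import random


def generate_labeled_graph(labels, degrees, class_pref, seed=0):
    """Toy class-preference edge sampler (illustrative only, not GenCAT).

    labels[i]        : class id of node i
    degrees[i]       : number of edge stubs node i emits
    class_pref[k][l] : probability that a stub from class k targets class l
    Assumes every class has at least one member.
    """
    rng = random.Random(seed)
    num_classes = len(class_pref)

    # Group nodes by class so a target can be drawn inside a chosen class.
    members = [[] for _ in range(num_classes)]
    for node, k in enumerate(labels):
        members[k].append(node)

    edges = set()
    for u, k in enumerate(labels):
        for _ in range(degrees[u]):
            # Choose the target class from the source class's preference row.
            target_class = rng.choices(range(num_classes), weights=class_pref[k])[0]
            v = rng.choice(members[target_class])
            if v != u:
                # Store undirected edges once; duplicates collapse in the set.
                edges.add((min(u, v), max(u, v)))
    return edges


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Two classes of 100 nodes each, five stubs per node.
labels = [i % 2 for i in range(200)]
degrees = [5] * 200

# Mostly within-class links versus mostly cross-class links.
homophilous = generate_labeled_graph(labels, degrees, [[0.9, 0.1], [0.1, 0.9]])
heterophilous = generate_labeled_graph(labels, degrees, [[0.1, 0.9], [0.9, 0.1]])

print("homophilous graph:  ", round(edge_homophily(homophilous, labels), 2))
print("heterophilous graph:", round(edge_homophily(heterophilous, labels), 2))
```

Swapping the preference matrix is the only change between the two graphs, which is the kind of user-facing control the summary refers to. Each stub is handled in constant time here, so the toy runs in time proportional to the number of requested stubs.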