GenCAT: Generating attributed graphs with controlled relationships between classes, attributes, and topology
Published in | Information systems (Oxford) Vol. 115; p. 102195 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.05.2023 |
Summary: | Generating large synthetic attributed graphs with node labels is an important task that supports experimental studies of graph analytic methods. Existing graph generators fail to simultaneously simulate the core/border and homophily/heterophily phenomena that real-world graphs exhibit, i.e., the relationships between labels, attributes, and topology. Motivated by this limitation, we propose GenCAT, an attributed graph generator for controlling those relationships, which has the following advantages. (i) GenCAT generates graphs with user-specified node degrees and flexibly controls the relationship between nodes and labels by incorporating, for each node, its connection proportions to the classes. (ii) Generated attribute values follow user-specified distributions, and users can flexibly control the correlation between the attributes and labels. (iii) Graph generation scales linearly with the number of edges. GenCAT is the first generator to support all three of these practical features, i.e., it can capture both core/border and homophily/heterophily phenomena while remaining scalable. Through extensive experiments, we demonstrate that GenCAT can efficiently generate high-quality complex attributed graphs with user-controlled relationships between labels, attributes, and topology.
• We tackle the problem of generating synthetic attributed graphs with node labels.
• Our generator allows users to control the connection proportion for each node.
• Generated attributes follow user-specified features.
• Graph generation is scalable, and billion-edge graphs can be generated.
• Our generator precisely reproduces graphs from a given real graph. |
---|---|
ISSN: | 0306-4379 1873-6076 |
DOI: | 10.1016/j.is.2023.102195 |