One-step Gibbs sampling for the generation of synthetic households

Bibliographic Details
Published in Transportation research. Part C, Emerging technologies Vol. 166; p. 104770
Main Authors Kukic, Marija; Li, Xinling; Bierlaire, Michel
Format Journal Article
Language English
Published Elsevier Ltd 01.09.2024
Summary: The generation of synthetic households is challenging due to the necessity of maintaining consistency between the two layers of interest: the household itself, and the individuals composing it. Hence, the problem is typically tackled in two steps, first focusing on the individual layer and then on the household layer. The existing two-step simulation method proposes generating the households based on their roles, which diminishes the generality of the approach and makes it difficult to reproduce despite its beneficial properties. In this paper, we propose an alternative extension of Gibbs sampling for generating hierarchical datasets such as synthetic households, in order to make simulation more general and reusable. We demonstrate the performance of our method in a case study based on the 2015 Swiss micro-census data and compare it against state-of-the-art approaches. We show the influence of modeling decisions on different performance metrics and how the analyst can easily enforce consistency while avoiding generating illogical households. We show that the algorithm maintains the conditional distributions while satisfying the marginals of all variables simultaneously, all while generating consistent synthetic households.
• We propose an alternative practical extension of Gibbs sampling for generating hierarchical datasets such as synthetic households.
• By grouping all relevant variables at all levels into a single random vector and sorting individuals by decreasing age, we can generate realistically detailed synthetic populations.
• Implementing a separate Gibbs sampler for each household size accelerates the generation process and can be viewed as a variance reduction technique.
• We demonstrate how modeling decisions impact various performance metrics and how analysts can ensure consistency while avoiding the creation of illogical households.
• The results indicate that model-based methods are superior to data-driven approaches in controlling the generation process, thereby preventing the creation of illogical households.
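The idea of a single random vector that spans both layers can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a two-person household described by three discrete variables (age bin of the older member, age bin of the younger member, and a household-level car indicator), and it replaces the conditionals estimated from micro-census data with a randomly filled joint table. The names `gibbs_sample`, `AGE_BINS`, `CAR_LEVELS` and the "no car when the oldest member is a child" rule are all hypothetical. Illogical combinations receive zero weight, so the sampler never visits them once it starts from a feasible state.

```python
import numpy as np

AGE_BINS = 3    # hypothetical bins: 0 = child, 1 = adult, 2 = senior
CAR_LEVELS = 2  # hypothetical household variable: 0 = no car, 1 = car

# Toy joint weights over (age_oldest, age_youngest, car); stands in for real data.
table_rng = np.random.default_rng(42)
joint = table_rng.random((AGE_BINS, AGE_BINS, CAR_LEVELS))
for a1 in range(AGE_BINS):
    for a2 in range(AGE_BINS):
        if a2 > a1:
            joint[a1, a2, :] = 0.0  # members are sorted by decreasing age
joint[0, :, 1] = 0.0                # assumed rule: no car if oldest member is a child
joint /= joint.sum()


def gibbs_sample(joint, n_draws, burn_in=500, seed=0):
    """Draw states from a discrete joint table, one variable at a time."""
    rng = np.random.default_rng(seed)
    # Start from a feasible state (the cell with the largest weight).
    state = list(np.unravel_index(np.argmax(joint), joint.shape))
    draws = np.empty((n_draws, joint.ndim), dtype=int)
    for it in range(burn_in + n_draws):
        for k in range(joint.ndim):
            # Full conditional of variable k: slice the table at the
            # current values of all other variables, then normalise.
            index = tuple(slice(None) if j == k else state[j]
                          for j in range(joint.ndim))
            weights = joint[index]
            state[k] = rng.choice(len(weights), p=weights / weights.sum())
        if it >= burn_in:
            draws[it - burn_in] = state
    return draws


# One sampler for household size 2; other sizes would get their own table.
draws = gibbs_sample(joint, n_draws=10000)
```

Each row of `draws` is one synthetic household. Because the household-level and individual-level variables sit in the same vector, the consistency rules are enforced by the conditionals themselves and no separate matching step is needed in this toy setting.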
ISSN:0968-090X
DOI:10.1016/j.trc.2024.104770