Synthetic Training-Data Generation for ML-based Process Mining Tools

Bibliographic Details
Published in: 2024 14th International Conference on Advanced Computer Information Technologies (ACIT), pp. 705-709
Main Authors: Singh, Anjali; Bettouche, Zineddine; Fischer, Andreas
Format: Conference Proceeding
Language: English
Published: IEEE, 19.09.2024
Summary: This work addresses data scarcity in process mining by proposing the generation of synthetic training data with generative models. It compares a Long Short-Term Memory (LSTM) model with a Generative Adversarial Network (GAN) model on two distinct datasets. Multiple evaluation methods are used to compare the two models' results on four criteria: precision, fidelity, diversity, and novelty. Results indicate that while the LSTM accurately reproduces the structure of the original data, the GAN introduces more variability, offering a wider range of training scenarios. This highlights the potential of GAN-generated data to enhance the effectiveness and reliability of machine learning-based process mining tools.
ISSN: 2770-5226
DOI: 10.1109/ACIT62333.2024.10712516