Synthetic Training-Data Generation for ML-based Process Mining Tools

Bibliographic Details
Published in	2024 14th International Conference on Advanced Computer Information Technologies (ACIT), pp. 705-709
Main Authors	Singh, Anjali; Bettouche, Zineddine; Fischer, Andreas
Format	Conference Proceeding
Language	English
Published	IEEE 19.09.2024
Subjects	Analytical models; Data models; Data structures; event logs; Generative adversarial networks; generative adversarial networks (GAN); Long short term memory; long short-term memory (LSTM) networks; Process mining; Reliability; Synthetic data; synthetic data generation; synthetic log generation; Training; Training data
Summary:	This work addresses data scarcity in process mining by proposing the creation of synthetic training data with generative models. A Long Short-Term Memory (LSTM) model and a Generative Adversarial Network (GAN) model are compared on two distinct datasets. Multiple evaluation methods are used to compare the two models' output on four criteria: precision, fidelity, diversity, and novelty. The results indicate that while the LSTM accurately reproduces the structure of the initial data, the GAN introduces more variability, offering a wider range of training scenarios. This highlights the potential of GAN-generated data to enhance the effectiveness and reliability of machine learning-based process mining tools.
ISSN:	2770-5226
DOI:	10.1109/ACIT62333.2024.10712516