Automating mixture model fitting of task durations for process conformance checking

Bibliographic Details
Published in	Data mining and knowledge discovery Vol. 39; no. 5; p. 53
Main Authors	Yang, Lingkai; McClean, Sally; Faddy, Malcolm; Donnelly, Mark; Khan, Kashaf; Burke, Kevin
Format	Journal Article
Language	English
Published	New York: Springer US, 01.09.2025; Springer Nature B.V.
Subjects	Algorithms; Artificial Intelligence; Automation; Business operations; Chemistry and Earth Sciences; Computer Science; Customers; Data Mining and Knowledge Discovery; Datasets; Efficiency; Hospitals; Information Storage and Retrieval; Methods; Physics; Statistics for Engineering; Process mining; Divide-and-conquer fitting; Process duration modelling; Nelder-Mead optimisation; Process conformance checking; Gamma mixture model
ISSN	1384-5810 1573-756X
DOI	10.1007/s10618-025-01131-5

Summary:	Process task duration data often exhibit multiple peaks, indicating differences in, for example, customer ages and preferences, resource capabilities or the day/hour of a week. This heterogeneous data, which captures diverse customer patterns, should be represented using different models, resulting in an overall mixture model. This paper introduces gamma mixture models to represent various customer patterns in task duration data, with a focus on automating the fitting process. The approach involves a two-stage procedure: first, divide-and-conquer using peak-, equidistance- and cluster-based techniques to partition data, and automatically fit gamma distributions to each subset. The second stage then improves the fitted mixture model by directly searching the log-likelihood surface. The method is compared with the expectation–maximization (EM) algorithm and an open tool (HyperStar), using both artificially generated datasets and a publicly available hospital billing dataset, demonstrating its effectiveness and time efficiency in modelling heterogeneous process duration data. Furthermore, a case study on process conformance checking is conducted using the hospital billing dataset, highlighting a potential application area for the method in process mining.
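
The two-stage procedure in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no algorithmic detail, so the equal-width split (one reading of "equidistance-based"), the method-of-moments starting values, the log/softmax parameterisation, and the function names `initial_fit`, `neg_log_likelihood` and `fit_gamma_mixture` are all assumptions made for illustration. Nelder-Mead is used for the second stage because it appears among the subject terms, and it only needs evaluations of the log-likelihood. The sketch assumes strictly positive durations and that every partition holds at least two distinct values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import gamma


def initial_fit(durations, n_components):
    """Stage 1 (assumed variant): equal-width partition, one gamma per part."""
    edges = np.linspace(durations.min(), durations.max(), n_components + 1)
    # Interior edges give labels 0 .. n_components - 1.
    labels = np.digitize(durations, edges[1:-1])
    weights, shapes, scales = [], [], []
    for k in range(n_components):
        part = durations[labels == k]
        mean, var = part.mean(), part.var()
        weights.append(part.size / durations.size)
        shapes.append(mean ** 2 / var)  # method of moments: shape = mean^2 / var
        scales.append(var / mean)       # method of moments: scale = var / mean
    return np.array(weights), np.array(shapes), np.array(scales)


def neg_log_likelihood(theta, durations):
    """Negative mixture log-likelihood; theta holds logits, log-shapes, log-scales."""
    logits, log_shapes, log_scales = np.split(theta, 3)
    log_weights = logits - logsumexp(logits)  # softmax keeps weights on the simplex
    log_pdf = gamma.logpdf(
        durations[:, None], a=np.exp(log_shapes), scale=np.exp(log_scales)
    )  # shape: (n_points, n_components)
    return -logsumexp(log_pdf + log_weights, axis=1).sum()


def fit_gamma_mixture(durations, n_components=2):
    """Stage 2: refine the stage-1 estimate by direct search (Nelder-Mead)."""
    w, a, s = initial_fit(durations, n_components)
    theta0 = np.concatenate([np.log(w), np.log(a), np.log(s)])
    result = minimize(
        neg_log_likelihood,
        theta0,
        args=(durations,),
        method="Nelder-Mead",
        options={"maxiter": 4000, "xatol": 1e-6, "fatol": 1e-6},
    )
    logits, log_a, log_s = np.split(result.x, 3)
    weights = np.exp(logits - logsumexp(logits))
    return weights, np.exp(log_a), np.exp(log_s), -result.fun


if __name__ == "__main__":
    # Synthetic two-peak durations, for illustration only.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.gamma(5.0, 1.0, 300), rng.gamma(40.0, 1.0, 300)])
    weights, shapes, scales, loglik = fit_gamma_mixture(data, n_components=2)
    print("weights:", weights)
    print("shapes:", shapes, "scales:", scales)
    print("log-likelihood:", loglik)
```

Optimising over logits and log-parameters keeps the weights summing to one and the shapes and scales positive without constrained optimisation; the number of components is fixed by hand here, whereas the summary describes an automated fitting process.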