Analyzing factors and interaction terms affecting urban fatal crash types based on a hybrid framework of econometric model and machine learning approaches

Bibliographic Details
Published in International journal of crashworthiness Vol. 28; no. 6; pp. 809-821
Main Authors Hu, Zongpin, Shi, Qin, Chen, Yikai, Yuan, Quan, Tao, Zhengbin, Bian, Yujie, Haque, Md. Mazharul
Format Journal Article
Language English
Published Cambridge Taylor & Francis 02.11.2023
Taylor & Francis Ltd
Summary: The discrete outcome model is an important method for analyzing the factors affecting crash outcomes. However, such models face two important problems: the lack of effective approaches for discretizing continuous variables and the lack of effective approaches for mining interaction terms. To address these issues, this paper proposes a hybrid approach combining machine learning and econometric modelling to investigate fatal crash types in Shenzhen, China. First, fatal crash data were collected in Shenzhen from 2014 to 2016. Second, the minimum description length principle (MDLP), an outstanding representative of supervised discretization algorithms, was used to discretize the continuous variables in the data. This algorithm selects the proper cut-point by minimizing the entropy for the given interval. Subsequently, the feature subset selection algorithm based on association rule mining (FEAST), which has advantages over other interaction-mining algorithms in terms of structure freedom and global search capability, was employed to mine the interaction effects between variables. Finally, the discretized continuous variables and the interaction terms were incorporated into the random parameters logit (RPL) model. Results reveal that the goodness of fit of the proposed MDLP-FEAST-RPL model is significantly better than that of the equal width discretization (EWD)-RPL, MDLP-RPL, and EWD-FEAST-RPL models. In addition, a total of eleven factors and interaction terms are associated with urban fatal crash types. These findings will facilitate the development of cost-effective policies or countermeasures for targeted crash types in large cities of developing countries.
ISSN:1358-8265
1573-8965
1754-2111
DOI:10.1080/13588265.2022.2130621
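
The Summary describes MDLP as choosing a cut-point by minimizing entropy within an interval. The sketch below illustrates that idea using the commonly cited entropy-based formulation with an MDL stopping rule; it is not the authors' implementation. The function names (`entropy`, `best_cut`, `mdlp_accepts`, `mdlp_cuts`) and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(xs, ys):
    """Find the boundary with the lowest weighted class entropy.

    xs must be sorted, with ys aligned to it.
    Returns (cut_value, information_gain, split_index) or None.
    """
    n = len(xs)
    base = entropy(ys) if n else 0.0
    best = None
    for i in range(1, n):
        if xs[i - 1] == xs[i]:
            continue  # a cut can only fall between distinct values
        weighted = (i / n) * entropy(ys[:i]) + ((n - i) / n) * entropy(ys[i:])
        gain = base - weighted
        if best is None or gain > best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, gain, i)
    return best


def mdlp_accepts(ys, i, gain):
    """MDL stopping rule: accept the split only if the gain pays for its cost."""
    n = len(ys)
    left, right = ys[:i], ys[i:]
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(ys) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cuts(values, labels):
    """Recursively split a continuous variable; returns sorted cut-points."""
    pairs = sorted(zip(values, labels))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    found = best_cut(xs, ys)
    if found is None:
        return []
    cut, gain, i = found
    if not mdlp_accepts(ys, i, gain):
        return []
    return mdlp_cuts(xs[:i], ys[:i]) + [cut] + mdlp_cuts(xs[i:], ys[i:])


# Hypothetical toy data: a continuous variable and a two-class outcome.
toy_values = [1, 2, 3, 4, 10, 11, 12, 13]
toy_labels = ["a"] * 4 + ["b"] * 4
toy_cuts = mdlp_cuts(toy_values, toy_labels)
```

On the toy data the two classes are perfectly separated, so a single cut between 4 and 10 is accepted and neither pure sub-interval is split further. The resulting intervals would then be coded as categorical indicators before entering a discrete outcome model such as the RPL model named in the Summary.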