Prediction and behavioral analysis of travel mode choice: A comparison of machine learning and logit models

Bibliographic Details
Published in Travel, behaviour & society Vol. 20; no. C; pp. 22-35
Main Authors Zhao, Xilei; Yan, Xiang; Yu, Alan; Van Hentenryck, Pascal
Format Journal Article
Language English
Published Netherlands Elsevier Ltd 01.07.2020
Elsevier

Summary:
• A comprehensive comparison between machine learning and logit models is provided.
• Random forest model has much higher predictive accuracy compared to multinomial logit model and mixed logit model.
• Machine learning and logit models agree on many aspects of the behavioral outputs.
• Applying a standard approach to generate marginal effects and arc elasticities for random forest leads to unreasonable estimates.
• Random forest captures nonlinear relationships between the input features and choice outcomes.

Some recent studies have shown that machine learning can achieve higher predictive accuracy than logit models. However, existing studies rarely examine behavioral outputs (e.g., marginal effects and elasticities) that can be derived from machine-learning models and compare the results with those obtained from logit models. In other words, there has not been a comprehensive comparison between logit models and machine learning that covers both prediction and behavioral analysis, two equally important subjects in travel-behavior study. This paper addresses this gap by examining the key differences in model development, evaluation, and behavioral interpretation between logit and machine-learning models for mode-choice modeling. We empirically evaluate the two approaches using stated-preference survey data. Consistent with the literature, this paper finds that the best-performing machine-learning model, random forest, has significantly higher predictive accuracy than multinomial logit and mixed logit models. The random forest model and the two logit models largely agree on several aspects of the behavioral outputs, including variable importance and the direction of association between independent variables and mode choice. However, we find that the random forest model produces behaviorally unreasonable arc elasticities and marginal effects when these behavioral outputs are computed from a standard approach. After the introduction of some modifications that overcome the limitations of tree-based models, the results are improved to some extent. There appears to be a tradeoff between predictive accuracy and behavioral soundness when choosing between machine learning and logit models in mode-choice modeling.
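
The summary refers to marginal effects and arc elasticities computed from model predictions. As a rough illustration only, the sketch below shows one common way to compute both from any model that returns choice probabilities. It is not taken from the paper and may differ from the authors' procedure. The two-mode setup, the coefficients `ASC` and `BETA_TIME`, the sample observations, and all function names are hypothetical.

```python
import math

# Hypothetical coefficients for illustration only (not estimates from the paper).
ASC = {"car": 0.0, "bus": -0.5}   # alternative-specific constants
BETA_TIME = -0.05                 # utility per minute of travel time


def logit_predict(obs):
    """Multinomial logit choice probabilities for one observation.

    `obs` maps each mode to its travel time in minutes.
    """
    utils = {m: ASC[m] + BETA_TIME * obs[m] for m in ASC}
    top = max(utils.values())  # subtract the max for numerical stability
    exps = {m: math.exp(u - top) for m, u in utils.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def share(predict, observations, mode):
    """Average predicted probability of `mode` across observations."""
    return sum(predict(o)[mode] for o in observations) / len(observations)


def marginal_effect(predict, observations, mode, feature, delta=1.0):
    """Finite-difference change in the share of `mode` per unit of `feature`."""
    shifted = [{**o, feature: o[feature] + delta} for o in observations]
    return (share(predict, shifted, mode) - share(predict, observations, mode)) / delta


def arc_elasticity(predict, observations, mode, feature, pct=0.10):
    """Midpoint arc elasticity of the share of `mode` with respect to `feature`."""
    scaled = [{**o, feature: o[feature] * (1 + pct)} for o in observations]
    p0 = share(predict, observations, mode)
    p1 = share(predict, scaled, mode)
    rel_dp = (p1 - p0) / ((p1 + p0) / 2)
    rel_dx = pct / (1 + pct / 2)  # midpoint relative change when x scales by (1 + pct)
    return rel_dp / rel_dx


# Made-up travel times (minutes) for three survey respondents.
OBS = [
    {"car": 20.0, "bus": 35.0},
    {"car": 30.0, "bus": 40.0},
    {"car": 15.0, "bus": 25.0},
]

print("marginal effect of bus time on bus share:",
      marginal_effect(logit_predict, OBS, "bus", "bus"))
print("arc elasticity of bus share w.r.t. bus time:",
      arc_elasticity(logit_predict, OBS, "bus", "bus"))
```

Because `share`, `marginal_effect`, and `arc_elasticity` only need a `predict` callable, the same computation could in principle be pointed at a wrapper around a fitted random forest. Tree-based predictions are piecewise constant in each feature, so a small perturbation can leave the predicted share unchanged or move it abruptly, which is one plausible reason such estimates can look unreasonable.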
Bibliography:USDOE
ISSN:2214-367X
DOI:10.1016/j.tbs.2020.02.003