Tweedie gradient boosting for extremely unbalanced zero-inflated data
Published in | Communications in statistics. Simulation and computation Vol. 51; no. 9; pp. 5507 - 5529 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Philadelphia; Taylor & Francis; 27.09.2022; Taylor & Francis Ltd |
Summary: | Tweedie's compound Poisson model is a popular method for modeling insurance claims, which have a probability mass at zero and a nonnegative, highly right-skewed distribution. In particular, it is not uncommon to have extremely unbalanced data with an excessively large proportion of zero claims, for which even the traditional Tweedie model may not fit the data satisfactorily. In this paper, we propose a boosting-assisted zero-inflated Tweedie model, called EMTboost, that allows the probability mass at zero to exceed that of a traditional Tweedie model. We make a nonparametric assumption on its Tweedie model component, which, unlike a linear model, is able to capture nonlinearities, discontinuities, and complex higher-order interactions among predictors. A specialized Expectation-Maximization algorithm is developed that integrates a blockwise coordinate descent strategy and a gradient tree-boosting algorithm to estimate key model parameters. We use extensive simulation and data analysis on synthetic zero-inflated auto-insurance claim data to illustrate our method's prediction performance. |
---|---|
ISSN: | 0361-0918, 1532-4141 |
DOI: | 10.1080/03610918.2020.1772302 |
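
To make the summary's "zero probability mass" concrete, the sketch below computes the probability of a zero claim under a compound Poisson Tweedie distribution and under a zero-inflated mixture of it. It is an illustration only, not the EMTboost implementation. It assumes the standard parameterization with mean `mu`, dispersion `phi`, and power `p` in (1, 2), and an extra zero-mass weight `q` for the mixture; all function names and the symbol `q` are hypothetical and do not come from this record.

```python
import numpy as np


def tweedie_zero_prob(mu, phi, p):
    """P(Y = 0) for a compound Poisson Tweedie variable with 1 < p < 2.

    The claim count is Poisson with rate mu**(2 - p) / (phi * (2 - p)),
    so the probability of no claim is exp(-rate).
    """
    mu = np.asarray(mu, dtype=float)
    rate = mu ** (2.0 - p) / (phi * (2.0 - p))
    return np.exp(-rate)


def zero_inflated_zero_prob(mu, phi, p, q):
    """P(Y = 0) when an extra point mass q at zero is mixed in (assumed form)."""
    return q + (1.0 - q) * tweedie_zero_prob(mu, phi, p)


def posterior_point_mass(mu, phi, p, q):
    """Posterior probability that an observed zero came from the point mass.

    This is the kind of quantity an E-step would compute for zero observations.
    """
    p0 = tweedie_zero_prob(mu, phi, p)
    return q / (q + (1.0 - q) * p0)


if __name__ == "__main__":
    mu, phi, p, q = 1.0, 1.0, 1.5, 0.5
    print(tweedie_zero_prob(mu, phi, p))
    print(zero_inflated_zero_prob(mu, phi, p, q))
    print(posterior_point_mass(mu, phi, p, q))
```

For any `q` greater than zero, the mixture assigns strictly more mass to zero than the plain Tweedie model, which is the property the summary describes.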