XGBoost meets TabNet in Predicting the Costs of Forwarding Contracts

Bibliographic Details
Published in: 2022 17th Conference on Computer Science and Intelligence Systems (FedCSIS), Vol. 30, pp. 417-420
Main Author: Lewandowska, Aleksandra
Format: Conference Proceeding; Journal Article
Language: English
Published: Polish Information Processing Society, 01.01.2022
ISSN: 2300-5963
DOI: 10.15439/2022F294

Summary: XGBoost and other gradient boosting frameworks are usually the default choice for solving classification and regression problems on tabular data, especially in data science competitions, since, combined with proper data pre-processing and feature engineering, they often deliver highly accurate predictions. They are also fast to train, easy to tune, and can provide a ranking of variable importance, which makes the learned models easier to interpret. Deep networks, on the other hand, are the top choice for complex data such as text, audio, or images. Despite their many successful applications, however, deep networks are not yet prevalent on tabular data, which may be related to the difficulty of choosing a proper architecture and its parameters. A solution to this problem may be found in recent work on deep architectures dedicated to tabular data, such as TabNet, which has been reported to achieve accuracy comparable to, or even better than, XGBoost on some tabular datasets. In this paper, we compare XGBoost with TabNet in the context of the FedCSIS 2022 challenge, which aimed at predicting the costs of forwarding contracts based on contract data and planned routes. The data has a typical tabular form, described by a multidimensional vector of numeric and nominal features. Of particular interest is whether aggregating the predictions of XGBoost and TabNet can produce better results than either algorithm alone. The paper discusses the competition solution and presents additional experiments comparing XGBoost with TabNet on the competition data, including incremental model re-building and parameter tuning. The experiments showed that an ensemble of XGBoost and TabNet is a promising solution for building predictive models for tabular data: in the tests conducted, the ensemble achieved a lower prediction error than either algorithm individually.
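
As an illustration of the prediction-aggregation idea described in the summary, the sketch below averages the outputs of an XGBoost model and a TabNet model on a regression task. This is a minimal example, not the paper's actual pipeline: the synthetic dataset, the hyperparameter values, and the unweighted average are all assumptions made for illustration, using the xgboost and pytorch-tabnet packages.

    # Minimal sketch of an XGBoost + TabNet ensemble for tabular regression.
    # Assumptions: synthetic data stands in for the competition data, and the
    # hyperparameters are illustrative defaults, not the paper's tuned values.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor
    from pytorch_tabnet.tab_model import TabNetRegressor

    X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Gradient-boosted trees: a strong tabular baseline with little tuning.
    xgb = XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0)
    xgb.fit(X_train, y_train)

    # TabNet: a deep network for tabular data; its fit() expects float32
    # feature arrays and a 2-D target of shape (n_samples, n_targets).
    tabnet = TabNetRegressor(seed=0)
    tabnet.fit(X_train.astype(np.float32),
               y_train.reshape(-1, 1).astype(np.float32),
               max_epochs=50, patience=10)

    # Aggregate by a simple unweighted average of the two predictions.
    pred_xgb = xgb.predict(X_test)
    pred_tabnet = tabnet.predict(X_test.astype(np.float32)).ravel()
    pred_ensemble = (pred_xgb + pred_tabnet) / 2.0

The unweighted average is the simplest aggregation scheme; a weighted average, with weights chosen on a validation set, is a common refinement when one model is consistently stronger than the other.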