A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes

Bibliographic Details
Published in: Machine Learning, Vol. 113, no. 10, pp. 8165-8204
Main Authors: Berrar, Daniel; Lopes, Philippe; Dubitzky, Werner
Format: Journal Article
Language: English
Published: New York: Springer US, 01.10.2024; Springer Nature B.V.

Summary: The 2023 Soccer Prediction Challenge invited the machine learning community to develop innovative methods to predict the outcomes of 736 future soccer matches. The Challenge included two tasks. Task 1 was to forecast the exact match score, i.e., the number of goals scored by each team. Task 2 was to predict the match outcome as a probability vector over the three possible result categories: victory of the home team, draw, and victory of the away team. Here, we present a new data- and knowledge-driven framework for building machine learning models from readily available data to predict soccer match outcomes. A key component of this framework is an innovative approach to modeling interdependent time series data of competing entities. Using this framework, we developed various predictive models based on k-nearest neighbors, artificial neural networks, naive Bayes, and ordinal forests, which we applied to the two tasks of the 2023 Soccer Prediction Challenge. Among all submissions to the Challenge, our machine learning models based on k-nearest neighbors and neural networks achieved top performances. Our main insights from the Challenge are that relatively simple learning algorithms perform remarkably well compared to more complex algorithms, and that the key to successful predictions lies in how well soccer domain knowledge can be incorporated into the modeling process.
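The Task 2 output format (a probability vector over home win, draw, and away win) and the k-nearest neighbors approach named in the summary can be illustrated with a minimal sketch. This is not the authors' method: the two-number feature vectors, the Euclidean distance, the function name `knn_outcome_probabilities`, and the Laplace smoothing via `pseudocount` are all hypothetical choices made for illustration. Each query match simply receives the smoothed outcome frequencies of its k most similar past matches.

```python
import math
from collections import Counter

# Outcome categories in the order used by Task 2: home win, draw, away win.
OUTCOMES = ("H", "D", "A")


def knn_outcome_probabilities(query, past_matches, k=5, pseudocount=1.0):
    """Estimate (P(H), P(D), P(A)) for a query match from its k nearest past matches.

    Hypothetical illustration, not the published framework.
    query: tuple of numeric features (hypothetical, e.g. team-strength ratings).
    past_matches: list of (features, outcome) pairs, outcome in OUTCOMES.
    pseudocount: Laplace smoothing so that no category gets probability zero.
    """
    if k < 1 or k > len(past_matches):
        raise ValueError("k must be between 1 and the number of past matches")
    # Sort past matches by Euclidean distance to the query; keep the k closest.
    nearest = sorted(past_matches, key=lambda m: math.dist(query, m[0]))[:k]
    # Count how often each outcome occurs among the neighbors.
    counts = Counter(outcome for _, outcome in nearest)
    # Smoothed relative frequencies; the denominator makes the vector sum to 1.
    total = k + pseudocount * len(OUTCOMES)
    return tuple((counts[o] + pseudocount) / total for o in OUTCOMES)


# Hypothetical past matches: (home-team feature, away-team feature) -> outcome.
past = [
    ((1.0, 0.2), "H"),
    ((0.9, 0.3), "H"),
    ((0.5, 0.5), "D"),
    ((0.2, 0.9), "A"),
    ((0.8, 0.1), "H"),
    ((0.3, 0.8), "A"),
]
probs = knn_outcome_probabilities((0.95, 0.2), past, k=3)
print(probs)  # approximately (0.667, 0.167, 0.167)
```

The three nearest neighbors of the query are all home wins, so the home-win probability dominates, while smoothing keeps draw and away win above zero. Setting `pseudocount=0.0` would return the raw neighbor frequencies instead, which can yield overconfident zero probabilities for small k.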
ISSN: 0885-6125 (print); 1573-0565 (electronic)
DOI: 10.1007/s10994-024-06625-9