Large data sets and machine learning: Applications to statistical arbitrage

Bibliographic Details
Published in IDEAS Working Paper Series from RePEc
Main Author Huck, Nicolas
Format Paper
Language English
Published St. Louis: Federal Reserve Bank of St. Louis, 01.01.2019
Summary: Machine learning algorithms and big data are transforming all industries, including the finance and portfolio management sectors. While techniques such as Deep Belief Networks and Random Forests are becoming increasingly popular in the market, the academic literature remains relatively sparse. Through a series of applications involving hundreds of predictors and stocks, this article presents some state-of-the-art techniques and shows how they can be implemented to manage a long-short portfolio. Numerous practical and empirical issues are addressed. One of the main questions behind the use of big data is the value of information. Does increasing the number of predictors improve portfolio performance? Which features are the most important? A large number of predictors potentially means a high level of noise; how do the algorithms handle it? The application covers a 22-year trading period, up to 300 U.S. large caps and around 600 predictors. The empirical results underline the ability of these techniques to generate useful trading signals for portfolios with high turnover and short holding periods (one or five days). Positive excess returns are reported between 1993 and 2008, but they are strongly reduced after accounting for transaction costs and traditional risk factors. In the most recent years, once these machine learning tools had become readily available in the market, excess returns turned negative. The results also show that adding features is far from a guarantee of higher portfolio alpha.
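The summary describes turning model forecasts into a long-short portfolio whose excess returns shrink once transaction costs are counted. As an illustration only, and not the paper's actual procedure, the following minimal sketch shows one common way this could be done: rank stocks by a predicted score, go long the top k and short the bottom k with equal weights, and charge a proportional cost on turnover. The function names, the equal weighting, the choice of k and the cost parameter are all assumptions made for this example; the sketch also ignores weight drift between rebalances.

```python
import numpy as np

def long_short_weights(scores, k):
    """Equal-weight long the k highest scores and short the k lowest.

    Hypothetical helper; assumes 2 * k <= number of stocks.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")  # ascending order of scores
    w = np.zeros(scores.size)
    w[order[-k:]] = 0.5 / k    # long leg sums to +0.5
    w[order[:k]] = -0.5 / k    # short leg sums to -0.5
    return w

def net_returns(scores, next_returns, k, cost_per_unit_turnover):
    """Per-period long-short returns net of proportional trading costs.

    scores, next_returns: arrays of shape (n_periods, n_stocks), where
    next_returns[t] is the return realised over the holding period that
    follows scores[t]. The portfolio starts from cash (all weights zero).
    """
    scores = np.asarray(scores, dtype=float)
    next_returns = np.asarray(next_returns, dtype=float)
    prev = np.zeros(scores.shape[1])
    out = []
    for s, r in zip(scores, next_returns):
        w = long_short_weights(s, k)
        turnover = np.abs(w - prev).sum()  # total weight traded this period
        out.append(w @ r - cost_per_unit_turnover * turnover)
        prev = w
    return np.array(out)
```

With short holding periods the weights change often, so the turnover term is charged frequently; this is the mechanism by which gross excess returns can be strongly reduced, or turn negative, after costs.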