Composite leading search index: a preprocessing method of internet search data for stock trends prediction

Previous studies have revealed that Internet search data is a new source of data that can be used to predict the stock market. In this new, data-driven research field, choosing a method for preprocessing data is crucial to achieving accurate prediction performance. This paper proposes a preprocessin...

Full description

Saved in:
Bibliographic Details
Published inAnnals of operations research Vol. 234; no. 1; pp. 77 - 94
Main Authors Liu, Ying, Chen, Yibing, Wu, Sheng, Peng, Geng, Lv, Benfu
Format Journal Article
LanguageEnglish
Published New York Springer US 01.11.2015
Springer
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Previous studies have revealed that Internet search data is a new source of data that can be used to predict the stock market. In this new, data-driven research field, choosing a method for preprocessing data is crucial to achieving accurate prediction performance. This paper proposes a preprocessing method of Internet search data: composite leading search index (CLSI), which is composed of three steps: (a) keyword selection, (b) time difference measurement, and (c) leading index composition. We demonstrate the validity of CLSI by comparing this method’s results with the results from search volume index (SVI), which is most commonly used in previous literatures. We build a time series model (TS) with error correction and support vector regression (SVR) for stock trend prediction, and combine into four models for comparison: SVI–TS, CLSI–TS, SVI–SVR, and CLSI–SVR. We test these four models in the context of the Chinese stock market, which interests more and more investors nowadays, and analyzed results in nine datasets: stable periods, peak periods and trough periods of Shanghai Composite Index, Shenzhen Composite Index, and Hushen 300 index respectively. The results show that using TS and SVR as forecasting models, CLSI performs better than SVI on majority of the test dataset while has almost the same performance with that of SVI on the remaining test dataset. It is to some extent convincing that CLSI is a more efficient preprocessing method of Internet search data for stock trend prediction.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:0254-5330
1572-9338
DOI:10.1007/s10479-014-1779-z