Forecast aggregation via recalibration
| Published in | Machine learning, Vol. 95, no. 3, pp. 261-289 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.06.2014 (Springer Nature B.V.) |
Summary: It is known that the average of many forecasts about a future event tends to outperform the individual assessments. With the goal of further improving forecast performance, this paper develops and compares a number of models for calibrating and aggregating forecasts that exploit the well-known fact that individuals exhibit systematic biases during judgment and elicitation. All of the models recalibrate judgments or mean judgments via a two-parameter calibration function, and differ in terms of whether (1) the calibration function is applied before or after the averaging, (2) averaging is done in probability or log-odds space, and (3) individual differences are captured via hierarchical modeling. Of the non-hierarchical models, the one that first recalibrates the individual judgments and then averages them in log-odds space is the best relative to simple averaging, with a 26.7% improvement in Brier score and better performance on 86% of the individual problems. The hierarchical version of this model does slightly better in terms of mean Brier score (28.2%) and slightly worse in terms of individual problems (85%).
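The summary describes a recalibrate-then-average-in-log-odds pipeline scored by the Brier score. As a minimal sketch of that idea (not the paper's actual model), the following assumes a two-parameter "linear in log-odds" style calibration with hypothetical parameters `gamma` and `delta`; all function names here are illustrative:

```python
import math

def recalibrate(p, gamma, delta):
    # Two-parameter calibration, assumed here to be of the
    # "linear in log-odds" form: scales and shifts a probability
    # judgment in log-odds space. gamma = delta = 1 is the identity.
    return (delta * p**gamma) / (delta * p**gamma + (1 - p)**gamma)

def aggregate_log_odds(probs, gamma=1.0, delta=1.0):
    # Recalibrate each individual judgment first, then average
    # in log-odds space and map back to a probability.
    calibrated = [recalibrate(p, gamma, delta) for p in probs]
    mean_lo = sum(math.log(p / (1 - p)) for p in calibrated) / len(calibrated)
    return 1 / (1 + math.exp(-mean_lo))

def brier(forecast, outcome):
    # Brier score for a single binary event: squared error
    # between the probability forecast and the 0/1 outcome.
    return (forecast - outcome) ** 2
```

With `gamma = delta = 1` this reduces to a plain log-odds (geometric-odds) average of the raw judgments; fitting the two calibration parameters to past data is what the compared models add on top of that baseline.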
ISSN: 0885-6125, 1573-0565
DOI: 10.1007/s10994-013-5401-4