Relieving and Readjusting Pythagoras
Published in | arXiv.org |
---|---|
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 17.06.2014 |
Summary: | Bill James invented the Pythagorean expectation in the late 1970s to predict a baseball team's winning percentage from only its runs scored and runs allowed. His original formula estimates a winning percentage of \({\rm RS}^2/({\rm RS}^2+{\rm RA}^2)\), where \({\rm RS}\) stands for runs scored and \({\rm RA}\) for runs allowed; later versions found better agreement with data by replacing the exponent 2 with numbers near 1.83. Miller and his colleagues provided a theoretical justification by modeling runs scored and allowed with independent Weibull distributions. They showed that a single Weibull distribution described runs scored and allowed very well and led to a predicted won-loss percentage of \(({\rm RS_{\rm obs}}-1/2)^\gamma / (({\rm RS_{\rm obs}}-1/2)^\gamma + ({\rm RA_{\rm obs}}-1/2)^\gamma)\), where \({\rm RS_{\rm obs}}\) and \({\rm RA_{\rm obs}}\) are the observed runs scored and allowed and \(\gamma\) is the shape parameter of the Weibull (typically close to 1.8). We show that a linear combination of Weibulls models a team's run production more accurately and improves the prediction accuracy of a team's winning percentage by an average of about 25%: while the currently used variants of the original predictor are accurate to about four games a season, the new combination is accurate to about three. The new formula is more involved computationally; however, it can easily be computed on a laptop in a matter of minutes from publicly available season data. It performs as well as (or slightly better than) the related Pythagorean formulas in use, and has the additional advantage of a theoretical justification for its parameter values, rather than parameters merely optimized to minimize prediction error. |
---|---|
ISSN: | 2331-8422 |
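The formulas in the summary are simple to evaluate directly. The Python sketch below computes Bill James's estimate with a tunable exponent, the single-Weibull formula exactly as written in the summary, and a win probability when runs scored and allowed are each modeled as an independent linear combination (mixture) of Weibulls. The mixture function is an illustrative assumption, not the paper's procedure: it supposes every component shares the shape \(\gamma\) and a common location, and relies on the standard fact that for two independent Weibulls with common shape and location and scales \(\alpha_X, \alpha_Y\), \(P(X>Y)=\alpha_X^\gamma/(\alpha_X^\gamma+\alpha_Y^\gamma)\). All function names and parameters are hypothetical, and fitting the weights and scales to season data (the computationally heavier step the summary mentions) is not shown.

```python
import math
from typing import Sequence


def pythagorean_expectation(rs: float, ra: float, exponent: float = 2.0) -> float:
    """Bill James's estimate RS^k / (RS^k + RA^k); k = 2 originally, about 1.83 later."""
    return rs**exponent / (rs**exponent + ra**exponent)


def weibull_expectation(rs_obs: float, ra_obs: float, gamma: float = 1.8) -> float:
    """Single-Weibull formula as stated in the summary, with observed runs shifted by 1/2."""
    scored = (rs_obs - 0.5) ** gamma
    allowed = (ra_obs - 0.5) ** gamma
    return scored / (scored + allowed)


def mixture_win_probability(
    weights_rs: Sequence[float],
    scales_rs: Sequence[float],
    weights_ra: Sequence[float],
    scales_ra: Sequence[float],
    gamma: float,
) -> float:
    """P(runs scored > runs allowed) for two independent Weibull mixtures.

    Hypothetical parametrization: every component shares shape `gamma` and the
    same location, so for component scales a and b, P(X > Y) = a^g / (a^g + b^g).
    The shared location cancels, so it is not a parameter here.
    """
    if len(weights_rs) != len(scales_rs) or len(weights_ra) != len(scales_ra):
        raise ValueError("each mixture needs one weight per scale")
    if not (math.isclose(sum(weights_rs), 1.0) and math.isclose(sum(weights_ra), 1.0)):
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for w_i, a_i in zip(weights_rs, scales_rs):
        for w_j, a_j in zip(weights_ra, scales_ra):
            # Win probability for this component pair, weighted by both mixing weights.
            total += w_i * w_j * a_i**gamma / (a_i**gamma + a_j**gamma)
    return total


if __name__ == "__main__":
    print(pythagorean_expectation(5.0, 4.0, exponent=1.83))
    print(weibull_expectation(5.0, 4.0))
    print(mixture_win_probability([0.6, 0.4], [4.0, 7.0], [0.5, 0.5], [3.5, 6.0], 1.8))
```

With a single component on each side, the mixture function reduces to the single-Weibull ratio of scales; when both teams have identical mixtures it returns 1/2, as symmetry requires.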