Relieving and Readjusting Pythagoras


Bibliographic Details
Published in arXiv.org
Main Authors Luo, Victor; Miller, Steven J
Format Paper
Language English
Published Ithaca Cornell University Library, arXiv.org 17.06.2014
Summary: Bill James invented the Pythagorean expectation in the late 1970s to predict a baseball team's winning percentage from just its runs scored and runs allowed. His original formula estimates a winning percentage of \({\rm RS}^2/({\rm RS}^2+{\rm RA}^2)\), where \({\rm RS}\) stands for runs scored and \({\rm RA}\) for runs allowed; later versions found better agreement with data by replacing the exponent 2 with numbers near 1.83. Miller and his colleagues provided a theoretical justification by modeling runs scored and runs allowed as independent Weibull distributions. They showed that a single Weibull distribution did a very good job of describing runs scored and allowed, and led to a predicted won-loss percentage of \(({\rm RS_{\rm obs}}-1/2)^\gamma / (({\rm RS_{\rm obs}}-1/2)^\gamma + ({\rm RA_{\rm obs}}-1/2)^\gamma)\), where \({\rm RS_{\rm obs}}\) and \({\rm RA_{\rm obs}}\) are the observed runs scored and allowed and \(\gamma\) is the shape parameter of the Weibull (typically close to 1.8). We show that a linear combination of Weibulls describes a team's run production more accurately and improves the prediction accuracy of a team's winning percentage by an average of about 25% (thus, while the currently used variants of the original predictor are accurate to about four games a season, the new combination is accurate to about three). The new formula is more involved computationally; however, it can be easily computed on a laptop in a matter of minutes from publicly available season data. It performs as well as (or slightly better than) the related Pythagorean formulas in use, and has the additional advantage of a theoretical justification for its parameter values (rather than just an optimization of parameters to minimize prediction error).
ISSN: 2331-8422
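
The closed-form estimators quoted in the summary can be compared with a few lines of Python. This is a minimal sketch, not the authors' code: the function name `pythagorean_win_pct`, the `shift` parameter, and the season line (800 runs scored, 700 allowed, 162 games) are hypothetical and chosen only for illustration. It also assumes that the observed runs in the Weibull-based formula are per-game averages, which the summary does not state. The linear-combination-of-Weibulls model itself is not reproduced here.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0, shift=0.0):
    """Estimate winning percentage as RS^g / (RS^g + RA^g).

    `shift` is added to both run values before exponentiation; shift=-0.5
    gives the (RS_obs - 1/2) form quoted in the summary.
    """
    rs = runs_scored + shift
    ra = runs_allowed + shift
    if rs <= 0 or ra <= 0:
        raise ValueError("shifted run values must be positive")
    rs_pow = rs ** exponent
    ra_pow = ra ** exponent
    return rs_pow / (rs_pow + ra_pow)


# Hypothetical season line, for illustration only (not data from the paper).
GAMES = 162
RUNS_SCORED = 800
RUNS_ALLOWED = 700

# Original Bill James form, exponent 2, on season totals.
james = pythagorean_win_pct(RUNS_SCORED, RUNS_ALLOWED, exponent=2.0)

# Empirically tuned exponent near 1.83.
tuned = pythagorean_win_pct(RUNS_SCORED, RUNS_ALLOWED, exponent=1.83)

# Single-Weibull form with the -1/2 shift and shape parameter 1.8.
# Assumption: observed runs are per-game averages.
weibull = pythagorean_win_pct(
    RUNS_SCORED / GAMES, RUNS_ALLOWED / GAMES, exponent=1.8, shift=-0.5
)

for label, pct in [("James", james), ("tuned", tuned), ("Weibull", weibull)]:
    print(f"{label}: win pct {pct:.3f}, about {pct * GAMES:.1f} wins")
```

Without the shift, the ratio depends only on RS/RA, so season totals and per-game averages give the same answer. Once the 1/2 shift is applied the scale of the inputs matters, which is why the per-game assumption is flagged above.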