Statistical Learning from a Regression Perspective

Statistical Learning from a Regression Perspective considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. As a firs...

Bibliographic Details
Main Author: Berk, Richard A.
Format: eBook
Language: English
Published: New York, NY: Springer-Verlag, 2008
Edition: 1st ed.
Series: Springer Series in Statistics
ISBN: 9780387775005, 0387775005
ISSN: 0172-7397
DOI: 10.1007/978-0-387-77501-2

Table of Contents:
  • Intro -- CONTENTS -- Preface -- 1 Statistical Learning as a Regression Problem -- 1.1 Getting Started -- 1.2 Setting the Regression Context -- 1.3 The Transition to Statistical Learning -- 1.3.1 Some Goals of Statistical Learning -- 1.3.2 Statistical Inference -- 1.3.3 Some Initial Cautions -- 1.3.4 A Cartoon Illustration -- 1.3.5 A Taste of Things to Come -- 1.4 Some Initial Concepts and Definitions -- 1.4.1 Overall Goals -- 1.4.2 Loss Functions and Related Concepts -- 1.4.3 Linear Estimators -- 1.4.4 Degrees of Freedom -- 1.4.5 Model Evaluation -- 1.4.6 Model Selection -- 1.4.7 Basis Functions -- 1.5 Some Common Themes -- 1.6 Summary and Conclusions -- 2 Regression Splines and Regression Smoothers -- 2.1 Introduction -- 2.2 Regression Splines -- 2.2.1 Applying a Piecewise Linear Basis -- 2.2.2 Polynomial Regression Splines -- 2.2.3 Natural Cubic Splines -- 2.2.4 B-Splines -- 2.3 Penalized Smoothing -- 2.3.1 Shrinkage -- 2.3.2 Shrinkage and Statistical Inference -- 2.3.3 Shrinkage: So What? -- 2.4 Smoothing Splines -- 2.4.1 An Illustration -- 2.5 Locally Weighted Regression as a Smoother -- 2.5.1 Nearest Neighbor Methods -- 2.5.2 Locally Weighted Regression -- 2.6 Smoothers for Multiple Predictors -- 2.6.1 Smoothing in Two Dimensions -- 2.6.2 The Generalized Additive Model -- 2.7 Smoothers with Categorical Variables -- 2.7.1 An Illustration -- 2.8 Locally Adaptive Smoothers -- 2.9 The Role of Statistical Inference -- 2.9.1 Some Apparent Prerequisites -- 2.9.2 Confidence Intervals -- 2.9.3 Statistical Tests -- 2.9.4 Can Asymptotics Help? -- 2.10 Software Issues -- 2.11 Summary and Conclusions -- 3 Classification and Regression Trees (CART) -- 3.1 Introduction -- 3.2 An Overview of Recursive Partitioning with CART -- 3.2.1 Tree Diagrams -- 3.2.2 Classification and Forecasting with CART -- 3.2.3 Confusion Tables
  • 3.2.4 CART as an Adaptive Nearest Neighbor Method -- 3.2.5 What CART Needs to Do -- 3.3 Splitting a Node -- 3.4 More on Classification -- 3.4.1 Fitted Values and Related Terms -- 3.4.2 An Example -- 3.5 Classification Errors and Costs -- 3.5.1 Default Costs in CART -- 3.5.2 Prior Probabilities and Costs -- 3.6 Pruning -- 3.6.1 Impurity Versus Rα(T) -- 3.7 Missing Data -- 3.7.1 Missing Data with CART -- 3.8 Statistical Inference with CART -- 3.9 Classification Versus Forecasting -- 3.10 Varying the Prior, Costs, and the Complexity Penalty -- 3.11 An Example with Three Response Categories -- 3.12 CART with Highly Skewed Response Distributions -- 3.13 Some Cautions in Interpreting CART Results -- 3.13.1 Model Bias -- 3.13.2 Model Variance -- 3.14 Regression Trees -- 3.14.1 An Illustration -- 3.14.2 Some Extensions -- 3.14.3 Multivariate Adaptive Regression Splines (MARS) -- 3.15 Software Issues -- 3.16 Summary and Conclusions -- 4 Bagging -- 4.1 Introduction -- 4.2 Overfitting and Cross-Validation -- 4.3 Bagging as an Algorithm -- 4.3.1 Margins -- 4.3.2 Out-Of-Bag Observations -- 4.4 Some Thinking on Why Bagging Works -- 4.4.1 More on Instability in CART -- 4.4.2 How Bagging Can Help -- 4.4.3 A Somewhat More Formal Explanation -- 4.5 Some Limitations of Bagging -- 4.5.1 Sometimes Bagging Does Not Help -- 4.5.2 Sometimes Bagging Can Make the Bias Worse -- 4.5.3 Sometimes Bagging Can Make the Variance Worse -- 4.5.4 Losing the Trees for the Forest -- 4.5.5 Bagging Is Only an Algorithm -- 4.6 An Example -- 4.7 Bagging a Quantitative Response Variable -- 4.8 Software Considerations -- 4.9 Summary and Conclusions -- 5 Random Forests -- 5.1 Introduction and Overview -- 5.1.1 Unpacking How Random Forests Works -- 5.2 An Initial Illustration -- 5.3 A Few Formalities -- 5.3.1 What Is a Random Forest?
  • 5.3.2 Margins and Generalization Error for Classifiers in General -- 5.3.3 Generalization Error for Random Forests -- 5.3.4 The Strength of a Random Forest -- 5.3.5 Dependence -- 5.3.6 Implications -- 5.4 Random Forests and Adaptive Nearest Neighbor Methods -- 5.5 Taking Costs into Account in Random Forests -- 5.5.1 A Brief Illustration -- 5.6 Determining the Importance of the Predictors -- 5.6.1 Contributions to the Fit -- 5.6.2 Contributions to Forecasting Skill -- 5.7 Response Functions -- 5.7.1 An Example -- 5.8 The Proximity Matrix -- 5.8.1 Clustering by Proximity Values -- 5.8.2 Using Proximity Values to Impute Missing Data -- 5.8.3 Using Proximities to Detect Outliers -- 5.9 Quantitative Response Variables -- 5.10 Tuning Parameters -- 5.11 An Illustration Using a Binary Response Variable -- 5.12 An Illustration Using a Quantitative Response Variable -- 5.13 Software Considerations -- 5.14 Summary and Conclusions -- 5.14.1 Problem Set 1 -- 5.14.2 Problem Set 2 -- 5.14.3 Problem Set 3 -- 6 Boosting -- 6.1 Introduction -- 6.2 Adaboost -- 6.2.1 A Toy Numerical Example of Adaboost -- 6.2.2 A Statistical Perspective on Adaboost -- 6.3 Why Does Adaboost Work So Well? -- 6.3.1 Least Angle Regression (LARS) -- 6.4 Stochastic Gradient Boosting -- 6.4.1 Tuning Parameters -- 6.4.2 Output -- 6.5 Some Problems and Some Possible Solutions -- 6.5.1 Some Potential Problems -- 6.5.2 Some Potential Solutions -- 6.6 Some Examples -- 6.6.1 A Garden Variety Data Analysis -- 6.6.2 Inmate Misconduct Again -- 6.6.3 Homicides and the Impact of Executions -- 6.6.4 Imputing the Number of Homeless -- 6.6.5 Estimating Conditional Probabilities -- 6.7 Software Considerations -- 6.8 Summary and Conclusions -- 7 Support Vector Machines -- 7.1 A Simple Didactic Illustration -- 7.2 Support Vector Machines in Pictures -- 7.2.1 Support Vector Classifiers
  • 7.2.2 Support Vector Machines -- 7.3 Support Vector Machines in Statistical Notation -- 7.3.1 Support Vector Classifiers -- 7.3.2 Support Vector Machines -- 7.3.3 SVM for Regression -- 7.4 A Classification Example -- 7.4.1 SVM Analysis with a Linear Kernel -- 7.4.2 SVM Analysis with a Radial Kernel -- 7.4.3 Varying Tuning Parameters -- 7.4.4 Taking the Costs of Classification Errors into Account -- 7.4.5 Comparisons to Logistic Regression -- 7.5 Software Considerations -- 7.6 Summary and Conclusions -- 8 Broader Implications and a Bit of Craft Lore -- 8.1 Some Fundamental Limitations of Statistical Learning -- 8.2 Some Assets of Statistical Learning -- 8.2.1 The Attitude Adjustment -- 8.2.2 Selectively Better Performance -- 8.2.3 Improving Other Procedures -- 8.3 Some Practical Suggestions -- 8.3.1 Matching Tools to Jobs -- 8.3.2 Getting to Know Your Software -- 8.3.3 Not Forgetting the Basics -- 8.3.4 Getting Good Data -- 8.3.5 Being Sensitive to Overtuning -- 8.3.6 Matching Your Goals to What You Can Credibly Do -- 8.4 Some Concluding Observations -- References -- Index