Cross-Validation and the Estimation of Conditional Probability Densities

Bibliographic Details
Published in: Journal of the American Statistical Association, Vol. 99, No. 468, pp. 1015-1026
Main Authors: Hall, Peter; Racine, Jeff; Li, Qi
Format: Journal Article
Language: English
Published: Alexandria, VA: Taylor & Francis, 01.12.2004 (American Statistical Association / Taylor & Francis Ltd)
Summary: Many practical problems, especially some connected with forecasting, require nonparametric estimation of conditional densities from mixed data. For example, given an explanatory data vector X for a prospective customer, with components that could include the customer's salary, occupation, age, sex, marital status, and address, a company might wish to estimate the density of the expenditure, Y, that could be made by that person, basing the inference on observations of (X, Y) for previous clients. Choosing appropriate smoothing parameters for this problem can be tricky, not least because plug-in rules take a particularly complex form in the case of mixed data. An obvious difficulty is that there exists no general formula for the optimal smoothing parameters. More insidiously, and more seriously, it can be difficult to determine which components of X are relevant to the problem of conditional inference. For example, if the jth component of X is independent of Y, then that component is irrelevant to estimating the density of Y given X, and ideally should be dropped before conducting inference. In this article we show that cross-validation overcomes these difficulties. It automatically determines which components are relevant and which are not, through assigning large smoothing parameters to the latter and consequently shrinking them toward the uniform distribution on the respective marginals. This effectively removes irrelevant components from contention, by suppressing their contribution to estimator variance; they already have very small bias, a consequence of their independence of Y. Cross-validation also yields important information about which components are relevant; the relevant components are precisely those that cross-validation has chosen to smooth in a traditional way, by assigning them smoothing parameters of conventional size. Indeed, cross-validation produces asymptotically optimal smoothing for relevant components, while eliminating irrelevant components by oversmoothing. In the problem of nonparametric estimation of a conditional density, cross-validation comes into its own as a method with no obvious peers.
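As a rough illustration of the selection behavior described in the summary, the sketch below runs cross-validated bandwidth selection for a conditional density with mixed explanatory variables, using the KDEMultivariateConditional estimator from statsmodels (an independent implementation, not the authors' code). The simulated data, variable names, and sample size are illustrative assumptions: x1 drives Y while x2 is independent of Y, so cross-validation should assign x2 a bandwidth near its uniform-kernel limit, effectively removing it from the estimate.

import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariateConditional

rng = np.random.default_rng(0)
n = 250                                  # kept small; cross-validation is O(n^2) per evaluation
x1 = rng.normal(size=n)                  # relevant explanatory variable (continuous)
x2 = rng.integers(0, 3, size=n)          # irrelevant explanatory variable (unordered, 3 categories)
y = x1 + rng.normal(scale=0.5, size=n)   # Y depends on x1 only

# Conditional density f(y | x1, x2) with least-squares cross-validated bandwidths.
est = KDEMultivariateConditional(
    endog=[y],
    exog=[x1, x2],
    dep_type='c',     # Y is continuous
    indep_type='cu',  # x1 continuous, x2 unordered categorical
    bw='cv_ls',       # least-squares cross-validation
)

# Selected bandwidths (in this implementation: dependent variable first, then
# the explanatory variables). Expect a conventional-size bandwidth for x1 and,
# for the irrelevant x2, a bandwidth near the value at which its categorical
# kernel becomes uniform, i.e. cross-validation oversmooths x2 away.
print(est.bw)

# Evaluate the estimated conditional density at a single point.
print(est.pdf(endog_predict=np.array([[0.0]]),
              exog_predict=np.array([[0.5, 1]])))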
ISSN: 0162-1459; 1537-274X
DOI: 10.1198/016214504000000548