Pooling stated and revealed preference data in the presence of RP endogeneity
Published in | Transportation research. Part B: methodological Vol. 109; pp. 70 - 89 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Oxford; Elsevier Ltd; 01.03.2018; Elsevier Science Ltd |
Subjects | |
Summary: | • We identify cases where pooling RP and SP data can improve or worsen parameter recovery.
• The likelihood ratio test can falsely reject pooling when RP data describes a small number of choice sets.
• We propose a method for computing the information balance in pooled models.
• We offer new insights for pooling SP and RP data under potential RP endogeneity.
Pooled discrete choice models combine revealed preference (RP) data and stated preference (SP) data to exploit advantages of each. SP data is often treated with suspicion because consumers may respond differently in a hypothetical survey context than they do in the marketplace. However, models built on RP data can suffer from endogeneity bias when attributes that drive consumer choices are unobserved by the modeler and correlated with observed variables. Using a synthetic data experiment, we test the performance of pooled RP–SP models in recovering the preference parameters that generated the market data under conditions that choice modelers are likely to face, including (1) when there is potential for endogeneity problems in the RP data, such as omitted variable bias, and (2) when consumer willingness to pay for attributes may differ from the survey context to the market context. We identify situations where pooling RP and SP data does and does not mitigate each data source’s respective weaknesses. We also show that the likelihood ratio test, which has been widely used to determine whether pooling is statistically justifiable, (1) can fail to identify the case where SP context preference differences and RP endogeneity bias shift the parameter estimates of both models in the same direction and magnitude and (2) is unreliable when the product attributes are fixed within a small number of choice sets, which is typical of automotive RP data. Our findings offer new insights into when pooling data sources may or may not be advisable for accurately estimating market preference parameters, including consideration of the conditions and context under which the data were generated as well as the relative balance of information between data sources. |
---|---|
ISSN: | 0191-2615 1879-2367 |
DOI: | 10.1016/j.trb.2018.01.010 |
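
The likelihood ratio test for pooling named in the summary is commonly computed by comparing the log-likelihood of the pooled model with the sum of the log-likelihoods of the separately estimated RP and SP models. The sketch below illustrates that common form only; it is not taken from the article, and the function names, the log-likelihood values, and the parameter counts are hypothetical. It assumes the pooled model is a restricted version of the two separate models, so the degrees of freedom equal the number of restrictions that pooling imposes.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def pooling_lr_test(ll_rp, ll_sp, ll_pooled, k_rp, k_sp, k_pooled, alpha=0.05):
    """Likelihood ratio test of the restrictions imposed by pooling RP and SP data.

    ll_*: maximized log-likelihoods of the RP-only, SP-only, and pooled models.
    k_*:  number of estimated parameters in each model (the pooled count
          includes any relative scale parameter).
    """
    stat = -2.0 * (ll_pooled - (ll_rp + ll_sp))
    df = k_rp + k_sp - k_pooled
    if df < 1:
        raise ValueError("pooled model must have fewer parameters than the two separate models")
    p_value = chi2_sf(stat, df)
    return {"statistic": stat, "df": df, "p_value": p_value, "reject_pooling": p_value < alpha}


# Hypothetical values, for illustration only.
result = pooling_lr_test(ll_rp=-100.0, ll_sp=-200.0, ll_pooled=-305.0,
                         k_rp=4, k_sp=4, k_pooled=5)
print(result)
```

A small p-value is conventionally read as evidence against pooling. As the summary cautions, that reading can mislead when RP attributes are fixed within a small number of choice sets, or when SP context differences and RP endogeneity bias shift both sets of estimates in the same direction and magnitude.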