Unrepresentative big surveys significantly overestimated US vaccine uptake


Bibliographic Details
Published in: Nature (London), Vol. 600, No. 7890, pp. 695–700
Main Authors: Bradley, Valerie C.; Kuriwaki, Shiro; Isakov, Michael; Sejdinovic, Dino; Meng, Xiao-Li; Flaxman, Seth
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 23 December 2021

Summary: Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox 1 . Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi–Facebook 2 , 3 (about 250,000 responses per week) and Census Household Pulse 4 (about 75,000 every two weeks). In May 2021, Delphi–Facebook overestimated uptake by 17 percentage points (14–20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11–17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios–Ipsos online panel 5 with about 1,000 responses per week following survey research best practices 6 provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework 1 to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.

An analysis of three surveys of COVID-19 vaccine behaviour shows that larger surveys overconfidently overestimated vaccine uptake, a demonstration of how larger sample sizes can paradoxically lead to less accurate estimates.
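The summary's striking claim that a 250,000-response survey can be no more accurate than a simple random sample of size 10 follows from the effective-sample-size idea in the Big Data Paradox framework the authors cite: a small correlation between an individual's outcome and their propensity to respond produces a bias that dwarfs sampling variance. A minimal simulation sketch of that mechanism (the population size, uptake rate, and response propensities below are invented for illustration, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 10 million adults, 60% vaccinated.
N = 10_000_000
p_true = 0.60
vaccinated = rng.random(N) < p_true

# Non-representative "big survey": vaccinated people respond more often.
# These propensities are chosen to yield roughly 250,000 responses with
# an overestimate on the order of the 17 points reported in the paper.
propensity = np.where(vaccinated, 0.032, 0.014)
big_sample = vaccinated[rng.random(N) < propensity]

bias = big_sample.mean() - p_true

# Effective sample size: the size of a simple random sample whose
# typical sampling error would match this biased estimate's error,
# using the SRS variance p(1 - p) / n for a proportion.
n_eff = p_true * (1 - p_true) / bias**2

print(f"responses: {big_sample.size:,}")
print(f"estimated uptake: {big_sample.mean():.3f} (true: {p_true})")
print(f"bias: {bias:+.3f}, effective sample size ~ {n_eff:.0f}")
```

Under these assumed propensities, roughly a quarter-million responses collapse to a single-digit effective sample size: the huge n shrinks the reported margin of error while doing nothing about the bias.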
ISSN: 0028-0836; EISSN: 1476-4687
DOI: 10.1038/s41586-021-04198-4