Optimal two‐phase sampling for estimating the area under the receiver operating characteristic curve

Bibliographic Details
Published in Statistics in medicine Vol. 40; no. 4; pp. 1059-1071
Main Author Wu, Yougui
Format Journal Article
Language English
Published England 20.02.2021
Summary: Statistical methods are well developed for estimating the area under the receiver operating characteristic curve (AUC) based on a random sample where the gold standard is available for every subject in the sample, or a two‐phase sample where the gold standard is ascertained only at the second phase for a subset of subjects sampled using fixed sampling probabilities. However, the methods based on a two‐phase sample do not attempt to optimize the sampling probabilities to minimize the variance of the AUC estimator. In this paper, we consider the optimal two‐phase sampling design for evaluating the performance of an ordinal test in classifying disease status. We derived the analytic variance formula for the AUC estimator and used it to obtain the optimal sampling probabilities. The efficiency of two‐phase sampling under the optimal sampling probabilities (OA) is evaluated by a simulation study, which indicates that, compared with two‐phase sampling under proportional allocation (PA), two‐phase sampling under OA achieves a substantial variance reduction by over‐sampling subjects with low and high ordinal levels. Furthermore, in comparison with one‐phase random sampling, two‐phase sampling under either OA or PA has a clear advantage in reducing the variance of the AUC estimator when the variance of diagnostic test results in the disease population is small relative to its counterpart in the nondisease population. Finally, we applied the optimal two‐phase sampling design to a real‐world example to evaluate the performance of a questionnaire score in screening for childhood asthma.
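Illustration: the summary does not give the paper's estimator or its variance formula, so the sketch below shows only a generic inverse‐probability‐weighted AUC estimate of the kind commonly used with two‐phase samples. It assumes that each phase‐two subject is weighted by the reciprocal of the sampling probability for their ordinal test level and that ties count one half. The function name ipw_auc, the sampling probabilities, and the data are hypothetical and are not taken from the article.

```python
def ipw_auc(test, disease, weight):
    """Inverse-probability-weighted AUC over phase-two subjects.

    test    -- ordinal test level of each phase-two subject
    disease -- gold-standard status (truthy = diseased)
    weight  -- 1 / (phase-two sampling probability) for each subject
    """
    num = 0.0  # weighted count of correctly ordered pairs (ties count 0.5)
    den = 0.0  # total weight of all diseased/nondiseased pairs
    for t_i, d_i, w_i in zip(test, disease, weight):
        if not d_i:
            continue  # outer loop runs over diseased subjects only
        for t_j, d_j, w_j in zip(test, disease, weight):
            if d_j:
                continue  # inner loop runs over nondiseased subjects only
            pair_weight = w_i * w_j
            if t_i > t_j:
                kernel = 1.0
            elif t_i == t_j:
                kernel = 0.5
            else:
                kernel = 0.0
            num += pair_weight * kernel
            den += pair_weight
    if den == 0.0:
        raise ValueError("need at least one diseased and one nondiseased subject")
    return num / den


# Hypothetical design: the middle level is always verified, the extremes half the time.
sampling_prob = {1: 0.5, 2: 1.0, 3: 0.5}
test = [1, 2, 2, 3]
disease = [0, 1, 0, 1]
weight = [1.0 / sampling_prob[t] for t in test]
auc = ipw_auc(test, disease, weight)
```

With all weights equal to 1 the function reduces to the usual empirical (Mann-Whitney) AUC, which gives a quick check of the weighting logic.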
ISSN: 0277-6715
1097-0258
DOI: 10.1002/sim.8819