Optimal two‐phase sampling for estimating the area under the receiver operating characteristic curve

Bibliographic Details
Published in Statistics in Medicine Vol. 40; no. 4; pp. 1059-1071
Main Author Wu, Yougui
Format Journal Article
Language English
Published England 20.02.2021
Abstract Statistical methods are well developed for estimating the area under the receiver operating characteristic curve (AUC) based on a random sample where the gold standard is available for every subject in the sample, or a two‐phase sample where the gold standard is ascertained only at the second phase for a subset of subjects sampled using fixed sampling probabilities. However, the methods based on a two‐phase sample do not attempt to optimize the sampling probabilities to minimize the variance of the AUC estimator. In this paper, we consider the optimal two‐phase sampling design for evaluating the performance of an ordinal test in classifying disease status. We derived the analytic variance formula for the AUC estimator and used it to obtain the optimal sampling probabilities. The efficiency of two‐phase sampling under the optimal sampling probabilities (OA) is evaluated by a simulation study, which indicates that two‐phase sampling under OA achieves a substantial variance reduction by over‐sampling subjects with low and high ordinal levels, compared with two‐phase sampling under proportional allocation (PA). Furthermore, in comparison with one‐phase random sampling, two‐phase sampling under OA or PA has a clear advantage in reducing the variance of the AUC estimator when the variance of diagnostic test results in the disease population is small relative to its counterpart in the nondisease population. Finally, we applied the optimal two‐phase sampling design to a real‐world example to evaluate the performance of a questionnaire score in screening for childhood asthma.
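The abstract does not spell out the AUC estimator or the variance formula, so the following is only a minimal sketch of one common way to estimate the AUC of an ordinal test from a two-phase sample: weight each verified subject by the inverse of the phase-two sampling probability for its ordinal level, then compute the Mann-Whitney statistic with ties counted as one half. The function name `ipw_auc`, its arguments, and the assumption that higher levels indicate disease are illustrative choices, not taken from the paper, and the sketch does not reproduce the paper's optimal sampling probabilities.

```python
from collections import defaultdict


def ipw_auc(levels, diseased, sampling_prob):
    """Inverse-probability-weighted AUC for an ordinal test (illustrative sketch).

    levels        -- ordinal test level of each phase-two (verified) subject
    diseased      -- gold-standard status (True/False) of the same subjects
    sampling_prob -- dict mapping each level to its phase-two sampling probability
    Assumes higher levels are more indicative of disease.
    """
    w_dis = defaultdict(float)  # weighted count of diseased subjects per level
    w_non = defaultdict(float)  # weighted count of nondiseased subjects per level
    for level, is_diseased in zip(levels, diseased):
        weight = 1.0 / sampling_prob[level]
        if is_diseased:
            w_dis[level] += weight
        else:
            w_non[level] += weight

    total_dis = sum(w_dis.values())
    total_non = sum(w_non.values())
    if total_dis == 0 or total_non == 0:
        raise ValueError("need at least one diseased and one nondiseased subject")

    concordant = 0.0
    below_non = 0.0  # weighted nondiseased count strictly below the current level
    for level in sorted(set(w_dis) | set(w_non)):
        # Pairs where the diseased subject scores higher count 1, ties count 1/2.
        concordant += w_dis[level] * (below_non + 0.5 * w_non[level])
        below_non += w_non[level]
    return concordant / (total_dis * total_non)
```

With every sampling probability equal to 1 this reduces to the ordinary empirical AUC of a fully verified sample; unequal probabilities, such as over-sampling the lowest and highest levels, change only the weights.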
Author Wu, Yougui
Author_xml – sequence: 1
  givenname: Yougui
  orcidid: 0000-0002-0401-7438
  surname: Wu
  fullname: Wu, Yougui
  email: ywu@health.usf.edu
  organization: University of South Florida
BackLink https://www.ncbi.nlm.nih.gov/pubmed/33210339$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1080_10543406_2024_2358803
crossref_primary_10_1097_PRS_0000000000010345
crossref_primary_10_1097_SLA_0000000000005386
Cites_doi 10.1093/biostatistics/4.2.313
10.1002/sim.4780060402
10.2307/1403001
10.1002/sim.1318
10.2307/2533165
10.1002/sim.5946
10.2307/2531496
10.1093/oxfordjournals.aje.a117323
10.2307/2530820
10.1080/03610926.2018.1563176
10.1080/01621459.1970.10481170
10.1148/radiology.143.1.7063747
10.1183/09031936.99.14511909
10.1016/0022-2496(75)90001-2
ContentType Journal Article
Copyright 2020 John Wiley & Sons Ltd
DBID AAYXX
CITATION
NPM
7X8
DOI 10.1002/sim.8819
DatabaseName CrossRef
PubMed
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic

PubMed
CrossRef
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
Statistics
Public Health
EISSN 1097-0258
EndPage 1071
ExternalDocumentID 33210339
10_1002_sim_8819
SIM8819
Genre article
Journal Article
ISSN 0277-6715
1097-0258
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords optimal sampling probabilities
relative efficiency
two-phase sampling
area under a ROC curve
one-phase random sampling
Language English
License 2020 John Wiley & Sons Ltd.
LinkModel DirectLink
ORCID 0000-0002-0401-7438
PMID 33210339
PQID 2462414959
PQPubID 23479
PageCount 13
ParticipantIDs proquest_miscellaneous_2462414959
pubmed_primary_33210339
crossref_primary_10_1002_sim_8819
crossref_citationtrail_10_1002_sim_8819
wiley_primary_10_1002_sim_8819_SIM8819
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 20 February 2021
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Statistics in medicine
PublicationTitleAlternate Stat Med
PublicationYear 2021
SSID ssj0011527
Score 2.3540845
SourceID proquest
pubmed
crossref
wiley
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 1059
SubjectTerms area under a ROC curve
one‐phase random sampling
optimal sampling probabilities
relative efficiency
two‐phase sampling
Title Optimal two‐phase sampling for estimating the area under the receiver operating characteristic curve
URI https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fsim.8819
https://www.ncbi.nlm.nih.gov/pubmed/33210339
https://www.proquest.com/docview/2462414959
Volume 40