Optimal two‐phase sampling for estimating the area under the receiver operating characteristic curve
Published in | Statistics in medicine Vol. 40; no. 4; pp. 1059 - 1071 |
---|---|
Main Author | Wu, Yougui |
Format | Journal Article |
Language | English |
Published | England, 20.02.2021 |
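Subjects | area under a ROC curve; one‐phase random sampling; optimal sampling probabilities; relative efficiency; two‐phase sampling |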
Abstract | Statistical methods are well developed for estimating the area under the receiver operating characteristic curve (AUC) from a random sample in which the gold standard is available for every subject, or from a two‐phase sample in which the gold standard is ascertained only at the second phase, for a subset of subjects sampled using fixed sampling probabilities. However, the methods based on a two‐phase sample do not attempt to optimize the sampling probabilities to minimize the variance of the AUC estimator. In this paper, we consider the optimal two‐phase sampling design for evaluating the performance of an ordinal test in classifying disease status. We derive an analytic variance formula for the AUC estimator and use it to obtain the optimal sampling probabilities. The efficiency of two‐phase sampling under the optimal sampling probabilities (OA) is evaluated in a simulation study, which indicates that, compared with two‐phase sampling under proportional allocation (PA), two‐phase sampling under OA achieves a substantial variance reduction by over‐sampling subjects with low and high ordinal levels. Furthermore, compared with one‐phase random sampling, two‐phase sampling under either OA or PA has a clear advantage in reducing the variance of the AUC estimator when the variance of diagnostic test results in the diseased population is small relative to that in the nondiseased population. Finally, we apply the optimal two‐phase sampling design to a real‐world example to evaluate the performance of a questionnaire score in screening for childhood asthma. |
---|---|
Author | Wu, Yougui |
Author_xml | – sequence: 1 givenname: Yougui orcidid: 0000-0002-0401-7438 surname: Wu fullname: Wu, Yougui email: ywu@health.usf.edu organization: University of South Florida |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/33210339$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_1080_10543406_2024_2358803 crossref_primary_10_1097_PRS_0000000000010345 crossref_primary_10_1097_SLA_0000000000005386 |
Cites_doi | 10.1093/biostatistics/4.2.313 10.1002/sim.4780060402 10.2307/1403001 10.1002/sim.1318 10.2307/2533165 10.1002/sim.5946 10.2307/2531496 10.1093/oxfordjournals.aje.a117323 10.2307/2530820 10.1080/03610926.2018.1563176 10.1080/01621459.1970.10481170 10.1148/radiology.143.1.7063747 10.1183/09031936.99.14511909 10.1016/0022-2496(75)90001-2 |
ContentType | Journal Article |
Copyright | 2020 John Wiley & Sons Ltd. |
DOI | 10.1002/sim.8819 |
Discipline | Medicine Statistics Public Health |
EISSN | 1097-0258 |
EndPage | 1071 |
ExternalDocumentID | 33210339 10_1002_sim_8819 SIM8819 |
Genre | article Journal Article |
ISSN | 0277-6715 1097-0258 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Keywords | optimal sampling probabilities; relative efficiency; two-phase sampling; area under a ROC curve; one-phase random sampling |
Language | English |
License | 2020 John Wiley & Sons Ltd. |
ORCID | 0000-0002-0401-7438 |
PMID | 33210339 |
PageCount | 13 |
PublicationCentury | 2000 |
PublicationDate | 20 February 2021 |
PublicationPlace | England |
PublicationTitle | Statistics in medicine |
PublicationTitleAlternate | Stat Med |
PublicationYear | 2021 |
StartPage | 1059 |
SubjectTerms | area under a ROC curve; one‐phase random sampling; optimal sampling probabilities; relative efficiency; two‐phase sampling |
Title | Optimal two‐phase sampling for estimating the area under the receiver operating characteristic curve |
URI | https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fsim.8819 https://www.ncbi.nlm.nih.gov/pubmed/33210339 https://www.proquest.com/docview/2462414959 |
Volume | 40 |
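The abstract describes estimating the AUC of an ordinal test when disease status is verified only for a second-phase subsample drawn with level-specific probabilities. As a rough illustration of that setup, the Python sketch below computes a generic inverse-probability-weighted, tie-corrected (Mann-Whitney) AUC estimate. It is not the estimator, variance formula, or optimal allocation derived in the paper; the function names `ipw_auc` and `draw_phase_two` and their argument layout are hypothetical and chosen only for this example.

```python
import random


def draw_phase_two(levels, probs, rng):
    """Select second-phase subjects (hypothetical helper).

    levels: ordinal test level of each first-phase subject.
    probs:  dict mapping each level to its verification probability.
    rng:    a random.Random instance.
    Returns a list of booleans, True where the gold standard is ascertained.
    """
    return [rng.random() < probs[t] for t in levels]


def ipw_auc(levels, disease, verified, probs):
    """Inverse-probability-weighted AUC for an ordinal test (illustrative only).

    disease[i] (0 or 1) is read only where verified[i] is True.
    Each verified subject is weighted by 1 / probs[level].
    """
    w_dis, w_non = {}, {}  # weighted counts per level, by disease status
    for t, d, v in zip(levels, disease, verified):
        if not v:
            continue  # unverified subjects contribute nothing here
        target = w_dis if d else w_non
        target[t] = target.get(t, 0.0) + 1.0 / probs[t]

    total_dis = sum(w_dis.values())
    total_non = sum(w_non.values())
    if total_dis == 0 or total_non == 0:
        raise ValueError("need verified subjects in both disease groups")

    # Weighted P(T_diseased > T_nondiseased) + 0.5 * P(tie)
    score = 0.0
    for a, wa in w_dis.items():
        for b, wb in w_non.items():
            if a > b:
                score += wa * wb
            elif a == b:
                score += 0.5 * wa * wb
    return score / (total_dis * total_non)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy first-phase data: five ordinal levels, disease more likely at high levels.
    levels = [rng.randint(1, 5) for _ in range(2000)]
    disease = [int(rng.random() < 0.1 * t) for t in levels]
    # Assumed design that verifies the extreme levels more often than the middle ones.
    probs = {1: 0.6, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.6}
    verified = draw_phase_two(levels, probs, rng)
    print(ipw_auc(levels, disease, verified, probs))
```

Passing the same probability for every level mimics proportional allocation, while larger probabilities at the lowest and highest levels mimic the over-sampling pattern that the abstract reports for the optimal design.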