Partially Supervised Learning Using an EM-Boosting Algorithm

Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be su...

Bibliographic Details
Published in	Biometrics Vol. 60; no. 1; pp. 199 - 206
Main Authors	Yasui, Yutaka; Pepe, Margaret; Hsu, Li; Adam, Bao-Ling; Feng, Ziding
Format	Journal Article
Language	English
Published	Blackwell Publishing, 350 Main Street, Malden, MA 02148, U.S.A., and P.O. Box 1354, 9600 Garsington Road, Oxford OX4 2DQ, U.K.; International Biometric Society; 01.03.2004
Subjects	Algorithms; Artificial Intelligence; Biomarkers, Tumor - blood; Biometrics; Biometry; Biopsies; Blood Proteins - analysis; Datasets; Epidemiology; High-dimensional data; Humans; Learning disabilities; Logistic regression; Male; Mass Spectrometry; Misclassification; Prostate cancer; Prostatic hyperplasia; Prostatic Hyperplasia - blood; Prostatic Hyperplasia - diagnosis; Prostatic Neoplasms - blood; Prostatic Neoplasms - diagnosis; Proteomics; Test data; Training
ISSN	0006-341X 1541-0420
DOI	10.1111/j.0006-341X.2004.00156.x

Abstract	Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We consider a setting where there are two classes but observations with labels corresponding to one of the classes may in fact be mislabeled. The application concerns the use of protein mass-spectrometry data to discriminate between serum samples from cancer and noncancer patients. The patients in the training set are classified on the basis of tissue biopsy. Although biopsy is 100% specific in the sense that a tissue that shows itself to have malignant cells is certainly cancer, it is less than 100% sensitive. Reference gold standards that are subject to this special type of misclassification due to imperfect diagnosis certainty arise in many fields. We consider the development of a supervised learning algorithm under these conditions and refer to it as partially supervised learning. Boosting is a supervised learning algorithm geared toward high-dimensional predictor data, such as those generated in protein mass-spectrometry. We propose a modification of the boosting algorithm for partially supervised learning. The proposal is to view the true class membership of the samples that are labeled with the error-prone class label as missing data, and apply an algorithm related to the EM algorithm for minimization of a loss function. To assess the usefulness of the proposed method, we artificially mislabeled a subset of samples and applied the original and EM-modified boosting (EM-Boost) algorithms for comparison. Notable improvements in misclassification rates are observed with EM-Boost.
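The idea in the abstract, treating the true class of the error-prone labels as missing data and alternating between fitting a boosted classifier and updating class-membership probabilities, can be illustrated with a minimal sketch. This is not the authors' EM-Boost algorithm, whose details are not given in this record. It assumes discrete AdaBoost with decision stumps as the boosting step and the logistic link p = 1/(1 + exp(-2F)) to turn a boosting score F into a probability. All function names, parameters, and the toy data are hypothetical.

```python
import math

def stump_predict(row, j, t, s):
    """Decision stump: predict s if feature j exceeds threshold t, else -s."""
    return s if row[j] > t else -s

def fit_stump(X, y, w):
    """Exhaustive search for the stump with the smallest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(X, y, w0, rounds):
    """Discrete AdaBoost started from the (unnormalised) sample weights w0."""
    total = sum(w0)
    w = [wi / total for wi in w0]
    model = []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, s))
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, s))
             for row, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def prob_positive(model, row):
    """Map the boosting score F(x) to a probability with the logistic link."""
    score = sum(a * stump_predict(row, j, t, s) for a, j, t, s in model)
    return 1.0 / (1.0 + math.exp(-2.0 * score))

def em_boost_sketch(X_sure, X_unsure, em_iters=5, rounds=3):
    """X_sure: samples whose positive label is error-free (biopsy-confirmed).
    X_unsure: samples labelled negative that may in fact be positive.
    Returns the last fitted model and P(truly positive) for each unsure sample."""
    p = [0.0] * len(X_unsure)          # start by trusting the observed labels
    X = X_sure + X_unsure + X_unsure   # each unsure sample enters under both classes
    y = [1] * len(X_sure) + [1] * len(X_unsure) + [-1] * len(X_unsure)
    model = []
    for _ in range(em_iters):
        # "M-like" step: fit boosting with the current membership weights
        w = [1.0] * len(X_sure) + p + [1.0 - pi for pi in p]
        model = adaboost(X, y, w, rounds)
        # "E-like" step: update membership probabilities from the fitted model
        p = [prob_positive(model, row) for row in X_unsure]
    return model, p

# Toy one-feature data (hypothetical): the last unsure sample resembles the positives.
X_sure = [[6.0], [7.0], [8.0], [9.0]]
X_unsure = [[1.0], [2.0], [3.0], [8.5]]
model, p = em_boost_sketch(X_sure, X_unsure)
```

Duplicating each error-prone sample under both classes, weighted by its current membership probability, is one common way to minimize an expected loss over missing labels; the published method may weight or update differently.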
Author	Hsu, Li; Adam, Bao-Ling; Feng, Ziding; Yasui, Yutaka; Pepe, Margaret
Author_xml
– sequence: 1; givenname: Yutaka; surname: Yasui; fullname: Yasui, Yutaka; email: yyasui@fhcrc.org; organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A.
– sequence: 2; givenname: Margaret; surname: Pepe; fullname: Pepe, Margaret; organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A.
– sequence: 3; givenname: Li; surname: Hsu; fullname: Hsu, Li; organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A.
– sequence: 4; givenname: Bao-Ling; surname: Adam; fullname: Adam, Bao-Ling; organization: Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta, Georgia 30912, U.S.A.
– sequence: 5; givenname: Ziding; surname: Feng; fullname: Feng, Ziding; organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A.
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/15032790$$D View this record in MEDLINE/PubMed
CitedBy_id	crossref_primary_10_1093_biostatistics_kxp052 crossref_primary_10_1093_bioinformatics_btt078 crossref_primary_10_1016_j_jmva_2010_03_001 crossref_primary_10_1016_j_patcog_2014_05_007 crossref_primary_10_1109_TNNLS_2020_3011671 crossref_primary_10_1214_15_AOAS812 crossref_primary_10_1002_0471250953_bi1301s10 crossref_primary_10_1002_pmic_200500192 crossref_primary_10_1021_pr200507b crossref_primary_10_1002_pmic_200700694
Cites_doi	10.1093/oxfordjournals.aje.a116805 10.1093/biostatistics/4.3.449 10.1093/biomet/82.2.315 10.1093/oxfordjournals.aje.a009251 10.1111/j.0006-341X.2002.00454.x 10.2307/2532670 10.1038/labinvest.3780122 10.1007/978-0-387-21606-5 10.1093/clinchem/48.10.1835 10.1214/aos/1016218223 10.2307/2531553 10.1155/S111072430320927X 10.1093/biomet/86.4.843 10.1093/oxfordjournals.aje.a112930 10.1111/j.1469-1809.1936.tb02137.x 10.1093/oxfordjournals.aje.a112408 10.1002/sim.4780080908 10.1111/j.2517-6161.1977.tb01600.x 10.1002/0471725293 10.1002/pros.1053 10.1093/oxfordjournals.aje.a114458 10.1111/1467-9868.00247 10.2307/2531595 10.1093/jnci/93.14.1054
ContentType	Journal Article
Copyright	Copyright 2004 The International Biometric Society
Copyright_xml	– notice: Copyright 2004 The International Biometric Society
DBID	BSCLL AAYXX CITATION CGR CUY CVF ECM EIF NPM 7X8
DOI	10.1111/j.0006-341X.2004.00156.x
DatabaseName	Istex CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic
DatabaseTitle	CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic
DatabaseTitleList	MEDLINE - Academic MEDLINE CrossRef
Database_xml	– sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database
DeliveryMethod	fulltext_linktorsrc
Discipline	Statistics Biology Mathematics
EISSN	1541-0420
EndPage	206
ExternalDocumentID	15032790 10_1111_j_0006_341X_2004_00156_x BIOM156 3695568 ark_67375_WNG_3XDZR4B9_D
Genre	article; Research Support, U.S. Gov't, P.H.S.; Journal Article; Comparative Study
GrantInformation_xml	– fundername: NCI NIH HHS grantid: U01-CA86368
ID	FETCH-LOGICAL-c4246-5f73a9e9d21ebc8c30d3d4057600f6e740ccb802e46831a930588d4c9d8004ce3
IEDL.DBID	DR2
ISSN	0006-341X
IngestDate	Mon Sep 08 15:21:00 EDT 2025 Wed Feb 19 01:36:15 EST 2025 Thu Apr 24 23:10:54 EDT 2025 Tue Jul 01 02:39:14 EDT 2025 Wed Aug 20 07:26:20 EDT 2025 Thu Jul 03 21:22:35 EDT 2025 Tue Sep 09 05:31:44 EDT 2025
IsPeerReviewed	true
IsScholarly	true
Issue	1
Language	English
License	http://onlinelibrary.wiley.com/termsAndConditions#vor
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c4246-5f73a9e9d21ebc8c30d3d4057600f6e740ccb802e46831a930588d4c9d8004ce3
Notes	ArticleID:BIOM156 ark:/67375/WNG-3XDZR4B9-D istex:4DC669225539A7ABF8B4D769F740B5350D8E36C4 ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23
PMID	15032790
PQID	71739003
PQPubID	23479
PageCount	8
ParticipantIDs	proquest_miscellaneous_71739003 pubmed_primary_15032790 crossref_citationtrail_10_1111_j_0006_341X_2004_00156_x crossref_primary_10_1111_j_0006_341X_2004_00156_x wiley_primary_10_1111_j_0006_341X_2004_00156_x_BIOM156 jstor_primary_3695568 istex_primary_ark_67375_WNG_3XDZR4B9_D
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	March 2004
PublicationDateYYYYMMDD	2004-03-01
PublicationDate_xml	– month: 03 year: 2004 text: March 2004
PublicationDecade	2000
PublicationPlace	350 Main Street, Malden, MA 02148, U.S.A., and P.O. Box 1354, 9600 Garsington Road, Oxford OX4 2DQ, U.K.
PublicationPlace_xml	– name: 350 Main Street, Malden, MA 02148, U.S.A., and P.O. Box 1354, 9600 Garsington Road, Oxford OX4 2DQ, U.K. – name: United States
PublicationTitle	Biometrics
PublicationTitleAlternate	Biometrics
PublicationYear	2004
Publisher	Blackwell Publishing International Biometric Society
Publisher_xml	– name: Blackwell Publishing – name: International Biometric Society
StartPage	199
Title	Partially Supervised Learning Using an EM-Boosting Algorithm
URI	https://api.istex.fr/ark:/67375/WNG-3XDZR4B9-D/fulltext.pdf https://www.jstor.org/stable/3695568 https://onlinelibrary.wiley.com/doi/abs/10.1111%2Fj.0006-341X.2004.00156.x https://www.ncbi.nlm.nih.gov/pubmed/15032790 https://www.proquest.com/docview/71739003
Volume	60