Partially Supervised Learning Using an EM-Boosting Algorithm

Bibliographic Details
Published in Biometrics Vol. 60; no. 1; pp. 199-206
Main Authors Yasui, Yutaka; Pepe, Margaret; Hsu, Li; Adam, Bao-Ling; Feng, Ziding
Format Journal Article
Language English
Published 350 Main Street, Malden, MA 02148, U.S.A., and P.O. Box 1354, 9600 Garsington Road, Oxford OX4 2DQ, U.K. Blackwell Publishing 01.03.2004
International Biometric Society
ISSN 0006-341X
EISSN 1541-0420
DOI 10.1111/j.0006-341X.2004.00156.x

Abstract Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We consider a setting where there are two classes but observations with labels corresponding to one of the classes may in fact be mislabeled. The application concerns the use of protein mass-spectrometry data to discriminate between serum samples from cancer and noncancer patients. The patients in the training set are classified on the basis of tissue biopsy. Although biopsy is 100% specific in the sense that a tissue that shows itself to have malignant cells is certainly cancer, it is less than 100% sensitive. Reference gold standards that are subject to this special type of misclassification due to imperfect diagnosis certainty arise in many fields. We consider the development of a supervised learning algorithm under these conditions and refer to it as partially supervised learning. Boosting is a supervised learning algorithm geared toward high-dimensional predictor data, such as those generated in protein mass-spectrometry. We propose a modification of the boosting algorithm for partially supervised learning. The proposal is to view the true class membership of the samples that are labeled with the error-prone class label as missing data, and apply an algorithm related to the EM algorithm for minimization of a loss function. To assess the usefulness of the proposed method, we artificially mislabeled a subset of samples and applied the original and EM-modified boosting (EM-Boost) algorithms for comparison. Notable improvements in misclassification rates are observed with EM-Boost.
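
The missing-data idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' EM-Boost implementation. It assumes a single numeric predictor, regression stumps as base learners, a logistic loss, perfect specificity, and a known biopsy sensitivity (the `sensitivity` argument); none of these specifics come from the abstract, and all function names are hypothetical. Each error-prone "noncancer" label is given a weight `q`, the current estimate of the probability that the sample is truly cancer, and a boosting step then reduces the expected loss under those weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature.
    Returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to a constant
        m = sum(residuals) / len(residuals)
        return xs[0], m, m
    return best[1], best[2], best[3]

def em_boost(xs, labels, sensitivity=0.8, rounds=50, step=0.5):
    """labels[i] == 1: confirmed cancer (treated as error-free).
    labels[i] == 0: labeled noncancer, possibly a missed cancer.
    `sensitivity` is an ASSUMED known sensitivity of the reference test."""
    n = len(xs)
    scores = [0.0] * n                 # additive model F(x_i) on the logit scale
    q = [float(y) for y in labels]     # P(true class = cancer) per sample
    stumps = []
    for _ in range(rounds):
        probs = [sigmoid(s) for s in scores]
        # E-like step: Bayes update for error-prone labels, using
        # P(label 0 | cancer) = 1 - sensitivity and P(label 0 | noncancer) = 1.
        for i in range(n):
            if labels[i] == 0:
                missed = probs[i] * (1.0 - sensitivity)
                q[i] = missed / (missed + (1.0 - probs[i]))
        # M-like step: one gradient-boosting update. For the expected
        # logistic loss, the negative gradient with respect to F(x_i) is q_i - p_i.
        residuals = [q[i] - probs[i] for i in range(n)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, step * lv, step * rv))
        scores = [s + (step * lv if x <= t else step * rv)
                  for s, x in zip(scores, xs)]
    return stumps, q

def predict_score(stumps, x):
    """Logit-scale score for a new observation x."""
    return sum(lv if x <= t else rv for t, lv, rv in stumps)

if __name__ == "__main__":
    # Toy data: the last two samples sit with the cancers but carry label 0.
    xs = [0.1, 0.2, 0.3, 0.4, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
    labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
    stumps, q = em_boost(xs, labels)
    print([round(v, 3) for v in q])
```

In this toy setting the weights for the two suspect samples end up higher than those for the low-valued samples, which is the qualitative behavior the abstract motivates. Standard boosting would instead treat all label-0 samples as certain noncancers.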
Author_xml – sequence: 1
  givenname: Yutaka
  surname: Yasui
  fullname: Yasui, Yutaka
  email: yyasui@fhcrc.org
  organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A
– sequence: 2
  givenname: Margaret
  surname: Pepe
  fullname: Pepe, Margaret
  organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A
– sequence: 3
  givenname: Li
  surname: Hsu
  fullname: Hsu, Li
  organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A
– sequence: 4
  givenname: Bao-Ling
  surname: Adam
  fullname: Adam, Bao-Ling
  organization: Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta, Georgia 30912, U.S.A
– sequence: 5
  givenname: Ziding
  surname: Feng
  fullname: Feng, Ziding
  organization: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, U.S.A
BackLink https://www.ncbi.nlm.nih.gov/pubmed/15032790 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Copyright 2004 The International Biometric Society
Discipline Statistics
Biology
Mathematics
Genre article
Research Support, U.S. Gov't, P.H.S
Journal Article
Comparative Study
GrantInformation_xml – fundername: NCI NIH HHS
  grantid: U01-CA86368
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License http://onlinelibrary.wiley.com/termsAndConditions#vor
PMID 15032790
2001; 93
2002; 58
2000; 28
1999; 86
1977; 105
1992
1936; 7
2001; 47
1980; 111
1997; 146
1992; 8
2002; 48
2003a; 2003
1995; 82
2003b; 4
1987; 43
1986; 124
1991; 47
2001
1977; 39
2002; 62
1988; 44
2000; 62
2000; 80
1993; 138
1966
e_1_2_9_11_1
e_1_2_9_10_1
e_1_2_9_13_1
e_1_2_9_12_1
e_1_2_9_15_1
e_1_2_9_17_1
e_1_2_9_16_1
e_1_2_9_19_1
e_1_2_9_18_1
Qu Y. (e_1_2_9_21_1) 2002; 48
e_1_2_9_20_1
e_1_2_9_22_1
e_1_2_9_24_1
e_1_2_9_23_1
e_1_2_9_8_1
e_1_2_9_7_1
e_1_2_9_6_1
e_1_2_9_5_1
e_1_2_9_4_1
e_1_2_9_3_1
Green D. M. (e_1_2_9_14_1) 1966
e_1_2_9_9_1
e_1_2_9_26_1
e_1_2_9_25_1
Adam B. L. (e_1_2_9_2_1) 2002; 62
e_1_2_9_27_1
References_xml – reference: Quade, D., Lachenbruch, P. A., Whaley, F. S., McClish, D. K., and Haley, R. W. (1980). Effects of misclassifications on statistical inferences in epidemiology. American Journal of Epidemiology 111, 503-515.
– reference: Djavan, B., Mazal, P., Zlotta, A., Wammack, R., Ravery, V., Remzi, M., Susani, M., Borkowski, A., Hruby, S., Boccon-Gibod, L., Schulman, C. C., and Marberger, M. (2001). Pathological features of prostate cancer detected in initial and repeat prostate biopsy: Results of the prospective European Prostate Cancer Detection study. Prostate 47, 111-117.
– reference: Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38.
– reference: Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York : Springer.
– reference: Espeland, M. A. and Hui, S. L. (1987). A general approach to analyzing epidemiologic data that contain misclassification errors. Biometrics 43, 1001-1012.
– reference: Neuhaus, J. M. (1999). Bias and efficiency loss due to misclassified responses in binary regression. Biometrika 86, 843-855.
– reference: Chu, C. K. and Cheng, K. F. (1995). Nonparametric regression estimates using misclassified binary responses. Biometrika 82, 315-325.
– reference: Prescott, G. J. and Garthwaite, P. H. (2002). A simple Bayesian analysis of misclassified binary data with a validation substudy. Biometrics 58, 454-458.
– reference: Copeland, K. T., Checkoway, H., McMichael, A. J., and Holbrook, R. H. (1977). Bias due to misclassification in the estimation of relative risk. American Journal of Epidemiology 105, 488-495.
– reference: Chen, T. T. (1992). A review of methods for misclassified categorical data in epidemiology. Statistics in Medicine 8, 1095-1106.
– reference: Wang, C. Y. and Pepe, M. S. (2000). Expected estimating equations to accommodate covariate measurement error. Journal of the Royal Statistical Society, Series B 62, 509-524.
– reference: Pepe, M. S., Etzioni, R., Feng, Z., Potter, J. D., Thompson, M. L., Thornquist, M., Winget, M.D., and Yasui, Y. (2001). Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute 93, 1054-1061.
– reference: Friedman, J. H., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. Annals of Statistics 28, 337-407.
– reference: Yasui, Y., McLerran, D., Adam, B. L., Winget, M., Thonquist, M., and Feng, Z. (2003a). An automated peak identification/calibration procedure for high-dimensional protein measures from mass spectrometers. Journal of Biomedicine and Biotechnology 2003, 242-248.
– reference: Magder, L. S. and Hughes, J. P. (1997). Logistic regression when the outcome is measured with uncertainty. American Journal of Epidemiology 146, 195-203.
– reference: Adam, B. L., Qu, Y., Davis, J. W., Ward, M. D., Clements, M. A., Cazaras, L. H., Semmes, O. J., Schellhammer, P. F., Yasui, Y., Feng, Z., and Wright, G. L., Jr. (2002). Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Research 62, 3609-3614.
– reference: Brenner, H. and Gefeller, O. (1993). Use of the positive predictive value to correct for disease misclassification in epidemiologic studies. American Journal of Epidemiology 138, 1007-1015.
– reference: DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44, 837-845.
– reference: Yasui, Y., Pepe, M., Thompson, M. L., Adam, B. L., Wright, G. L., Jr., Qu, Y., Potter, J. D., Winget, M., Thornquist, M., and Feng, Z. (2003b). A data-analytic strategy for protein-biomarker discovery: Profiling of high-dimensional proteomic data for cancer detection. Biostatistics 4, 449-463.
– reference: McLachlin, J. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. New York : John Wiley.
– reference: Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
– reference: Qu, Y., Adam, B. L., Yasui, Y., Ward, M. D., Cazares, L. H., Schellhammer, P. F., Feng, Z., Semmes, O. J., and Wright, G. L., Jr. (2002). Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clinical Chemistry 48, 1835-1843.
– reference: Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. New York : John Wiley.
– reference: Srivastava, S. and Kramer, B. S. (2000). Early detection cancer research network. Laboratory Investigation 80, 1147-1148.
– reference: Ekholm, A. (1991). Algorithms versus models for analyzing data that contain misclassification errors. Biometrics 47, 1171-1182.
– reference: White, E. (1986). The effect of misclassification of disease status in follow-up studies: Implications for selecting disease classification criteria. American Journal of Epidemiology 124, 816-825.
– volume: 93,
  start-page: 1054
  year: 2001
  end-page: 1061
  article-title: Phases of biomarker development for early detection of cancer
  publication-title: Journal of the National Cancer Institute
– volume: 62,
  start-page: 3609
  year: 2002
  end-page: 3614
  article-title: Serum protein fingerprinting coupled with a pattern‐matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men
  publication-title: Cancer Research
– volume: 62,
  start-page: 509
  year: 2000
  end-page: 524
  article-title: Expected estimating equations to accommodate covariate measurement error
  publication-title: Journal of the Royal Statistical Society, Series B
– year: 1966
– volume: 47,
  start-page: 1171
  year: 1991
  end-page: 1182
  article-title: Algorithms versus models for analyzing data that contain misclassification errors
  publication-title: Biometrics
– year: 2001
– volume: 80,
  start-page: 1147
  year: 2000
  end-page: 1148
  article-title: Early detection cancer research network
  publication-title: Laboratory Investigation
– volume: 44,
  start-page: 837
  year: 1988
  end-page: 845
  article-title: Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach
  publication-title: Biometrics
– volume: 7,
  start-page: 179
  year: 1936
  end-page: 188
  article-title: The use of multiple measurements in taxonomic problems
  publication-title: Annals of Eugenics
– volume: 82,
  start-page: 315
  year: 1995
  end-page: 325
  article-title: Nonparametric regression estimates using misclassified binary responses
  publication-title: Biometrika
– year: 1992
– volume: 86,
  start-page: 843
  year: 1999
  end-page: 855
  article-title: Bias and efficiency loss due to misclassified responses in binary regression
  publication-title: Biometrika
– volume: 138,
  start-page: 1007
  year: 1993
  end-page: 1015
  article-title: Use of the positive predictive value to correct for disease misclassification in epidemiologic studies
  publication-title: American Journal of Epidemiology
– volume: 47,
  start-page: 111
  year: 2001
  end-page: 117
  article-title: Pathological features of prostate cancer detected in initial and repeat prostate biopsy: Results of the prospective European Prostate Cancer Detection study
  publication-title: Prostate
– volume: 4,
  start-page: 449
  year: 2003b
  end-page: 463
  article-title: A data‐analytic strategy for protein‐biomarker discovery: Profiling of high‐dimensional proteomic data for cancer detection
  publication-title: Biostatistics
– volume: 105,
  start-page: 488
  year: 1977
  end-page: 495
  article-title: Bias due to misclassification in the estimation of relative risk
  publication-title: American Journal of Epidemiology
– volume: 111,
  start-page: 503
  year: 1980
  end-page: 515
  article-title: Effects of misclassifications on statistical inferences in epidemiology
  publication-title: American Journal of Epidemiology
– volume: 48,
  start-page: 1835
  year: 2002
  end-page: 1843
  article-title: Boosted decision tree analysis of surface‐enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients
  publication-title: Clinical Chemistry
– volume: 39,
  start-page: 1
  year: 1977
  end-page: 38
  article-title: Maximum likelihood from incomplete data via the EM algorithm
  publication-title: Journal of the Royal Statistical Society, Series B (Methodological)
– volume: 8,
  start-page: 1095
  year: 1992
  end-page: 1106
  article-title: A review of methods for misclassified categorical data in epidemiology
  publication-title: Statistics in Medicine
– volume: 146,
  start-page: 195
  year: 1997
  end-page: 203
  article-title: Logistic regression when the outcome is measured with uncertainty
  publication-title: American Journal of Epidemiology
– volume: 28,
  start-page: 337
  year: 2000
  end-page: 407
  article-title: Additive logistic regression: A statistical view of boosting
  publication-title: Annals of Statistics
– volume: 43,
  start-page: 1001
  year: 1987
  end-page: 1012
  article-title: A general approach to analyzing epidemiologic data that contain misclassification errors
  publication-title: Biometrics
– volume: 58,
  start-page: 454
  year: 2002
  end-page: 458
  article-title: A simple Bayesian analysis of misclassified binary data with a validation substudy
  publication-title: Biometrics
– volume: 2003,
  start-page: 242
  year: 2003a
  end-page: 248
  article-title: An automated peak identification/calibration procedure for high‐dimensional protein measures from mass spectrometers
  publication-title: Journal of Biomedicine and Biotechnology
– volume: 124,
  start-page: 816
  year: 1986
  end-page: 825
  article-title: The effect of misclassification of disease status in follow‐up studies: Implications for selecting disease classification criteria
  publication-title: American Journal of Epidemiology
StartPage 199
SubjectTerms Algorithms
Artificial Intelligence
Biomarkers, Tumor - blood
Biometrics
Biometry
Biopsies
Blood Proteins - analysis
Datasets
Epidemiology
High-dimensional data
Humans
Learning disabilities
Logistic regression
Male
Mass Spectrometry
Misclassification
Prostate cancer
Prostatic hyperplasia
Prostatic Hyperplasia - blood
Prostatic Hyperplasia - diagnosis
Prostatic Neoplasms - blood
Prostatic Neoplasms - diagnosis
Proteomics
Test data
Training
Title Partially Supervised Learning Using an EM-Boosting Algorithm
URI https://api.istex.fr/ark:/67375/WNG-3XDZR4B9-D/fulltext.pdf
https://www.jstor.org/stable/3695568
https://onlinelibrary.wiley.com/doi/abs/10.1111%2Fj.0006-341X.2004.00156.x
https://www.ncbi.nlm.nih.gov/pubmed/15032790
https://www.proquest.com/docview/71739003
Volume 60