Model-Based Clustering and Prediction With Mixed Measurements Involving Surrogate Classifiers
Published in | Statistics in biopharmaceutical research Vol. 14; no. 3; pp. 368 - 379 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Taylor & Francis, 03.08.2022 |
Summary: | Identification of underlying subpopulations to account for unobserved heterogeneity in the population is a challenging statistical problem, mainly because no explicit information about the latent classes is available. Although latent class analysis via finite mixture models is often used successfully to probabilistically identify subpopulations in applications, it often fails with data for which such subpopulations exhibit high latency. Borrowing strength from readily accessible auxiliary classifiers, even when subject to misclassification, may yield improved results in such settings. We develop in this article a joint modeling approach that combines data from multiple sources, including observed characteristics that are often used alone for clustering and classification, as well as results based on imperfect surrogate classifiers, to better identify the latent classes for more accurate classification and prediction. We outline maximum likelihood estimation for the joint model using the EM algorithm, and we show empirically via simulations that our methodology yields better estimates of the underlying latent class distributions than those obtained by ignoring the auxiliary information, while providing joint assessments of the surrogate classifiers. The advantages are significant when there is high latency and the surrogate classifiers are at least moderately accurate. We use real diagnostic data on dry eye disease, for which no gold standard is available, to illustrate our methodology. |
---|---|
ISSN: | 1946-6315 |
DOI: | 10.1080/19466315.2020.1863257 |
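
The summary describes fitting a joint model by EM that combines observed characteristics with imperfect surrogate classifiers. The sketch below illustrates that general idea on a deliberately simplified toy model and should not be read as the authors' model or code. Everything specific in it is an assumption made for illustration: two latent classes, one Gaussian measurement `y`, a single binary surrogate `s` with unknown sensitivity and specificity, conditional independence of `y` and `s` given the class, and the hypothetical function name `fit_joint_em`.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_joint_em(y, s, n_iter=200):
    """EM for a toy two-class joint model (illustrative assumption only).

    y : continuous measurements, Y | Z=k ~ N(mu[k], sd[k]^2)
    s : 0/1 surrogate labels, P(S=1 | Z=1) = sens, P(S=0 | Z=0) = spec
    Y and S are assumed conditionally independent given the latent class Z.
    """
    n = len(y)
    ys = sorted(y)
    # Asymmetric starting values tie class 1 to larger y and to S = 1,
    # which avoids label switching in this toy setting.
    mu = [ys[n // 4], ys[(3 * n) // 4]]
    mean_y = sum(y) / n
    sd_all = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    sd = [sd_all, sd_all]
    pi, sens, spec = 0.5, 0.7, 0.7

    r = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each unit belongs to class 1,
        # using both the measurement and the surrogate label.
        for i in range(n):
            w1 = pi * normal_pdf(y[i], mu[1], sd[1]) * (sens if s[i] == 1 else 1.0 - sens)
            w0 = (1.0 - pi) * normal_pdf(y[i], mu[0], sd[0]) * (1.0 - spec if s[i] == 1 else spec)
            r[i] = w1 / (w0 + w1)

        # M-step: weighted maximum likelihood updates.
        n1 = sum(r)
        n0 = n - n1
        pi = n1 / n
        mu[1] = sum(ri * yi for ri, yi in zip(r, y)) / n1
        mu[0] = sum((1.0 - ri) * yi for ri, yi in zip(r, y)) / n0
        sd[1] = math.sqrt(sum(ri * (yi - mu[1]) ** 2 for ri, yi in zip(r, y)) / n1)
        sd[0] = math.sqrt(sum((1.0 - ri) * (yi - mu[0]) ** 2 for ri, yi in zip(r, y)) / n0)
        # Surrogate accuracy is estimated jointly with the class distributions.
        sens = sum(ri * si for ri, si in zip(r, s)) / n1
        spec = sum((1.0 - ri) * (1 - si) for ri, si in zip(r, s)) / n0

    # "resp" holds the class-1 responsibilities from the final E-step.
    return {"pi": pi, "mu": mu, "sd": sd, "sens": sens, "spec": spec, "resp": r}


# Simulated data with overlapping classes (all values here are made up).
rng = random.Random(0)
true_pi, true_sens, true_spec = 0.4, 0.85, 0.9
y, s = [], []
for _ in range(3000):
    z = 1 if rng.random() < true_pi else 0
    y.append(rng.gauss(1.5 if z == 1 else 0.0, 1.0))
    p_pos = true_sens if z == 1 else 1.0 - true_spec
    s.append(1 if rng.random() < p_pos else 0)

est = fit_joint_em(y, s)
```

In this toy setup the surrogate label enters only through the two extra factors in the E-step, so dropping those factors recovers an ordinary two-component Gaussian mixture fit. That makes it a convenient way to compare clustering with and without the auxiliary classifier on simulated data.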