Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 12.10.2021 |
Subjects | |
DOI | 10.48550/arxiv.2110.06135 |
Summary: | High-quality data accumulation is now becoming ubiquitous in the health
domain. There is increasing opportunity to exploit rich data from normal
subjects to improve supervised estimators in specific diseases with notorious
data scarcity. We demonstrate that low-dimensional embedding spaces can be
derived from the UK Biobank population dataset and used to enhance data-scarce
prediction of health indicators, lifestyle and demographic characteristics.
Phenotype predictions facilitated by Variational Autoencoder manifolds
typically scaled better with increasing unlabeled data than dimensionality
reduction by PCA or Isomap. Performance gains from semi-supervised approaches
will probably become an important ingredient for various medical data science
applications. |
---|---|