OUTCOME-GUIDED DISEASE SUBTYPING BY GENERATIVE MODEL AND WEIGHTED JOINT LIKELIHOOD IN TRANSCRIPTOMIC APPLICATIONS


Bibliographic Details
Published in The Annals of Applied Statistics Vol. 18; no. 3; p. 1947
Main Authors Li, Yujia; Liu, Peng; Wang, Wenjia; Zong, Wei; Fang, Yusi; Ren, Zhao; Tang, Lu; Celedón, Juan C; Oesterreich, Steffi; Tseng, George C
Format Journal Article
Language English
Published United States 01.09.2024
ISSN 1932-6157
DOI 10.1214/23-aoas1865

Abstract With advances in high-throughput technology, molecular disease subtyping by high-dimensional omics data has been recognized as an effective approach for identifying subtypes of complex diseases with distinct disease mechanisms and prognoses. Conventional cluster analysis takes omics data as input and generates patient clusters with similar gene expression patterns. The omics data, however, usually contain multi-faceted cluster structures that can be defined by different sets of genes. If the gene set associated with irrelevant clinical variables (e.g., sex or age) dominates the clustering process, the resulting clusters may not capture clinically meaningful disease subtypes. This motivates the development, in this paper, of a clustering framework guided by a pre-specified disease outcome, such as a lung function measurement or survival. We propose two methods for outcome-guided disease subtyping from omics data, one using a generative model and the other a weighted joint likelihood. Both methods connect an outcome association model and a disease subtyping model through a latent variable of cluster labels. Compared to the generative model, the weighted joint likelihood contains a data-driven weight parameter that balances the likelihood contributions from outcome association and gene cluster separation, which improves generalizability in independent validation but requires heavier computation. Extensive simulations and two real applications in lung disease and triple-negative breast cancer demonstrate superior performance of the outcome-guided clustering methods in terms of disease subtyping accuracy, gene selection and outcome association. Unlike existing clustering methods, the outcome-guided disease subtyping framework creates a new precision medicine paradigm to directly identify patient subgroups with clinical association.
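Illustrative sketch (not taken from the article): the abstract describes a latent cluster label that links an expression model to an outcome model, with a weight balancing the two likelihood contributions. The record does not give the exact formulation, so the Python below is only a toy under stated assumptions: diagonal Gaussian expression clusters, a Gaussian outcome per cluster, and a fixed, hypothetical weight `w` that tempers the two log-densities (the article describes a data-driven weight; no gene selection is attempted here). The function and parameter names are invented for illustration.

```python
import numpy as np

def log_normal(v, mean, var):
    """Elementwise log density of N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)

def weighted_em(X, y, K=2, w=0.5, n_iter=50, seed=0):
    """Toy EM-style fit of an assumed weighted joint objective.

    X: (n, p) expression matrix; y: (n,) continuous outcome.
    w in [0, 1]: hypothetical fixed weight on the outcome term.
    Objective (an assumption, not the paper's formula):
        sum_i log sum_k pi_k * f(x_i | k)^(1 - w) * g(y_i | k)^w
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.dirichlet(np.ones(K), size=n)  # random soft initial labels
    for _ in range(n_iter):
        # M-step: responsibility-weighted proportions, means and variances.
        nk = resp.sum(axis=0) + 1e-9
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]                       # (K, p)
        var_x = np.stack([
            (resp[:, k, None] * (X - mu[k]) ** 2).sum(axis=0) / nk[k]
            for k in range(K)
        ]) + 1e-6                                             # (K, p)
        m = (resp.T @ y) / nk                                 # (K,)
        var_y = (resp * (y[:, None] - m) ** 2).sum(axis=0) / nk + 1e-6
        # E-step: combine the two tempered log-densities per cluster.
        log_x = np.stack(
            [log_normal(X, mu[k], var_x[k]).sum(axis=1) for k in range(K)],
            axis=1,
        )                                                     # (n, K)
        log_y = log_normal(y[:, None], m, var_y)              # (n, K)
        log_joint = np.log(pi) + (1 - w) * log_x + w * log_y
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
    return resp, float(log_norm.sum())
```

Setting `w = 0` in this sketch reduces it to ordinary expression-only mixture clustering, while larger `w` lets the outcome pull the cluster labels; how the weight is actually chosen from the data is described in the article itself, not here.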
Author Fang, Yusi
Li, Yujia
Tseng, George C
Zong, Wei
Tang, Lu
Wang, Wenjia
Liu, Peng
Ren, Zhao
Celedón, Juan C
Oesterreich, Steffi
Author_xml – sequence: 1
  givenname: Yujia
  surname: Li
  fullname: Li, Yujia
  organization: University of Pittsburgh
– sequence: 2
  givenname: Peng
  surname: Liu
  fullname: Liu, Peng
  organization: University of Pittsburgh
– sequence: 3
  givenname: Wenjia
  surname: Wang
  fullname: Wang, Wenjia
  organization: University of Pittsburgh
– sequence: 4
  givenname: Wei
  surname: Zong
  fullname: Zong, Wei
  organization: University of Pittsburgh
– sequence: 5
  givenname: Yusi
  surname: Fang
  fullname: Fang, Yusi
  organization: University of Pittsburgh
– sequence: 6
  givenname: Zhao
  surname: Ren
  fullname: Ren, Zhao
  organization: University of Pittsburgh
– sequence: 7
  givenname: Lu
  surname: Tang
  fullname: Tang, Lu
  organization: University of Pittsburgh
– sequence: 8
  givenname: Juan C
  surname: Celedón
  fullname: Celedón, Juan C
  organization: University of Pittsburgh
– sequence: 9
  givenname: Steffi
  surname: Oesterreich
  fullname: Oesterreich, Steffi
  organization: University of Pittsburgh
– sequence: 10
  givenname: George C
  surname: Tseng
  fullname: Tseng, George C
  organization: University of Pittsburgh
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40740430$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1093_biostatistics_kxae020
ContentType Journal Article
DBID NPM
DOI 10.1214/23-aoas1865
DatabaseName PubMed
DatabaseTitle PubMed
DatabaseTitleList PubMed
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Mathematics
ExternalDocumentID 40740430
Genre Journal Article
GrantInformation_xml – fundername: NLM NIH HHS
  grantid: R01 LM014142
– fundername: NLM NIH HHS
  grantid: R21 LM012752
ISSN 1932-6157
IngestDate Mon Aug 04 01:30:55 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords weighted joint likelihood
omics data
generative model
disease subtyping
high-dimensional cluster analysis
Language English
LinkModel OpenURL
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/12309773
PMID 40740430
ParticipantIDs pubmed_primary_40740430
PublicationCentury 2000
PublicationDate 2024-09-01
PublicationDateYYYYMMDD 2024-09-01
PublicationDate_xml – month: 09
  year: 2024
  text: 2024-09-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle The annals of applied statistics
PublicationTitleAlternate Ann Appl Stat
PublicationYear 2024
SourceID pubmed
SourceType Index Database
StartPage 1947
Title OUTCOME-GUIDED DISEASE SUBTYPING BY GENERATIVE MODEL AND WEIGHTED JOINT LIKELIHOOD IN TRANSCRIPTOMIC APPLICATIONS
URI https://www.ncbi.nlm.nih.gov/pubmed/40740430
Volume 18