Outcome-Guided Disease Subtyping by Generative Model and Weighted Joint Likelihood in Transcriptomic Applications
Published in | The annals of applied statistics Vol. 18; no. 3; p. 1947 |
---|---|
Format | Journal Article |
Language | English |
Published | United States, 01.09.2024 |
ISSN | 1932-6157 |
DOI | 10.1214/23-aoas1865 |
Abstract | With advances in high-throughput technology, molecular disease subtyping using high-dimensional omics data has been recognized as an effective approach for identifying subtypes of complex diseases with distinct disease mechanisms and prognoses. Conventional cluster analysis takes omics data as input and generates patient clusters with similar gene expression patterns. Omics data, however, usually contain multifaceted cluster structures that can be defined by different sets of genes. If a gene set associated with irrelevant clinical variables (e.g., sex or age) dominates the clustering process, the resulting clusters may not capture clinically meaningful disease subtypes. This motivates the clustering framework developed in this paper, which is guided by a prespecified disease outcome such as a lung function measurement or survival. We propose two methods for outcome-guided disease subtyping from omics data: one based on a generative model and one based on a weighted joint likelihood. Both methods connect an outcome association model and a disease subtyping model through a latent variable of cluster labels. Compared with the generative model, the weighted joint likelihood includes a data-driven weight parameter that balances the likelihood contributions from outcome association and gene cluster separation; this improves generalizability in independent validation but is more computationally intensive. Extensive simulations and two real applications, in lung disease and triple-negative breast cancer, demonstrate the superior performance of the outcome-guided clustering methods in terms of disease subtyping accuracy, gene selection, and outcome association. Unlike existing clustering methods, the outcome-guided disease subtyping framework creates a new precision medicine paradigm for directly identifying patient subgroups with clinical association. |
---|---|
Author | Li, Yujia; Liu, Peng; Wang, Wenjia; Zong, Wei; Fang, Yusi; Ren, Zhao; Tang, Lu; Celedón, Juan C; Oesterreich, Steffi; Tseng, George C |
Affiliation | University of Pittsburgh (all authors) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40740430 (View this record in MEDLINE/PubMed) |
Cited by | DOI 10.1093/biostatistics/kxae020 |
DatabaseName | PubMed |
Discipline | Mathematics |
Grant information | NLM NIH HHS, R01 LM014142; NLM NIH HHS, R21 LM012752 |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Keywords | weighted joint likelihood; omics data; generative model; disease subtyping; high-dimensional cluster analysis |
OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/12309773 |
PMID | 40740430 |
PublicationDate | 2024-09-01 |
PublicationPlace | United States |
PublicationTitle | The annals of applied statistics |
PublicationTitleAlternate | Ann Appl Stat |
PublicationYear | 2024 |
StartPage | 1947 |
Volume | 18 |
linkProvider | National Library of Medicine |
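The abstract's weighted joint likelihood balances an outcome-association term against a gene-cluster-separation term that share the same cluster labels. The toy example below sketches only that balancing idea under several loud assumptions: unit-variance Gaussian models for the genes and the outcome, hard cluster labels with plug-in cluster means, and a simple convex weighting w × (outcome term) + (1 − w) × (gene term). This is not the authors' generative model or estimation procedure. The functions `weighted_joint_loglik` and `preferred_partition` and the toy data are hypothetical.

```python
import math


def log_normal(v, mean):
    """Log-density of independent unit-variance normals (assumed model)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (a - b) ** 2
               for a, b in zip(v, mean))


def cluster_means(rows, labels, k):
    """Plug-in mean vector of the rows assigned to cluster k."""
    members = [r for r, z in zip(rows, labels) if z == k]
    return [sum(col) / len(members) for col in zip(*members)]


def weighted_joint_loglik(X, y, labels, w):
    """Hypothetical weighted joint log-likelihood of a hard partition:
    w * (outcome-association term) + (1 - w) * (gene-separation term)."""
    ys = [[v] for v in y]  # treat the scalar outcome as a 1-D vector
    ll_y = ll_x = 0.0
    for k in set(labels):
        beta_k = cluster_means(ys, labels, k)  # cluster-specific outcome mean
        m_k = cluster_means(X, labels, k)      # cluster-specific gene means
        for xi, yi, z in zip(X, ys, labels):
            if z == k:
                ll_y += log_normal(yi, beta_k)
                ll_x += log_normal(xi, m_k)
    return w * ll_y + (1 - w) * ll_x


# Hypothetical toy data: gene 0 strongly separates a nuisance factor (sex),
# gene 1 weakly separates the true subtype, and only the subtype drives y.
X, y, by_sex, by_subtype = [], [], [], []
for sex in (0, 1):
    for subtype in (0, 1):
        for _ in range(5):
            X.append([8.0 * sex, 2.0 * subtype])
            y.append(6.0 * subtype)
            by_sex.append(sex)
            by_subtype.append(subtype)


def preferred_partition(w):
    """Return which candidate partition has the higher weighted likelihood."""
    a = weighted_joint_loglik(X, y, by_sex, w)
    b = weighted_joint_loglik(X, y, by_subtype, w)
    return "sex" if a > b else "subtype"


for w in (0.0, 0.5, 0.8, 1.0):
    print(w, preferred_partition(w))
# prints: 0.0 sex / 0.5 sex / 0.8 subtype / 1.0 subtype (one per line)
```

With w = 0 the sketch reduces to gene-only clustering, and the dominant nuisance split wins. This mirrors the failure mode the abstract describes. Under these toy assumptions the preference flips at w = 0.625, where the squared-error penalties 20 + 160w and 320 − 320w are equal. The paper instead chooses its weight in a data-driven way, which this sketch does not attempt.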