CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis
Published in | Computational Mathematics Modeling in Cancer Analysis Vol. 13574; pp. 68 - 77 |
---|---|
Main Authors | Mejía, Gabriel; Bloch, Natasha; Arbelaez, Pablo |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer, Springer Nature Switzerland, 2022 |
Series | Lecture Notes in Computer Science |
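Subjects | Cancer classification; Cancer detection; GTEx; Machine learning; Multinomial logistic regression; TCGA |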
Abstract | Automatic cancer diagnosis based on RNA-Seq profiles is at the intersection of transcriptome analysis and machine learning. Methods developed for this task could be a valuable support in clinical practice and provide insights into the cancer causal mechanisms. To correctly approach this problem, the largest existing resource (The Cancer Genome Atlas) must be complemented with healthy tissue samples from the Genotype-Tissue Expression project. In this work, we empirically prove that previous approaches to joining these databases suffer from translation biases and correct them using batch z-score normalization. Moreover, we propose CanDLE, a multinomial logistic regression model that achieves state of the art performance in multilabel cancer/healthy tissue type classification (94.1% balanced accuracy) and all-vs-one cancer type detection (78.0% average $\max F_1$). |
---|---|
Author | Bloch, Natasha; Mejía, Gabriel; Arbelaez, Pablo |
Author_xml | – sequence: 1 givenname: Gabriel surname: Mejía fullname: Mejía, Gabriel email: gm.mejia@uniandes.edu.co – sequence: 2 givenname: Natasha surname: Bloch fullname: Bloch, Natasha – sequence: 3 givenname: Pablo surname: Arbelaez fullname: Arbelaez, Pablo |
ContentType | Book Chapter |
Copyright | The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 |
DEWEY | 616.99400113 |
DOI | 10.1007/978-3-031-17266-3_7 |
Discipline | Medicine; Applied Sciences; Computer Science |
EISBN | 9783031172663 3031172663 |
EISSN | 1611-3349 |
Editor | Wu, Jia; Zhang, Fa; Zaki, Nazar; Qin, Wenjian; Yang, Fan |
Editor_xml | – sequence: 1 fullname: Wu, Jia – sequence: 2 fullname: Zhang, Fa – sequence: 3 fullname: Qin, Wenjian – sequence: 4 fullname: Yang, Fan – sequence: 5 fullname: Zaki, Nazar |
EndPage | 77 |
ISBN | 9783031172656 3031172655 |
ISSN | 0302-9743 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | TA1501-1820 |
Language | English |
OCLC | 1346154862 |
PageCount | 10 |
PublicationCentury | 2000 |
PublicationDate | 2022 20220922 |
PublicationDateYYYYMMDD | 2022-01-01 2022-09-22 |
PublicationDate_xml | – year: 2022 text: 2022 |
PublicationDecade | 2020 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | First International Workshop, CMMCA 2022, Held in Conjunction with MICCAI 2022, Singapore, September 18, 2022, Proceedings |
PublicationTitle | Computational Mathematics Modeling in Cancer Analysis |
PublicationYear | 2022 |
Publisher | Springer Springer Nature Switzerland |
Publisher_xml | – name: Springer – name: Springer Nature Switzerland |
RelatedPersons | Hartmanis, Juris; Gao, Wen; Steffen, Bernhard; Bertino, Elisa; Goos, Gerhard; Yung, Moti |
RelatedPersons_xml | – sequence: 1 givenname: Gerhard surname: Goos fullname: Goos, Gerhard – sequence: 2 givenname: Juris surname: Hartmanis fullname: Hartmanis, Juris – sequence: 3 givenname: Elisa surname: Bertino fullname: Bertino, Elisa – sequence: 4 givenname: Wen surname: Gao fullname: Gao, Wen – sequence: 5 givenname: Bernhard orcidid: 0000-0001-9619-1558 surname: Steffen fullname: Steffen, Bernhard – sequence: 6 givenname: Moti orcidid: 0000-0003-0848-0873 surname: Yung fullname: Yung, Moti |
StartPage | 68 |
SubjectTerms | Cancer classification; Cancer detection; GTEx; Machine learning; Multinomial logistic regression; TCGA |
Title | CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7102179&ppg=79&c=UERG http://link.springer.com/10.1007/978-3-031-17266-3_7 |
Volume | 13574 |