CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis

Automatic cancer diagnosis based on RNA-Seq profiles is at the intersection of transcriptome analysis and machine learning. Methods developed for this task could be a valuable support in clinical practice and provide insights into the cancer causal mechanisms. To correctly approach this problem, the...

Bibliographic Details
Published in	Computational Mathematics Modeling in Cancer Analysis, Vol. 13574, pp. 68-77
Main Authors	Mejía, Gabriel; Bloch, Natasha; Arbelaez, Pablo
Format	Book Chapter
Language	English
Published	Switzerland: Springer (Springer Nature Switzerland), 2022
Series	Lecture Notes in Computer Science
Subjects	Cancer classification; Cancer detection; GTEx; Machine learning; Multinomial logistic regression; TCGA

Abstract	Automatic cancer diagnosis based on RNA-Seq profiles is at the intersection of transcriptome analysis and machine learning. Methods developed for this task could be a valuable support in clinical practice and provide insights into the cancer causal mechanisms. To correctly approach this problem, the largest existing resource (The Cancer Genome Atlas) must be complemented with healthy tissue samples from the Genotype-Tissue Expression project. In this work, we empirically prove that previous approaches to joining these databases suffer from translation biases and correct them using batch z-score normalization. Moreover, we propose CanDLE, a multinomial logistic regression model that achieves state-of-the-art performance in multilabel cancer/healthy tissue type classification (94.1% balanced accuracy) and all-vs-one cancer type detection (78.0% average $\max F_1$).
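The two quantitative ideas named in the abstract, batch z-score normalization and balanced accuracy, can be illustrated with a minimal sketch. This is not the authors' code: it assumes that "batch" means the source database of each sample (for example "TCGA" or "GTEx") and that each gene is standardized separately within each batch. The function names, the `eps` parameter, and the toy data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def batch_zscore(samples, batches, eps=1e-8):
    """Standardize each gene separately within each batch.

    samples: list of expression vectors (one list of floats per sample).
    batches: list of batch labels aligned with samples (assumed here to be
             the source database, e.g. "TCGA" or "GTEx").
    eps:     small constant that avoids division by zero for constant genes.
    """
    # Group sample indices by batch label.
    by_batch = defaultdict(list)
    for idx, batch in enumerate(batches):
        by_batch[batch].append(idx)

    normalized = [None] * len(samples)
    for idxs in by_batch.values():
        n_genes = len(samples[idxs[0]])
        # Per-gene mean and population standard deviation inside this batch.
        means = [mean(samples[i][g] for i in idxs) for g in range(n_genes)]
        stds = [pstdev([samples[i][g] for i in idxs]) for g in range(n_genes)]
        for i in idxs:
            normalized[i] = [
                (samples[i][g] - means[g]) / (stds[g] + eps)
                for g in range(n_genes)
            ]
    return normalized


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return mean(correct[c] / total[c] for c in total)
```

Under these assumptions, a constant offset between the two databases disappears after normalization, because each batch is centered and scaled on its own statistics. Balanced accuracy is used rather than plain accuracy because the tissue classes are unlikely to be equally represented.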
Author	Bloch, Natasha Mejía, Gabriel Arbelaez, Pablo
Author_xml	– sequence: 1 givenname: Gabriel surname: Mejía fullname: Mejía, Gabriel email: gm.mejia@uniandes.edu.co – sequence: 2 givenname: Natasha surname: Bloch fullname: Bloch, Natasha – sequence: 3 givenname: Pablo surname: Arbelaez fullname: Arbelaez, Pablo
ContentType	Book Chapter
Copyright	The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
Copyright_xml	– notice: The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
DBID	FFUUA
DEWEY	616.99400113
DOI	10.1007/978-3-031-17266-3_7
DatabaseName	ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Medicine Applied Sciences Computer Science
EISBN	9783031172663 3031172663
EISSN	1611-3349
Editor	Wu, Jia Zhang, Fa Zaki, Nazar Qin, Wenjian Yang, Fan
Editor_xml	– sequence: 1 fullname: Wu, Jia – sequence: 2 fullname: Zhang, Fa – sequence: 3 fullname: Qin, Wenjian – sequence: 4 fullname: Yang, Fan – sequence: 5 fullname: Zaki, Nazar
EndPage	77
ExternalDocumentID	EBC7102179_75_79
GroupedDBID	38. AABBV AAZWU ABSVR ABTHU ABVND ACBPT ACHZO ACPMC ADNVS AEDXK AEJLV AEKFX AHVRR AIYYB ALMA_UNASSIGNED_HOLDINGS BBABE CZZ FFUUA IEZ SBO TPJZQ TSXQS Z7R Z7U Z7X Z81 Z82 Z83 Z84 Z87 Z88 -DT -~X 29L 2HA 2HV ACGFS ADCXD EJD F5P LAS LDH P2P RSU ~02
ID	FETCH-LOGICAL-p240t-716f7c72bb19ae2b8298e68e09090e891a558f8c43ec4e0628bda490234642d3
ISBN	9783031172656 3031172655
ISSN	0302-9743
IngestDate	Tue Jul 29 20:15:29 EDT 2025 Tue Jul 22 07:50:38 EDT 2025
IsPeerReviewed	true
IsScholarly	true
LCCallNum	TA1501-1820
Language	English
LinkModel	OpenURL
MergedId	FETCHMERGED-LOGICAL-p240t-716f7c72bb19ae2b8298e68e09090e891a558f8c43ec4e0628bda490234642d3
OCLC	1346154862
PQID	EBC7102179_75_79
PageCount	10
ParticipantIDs	springer_books_10_1007_978_3_031_17266_3_7 proquest_ebookcentralchapters_7102179_75_79
PublicationCentury	2000
PublicationDate	2022 20220922
PublicationDateYYYYMMDD	2022-01-01 2022-09-22
PublicationDate_xml	– year: 2022 text: 2022
PublicationDecade	2020
PublicationPlace	Switzerland
PublicationPlace_xml	– name: Switzerland – name: Cham
PublicationSeriesTitle	Lecture Notes in Computer Science
PublicationSeriesTitleAlternate	Lect.Notes Computer
PublicationSubtitle	First International Workshop, CMMCA 2022, Held in Conjunction with MICCAI 2022, Singapore, September 18, 2022, Proceedings
PublicationTitle	Computational Mathematics Modeling in Cancer Analysis
PublicationYear	2022
Publisher	Springer Springer Nature Switzerland
Publisher_xml	– name: Springer – name: Springer Nature Switzerland
RelatedPersons	Hartmanis, Juris Gao, Wen Steffen, Bernhard Bertino, Elisa Goos, Gerhard Yung, Moti
RelatedPersons_xml	– sequence: 1 givenname: Gerhard surname: Goos fullname: Goos, Gerhard – sequence: 2 givenname: Juris surname: Hartmanis fullname: Hartmanis, Juris – sequence: 3 givenname: Elisa surname: Bertino fullname: Bertino, Elisa – sequence: 4 givenname: Wen surname: Gao fullname: Gao, Wen – sequence: 5 givenname: Bernhard orcidid: 0000-0001-9619-1558 surname: Steffen fullname: Steffen, Bernhard – sequence: 6 givenname: Moti orcidid: 0000-0003-0848-0873 surname: Yung fullname: Yung, Moti
SSID	ssj0002731070 ssj0002792
Score	2.0424252
SourceID	springer proquest
SourceType	Publisher
StartPage	68
Title	CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis
URI	http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7102179&ppg=79&c=UERG http://link.springer.com/10.1007/978-3-031-17266-3_7
Volume	13574
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	Library Specific Holdings
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Computational+Mathematics+Modeling+in+Cancer+Analysis&rft.atitle=CanDLE%3A+Illuminating+Biases+in+Transcriptomic+Pan-Cancer+Diagnosis&rft.date=2022-01-01&rft.pub=Springer&rft.isbn=9783031172656&rft.volume=13574&rft_id=info:doi/10.1007%2F978-3-031-17266-3_7&rft.externalDBID=79&rft.externalDocID=EBC7102179_75_79
thumbnail_s	http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Febookcentral.proquest.com%2Fcovers%2F7102179-l.jpg