CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis

Bibliographic Details
Published in Computational Mathematics Modeling in Cancer Analysis, Vol. 13574; pp. 68-77
Main Authors Mejía, Gabriel; Bloch, Natasha; Arbelaez, Pablo
Format Book Chapter
Language English
Published Switzerland Springer 2022
Springer Nature Switzerland
Series Lecture Notes in Computer Science
Abstract Automatic cancer diagnosis based on RNA-Seq profiles is at the intersection of transcriptome analysis and machine learning. Methods developed for this task could be a valuable support in clinical practice and provide insights into the cancer causal mechanisms. To correctly approach this problem, the largest existing resource (The Cancer Genome Atlas) must be complemented with healthy tissue samples from the Genotype-Tissue Expression project. In this work, we empirically prove that previous approaches to joining these databases suffer from translation biases and correct them using batch z-score normalization. Moreover, we propose CanDLE, a multinomial logistic regression model that achieves state of the art performance in multilabel cancer/healthy tissue type classification (94.1% balanced accuracy) and all-vs-one cancer type detection (78.0% average $\max F_1$).
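The abstract names batch z-score normalization as the correction for the bias between the two databases but does not spell out the procedure. The following is a minimal sketch of one common reading: each gene is standardized separately within each source batch, so that a constant shift between batches is removed. The function name `batch_zscore`, the `eps` guard, and the toy values are assumptions made for illustration and are not taken from the chapter.

```python
import numpy as np

def batch_zscore(expr, batch_ids, eps=1e-8):
    """Standardize each gene (column) within each batch (group of rows).

    expr: array of shape (n_samples, n_genes).
    batch_ids: array of length n_samples naming the source of each sample.
    eps: small constant (an assumption here) to avoid division by zero.
    """
    expr = np.asarray(expr, dtype=float)
    batch_ids = np.asarray(batch_ids)
    out = np.empty_like(expr)
    for b in np.unique(batch_ids):
        rows = batch_ids == b                 # samples belonging to batch b
        mean = expr[rows].mean(axis=0)        # per-gene mean inside the batch
        std = expr[rows].std(axis=0)          # per-gene std inside the batch
        out[rows] = (expr[rows] - mean) / (std + eps)
    return out

# Toy data (hypothetical): two genes, with the second batch shifted
# relative to the first to mimic a database-level offset.
expr = np.array([[1.0, 10.0],
                 [3.0, 14.0],
                 [101.0, 20.0],
                 [103.0, 24.0]])
batches = np.array(["TCGA", "TCGA", "GTEx", "GTEx"])
normalized = batch_zscore(expr, batches)
```

After this step both toy batches have the same per-gene distribution, which is the property a downstream classifier would need in order not to separate samples by database of origin alone.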
Author Bloch, Natasha
Mejía, Gabriel
Arbelaez, Pablo
Author_xml – sequence: 1
  givenname: Gabriel
  surname: Mejía
  fullname: Mejía, Gabriel
  email: gm.mejia@uniandes.edu.co
– sequence: 2
  givenname: Natasha
  surname: Bloch
  fullname: Bloch, Natasha
– sequence: 3
  givenname: Pablo
  surname: Arbelaez
  fullname: Arbelaez, Pablo
ContentType Book Chapter
Copyright The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
DBID FFUUA
DEWEY 616.99400113
DOI 10.1007/978-3-031-17266-3_7
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
Applied Sciences
Computer Science
EISBN 9783031172663
3031172663
EISSN 1611-3349
Editor Wu, Jia
Zhang, Fa
Zaki, Nazar
Qin, Wenjian
Yang, Fan
Editor_xml – sequence: 1
  fullname: Wu, Jia
– sequence: 2
  fullname: Zhang, Fa
– sequence: 3
  fullname: Qin, Wenjian
– sequence: 4
  fullname: Yang, Fan
– sequence: 5
  fullname: Zaki, Nazar
EndPage 77
ExternalDocumentID EBC7102179_75_79
ID FETCH-LOGICAL-p240t-716f7c72bb19ae2b8298e68e09090e891a558f8c43ec4e0628bda490234642d3
ISBN 9783031172656
3031172655
ISSN 0302-9743
IngestDate Tue Jul 29 20:15:29 EDT 2025
Tue Jul 22 07:50:38 EDT 2025
IsPeerReviewed true
IsScholarly true
LCCallNum TA1501-1820
Language English
LinkModel OpenURL
OCLC 1346154862
PQID EBC7102179_75_79
PageCount 10
ParticipantIDs springer_books_10_1007_978_3_031_17266_3_7
proquest_ebookcentralchapters_7102179_75_79
PublicationCentury 2000
PublicationDate 2022
20220922
PublicationDateYYYYMMDD 2022-01-01
2022-09-22
PublicationDate_xml – year: 2022
  text: 2022
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle First International Workshop, CMMCA 2022, Held in Conjunction with MICCAI 2022, Singapore, September 18, 2022, Proceedings
PublicationTitle Computational Mathematics Modeling in Cancer Analysis
PublicationYear 2022
Publisher Springer
Springer Nature Switzerland
Publisher_xml – name: Springer
– name: Springer Nature Switzerland
RelatedPersons Hartmanis, Juris
Gao, Wen
Steffen, Bernhard
Bertino, Elisa
Goos, Gerhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Moti
  orcidid: 0000-0003-0848-0873
  surname: Yung
  fullname: Yung, Moti
SourceID springer
proquest
SourceType Publisher
StartPage 68
SubjectTerms Cancer classification
Cancer detection
GTEx
Machine learning
Multinomial logistic regression
TCGA
Title CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7102179&ppg=79&c=UERG
http://link.springer.com/10.1007/978-3-031-17266-3_7
Volume 13574