Decoding Lung Cancer Radiogenomics: A Custom Clustering/Classification Methodology to Simultaneously Identify Important Imaging Features and Relevant Genes

Bibliographic Details
Published in: Applied Sciences, Vol. 15, No. 7, p. 4053
Main Authors: Provenzano, Destie; Lichtenberger, John P.; Goyal, Sharad; Rao, Yuan James
Format: Journal Article
Language: English
Published: Basel, MDPI AG, 01.04.2025

Summary:
Background: This study evaluated a custom algorithm for radiogenomic analysis of lung cancer genetic and imaging data. Specifically, it used machine learning to test whether a custom clustering/classification method could simultaneously identify features in imaging data that correspond to genetic markers.
Methods: CT imaging data and genetic mutation data for 281 subjects with non-small cell lung cancer (NSCLC) were collected from the CPTAC-LUAD and TCGA-LUSC collections on The Cancer Imaging Archive (TCIA). The algorithm ran as follows: (1) genetic clusters were initialized using random clusters, binary matrix factorization, or k-means; (2) image classification was run on the CT data using these genetic clusters as labels; (3) misclassified subjects were reassigned to the cluster predicted by the image classification algorithm; and (4) the process was repeated until accuracy reached 90% or showed no improvement after 10 runs. The input genetic mutations were evaluated for potential medical treatments and severity to establish clinical relevance.
Results: The image classification algorithm achieved >90% accuracy after nine algorithm runs and consolidated the subjects from five initial clusters into four final clusters; the final image classification accuracy exceeded the accuracy obtained with every initial clustering. These clusters were stable across all three test runs. A total of thirty-eight genes from the top hundred across each subject were identified as having specific severity or treatment data; twelve of these genes are listed.
Conclusion: This small pilot study presented a potential way to identify genetic patterns from image data, along with a methodology that could group images with no labels or only partial labels in future problems.
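The four-step loop described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mutation matrix `G` and the image feature matrix `F` are hypothetical arrays (the study classified CT images directly), a nearest-centroid classifier stands in for the image classification model, accuracy is measured on the same subjects used for fitting for brevity, and only the random and k-means initializations are shown (binary matrix factorization is omitted). All function names and parameters are illustrative.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means on the rows of X; returns one integer label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def nearest_centroid_predict(F, labels):
    """Stand-in 'image classifier' (hypothetical): fit one centroid per current
    cluster on the image features F, then assign every subject to its closest centroid."""
    classes = np.unique(labels)
    centroids = np.stack([F[labels == c].mean(axis=0) for c in classes])
    d = ((F[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def iterative_relabel(G, F, k=5, init="kmeans", target=0.90, patience=10,
                      max_rounds=100, seed=0):
    """Hypothetical sketch of the cluster -> classify -> relabel loop.

    G: (subjects x genes) binary mutation matrix; F: (subjects x features)
    image feature matrix. Returns the final labels and per-round accuracy.
    """
    rng = np.random.default_rng(seed)
    # Step 1: initial genetic clusters (binary matrix factorization omitted).
    if init == "kmeans":
        labels = kmeans(G, k, rng)
    else:  # "random": uniform random cluster assignment
        labels = rng.integers(0, k, size=len(G))
    best_acc, stale, history = -1.0, 0, []
    for _ in range(max_rounds):
        # Step 2: classify subjects from image features using the current clusters.
        pred = nearest_centroid_predict(F, labels)
        acc = float(np.mean(pred == labels))  # accuracy on the fitted subjects
        history.append(acc)
        # Step 4: stop at the target accuracy or after `patience` rounds without gain.
        if acc >= target:
            break
        if acc > best_acc:
            best_acc, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:
                break
        # Step 3: misclassified subjects move to the cluster the classifier predicted.
        labels = pred
    return labels, history


if __name__ == "__main__":
    demo_rng = np.random.default_rng(42)
    G = (demo_rng.random((281, 100)) < 0.1).astype(float)  # synthetic mutations
    F = demo_rng.normal(size=(281, 16))  # synthetic image features
    labels, history = iterative_relabel(G, F)
    print(f"rounds={len(history)} final_acc={history[-1]:.2f} "
          f"clusters={np.unique(labels).size}")
```

Because this stand-in classifier refits centroids on the current clusters and then reassigns subjects, each round behaves like a k-means step on the image features. A cluster that receives no predicted members simply disappears, which is one way such a loop can finish with fewer clusters than it started with.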
ISSN: 2076-3417
DOI: 10.3390/app15074053