Decoding Lung Cancer Radiogenomics: A Custom Clustering/Classification Methodology to Simultaneously Identify Important Imaging Features and Relevant Genes
Published in | Applied sciences Vol. 15; no. 7; p. 4053 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Basel: MDPI AG, 01.04.2025 |
Summary: | Background: This study evaluated a custom algorithm for radiogenomic analysis of lung cancer genetic and imaging data, using machine learning to test whether a custom clustering/classification method could simultaneously identify imaging features that correspond to genetic markers. Methods: CT imaging data and genetic mutation data for 281 subjects with NSCLC were collected from the CPTAC-LUAD and TCGA-LUSC databases on TCIA. The algorithm ran as follows: (1) genetic clusters were initialized using random clusters, binary matrix factorization, or k-means; (2) image classification was run on the CT data for these genetic clusters; (3) misclassified subjects were reassigned according to the image classification algorithm; and (4) the algorithm was repeated until accuracy reached 90% or showed no improvement after 10 runs. Input genetic mutations were evaluated for potential medical treatments and severity to provide clinical relevance. Results: The image classification algorithm achieved >90% accuracy after nine algorithm runs and regrouped subjects from five starting clusters into four final clusters; the final image classification accuracy was higher than the accuracy for every initial clustering. These clusters were stable across all three test runs. Thirty-eight genes from the top hundred across subjects were identified with specific severity or treatment data; twelve of these genes are listed. Conclusion: This small pilot study presented a potential way to identify genetic patterns from image data, along with a methodology that could group images with no labels or only partial labels in future problems. |
---|---|
ISSN: | 2076-3417 |
DOI: | 10.3390/app15074053 |
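
Steps (1) through (4) in the Summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: a nearest-centroid classifier on synthetic 2-D points stands in for the CT image classifier, random assignment stands in for the genetic cluster initialization, and accuracy is measured on the same points used for fitting. All names (`fit_centroids`, `predict`, `iterative_relabel`, `target_acc`, `patience`) are hypothetical.

```python
import random


def fit_centroids(features, labels):
    """Fit a nearest-centroid classifier: return {label: mean feature vector}."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label of the closest centroid (ties go to the lowest label)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda y: sqdist(centroids[y]))


def iterative_relabel(features, n_clusters=5, target_acc=0.90, patience=10, seed=0):
    """Alternate classification and relabeling until a stopping rule fires."""
    rng = random.Random(seed)
    # Step 1 (assumption): random initial clusters stand in for genetic clusters.
    labels = [rng.randrange(n_clusters) for _ in features]
    best_acc, stale, runs = 0.0, 0, 0
    while True:
        runs += 1
        # Step 2: fit the stand-in classifier on the current cluster labels.
        centroids = fit_centroids(features, labels)
        predicted = [predict(centroids, x) for x in features]
        acc = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
        # Step 4: stop at the target accuracy or after `patience` stale runs.
        if acc >= target_acc:
            break
        if acc > best_acc:
            best_acc, stale = acc, 0
        else:
            stale += 1
        if stale >= patience:
            break
        # Step 3: misclassified subjects move to the classifier's cluster.
        labels = predicted
    return labels, acc, runs


# Synthetic stand-in data: two well-separated groups of 2-D points.
data_rng = random.Random(1)
points = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
points += [[data_rng.gauss(8, 1), data_rng.gauss(8, 1)] for _ in range(40)]

labels, acc, runs = iterative_relabel(points)
print(f"runs={runs} accuracy={acc:.2f} clusters={len(set(labels))}")
```

Because a cluster disappears once no subject is assigned to it, the number of clusters in this sketch can shrink during relabeling, which is the kind of behavior the Summary reports (five starting clusters, four final clusters). With a nearest-centroid classifier the loop reduces to a k-means-style update, so it only illustrates the control flow, not the reported results.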