An Efficient Boruta-Based Feature Selection and Classification of Gene Expression Data

Bibliographic Details
Published in: 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT), pp. 1-6
Main Authors: Kavitha, K R; Sajith, Sreelakshmi; Variar, Namitha H
Format: Conference Proceeding
Language: English
Published: IEEE 07.10.2022
Summary: Gene expression data is biological data on the quantities of various transcription factors and other chemicals inside a cell at a particular time. It is obtained from DNA microarray studies. The levels of the many chemical components captured in gene expression data reveal a range of facts about the cell's health. The difficulty with gene expression data is that it contains noise and missing values and has extremely high dimensionality: each sample records values for thousands of genes in the organism's genome, while the number of samples is considerably smaller. This leads to errors in computational analysis due to the curse of dimensionality. We use feature selection to address these issues, choosing the genes most relevant to the subject being studied from the large number of genes whose values are provided. Our idea is to use the Boruta feature selection algorithm, a wrapper method built around random forests, to select a set of features from the many samples produced by gene expression profiles.
ISBN: 9781665468534, 166546853X
DOI: 10.1109/GCAT55367.2022.9971894
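
The summary describes Boruta as a wrapper around random forests. As an illustration only, the sketch below shows the core idea in simplified form: each real feature competes against "shadow" copies whose values have been shuffled, and a feature is kept only if its random forest importance repeatedly beats the best shadow. This is not the authors' implementation. The function name `boruta_sketch`, its parameters, the fixed hit-fraction threshold (used here in place of Boruta's statistical test), and the synthetic data are all assumptions made for the example, which relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def boruta_sketch(X, y, n_trials=10, min_hit_fraction=0.8, seed=0):
    """Simplified Boruta-style selection (hypothetical helper).

    Returns a boolean mask over the columns of X. A fixed hit-fraction
    threshold stands in for the statistical test used by full Boruta.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)

    for trial in range(n_trials):
        # Shadow features: every real column shuffled independently, which
        # destroys any relationship with the labels.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(n_estimators=50, random_state=seed + trial)
        forest.fit(np.hstack([X, shadows]), y)

        importances = forest.feature_importances_
        real = importances[:n_features]
        best_shadow = importances[n_features:].max()

        # A real feature scores a "hit" when it beats the best shadow.
        hits += real > best_shadow

    return hits >= min_hit_fraction * n_trials


# Synthetic stand-in for an expression matrix: few samples, many features,
# with only feature 0 carrying signal (an assumption for this demo).
demo_rng = np.random.default_rng(42)
X = demo_rng.normal(size=(60, 30))
y = (X[:, 0] > 0).astype(int)

selected = boruta_sketch(X, y)
print("Selected feature indices:", np.flatnonzero(selected))
```

On real expression data, the columns of `X` would be genes and the rows samples; a maintained Boruta implementation would normally be preferable to this sketch, since it adds the statistical test and the iterative rejection of features that this version omits.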