AUTOMATIC FEATURE SUBSET SELECTION USING FEATURE RANKING AND SCALABLE AUTOMATIC SEARCH
Main Authors | , , |
---|---|
Format | Patent |
Language | English |
Published | 16.04.2020 |
Subjects | |
Summary: | The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer calculates, for each feature of a training dataset, a relevance score based on a relevance scoring function and on statistics of the feature's values that occur in the training dataset. A rank is calculated for each feature based on the relevance scores of the features. A sequence of distinct subsets of the features is generated based on the ranks of the features. For each distinct subset in the sequence, a fitness score is generated based on training a machine learning (ML) model that is configured for the distinct subset. |
---|---|
Bibliography: | Application Number: US201916417145 |
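
The pipeline outlined in the summary (score each feature, rank the features, generate a sequence of rank-based subsets, and train a model per subset to obtain a fitness score) can be illustrated with a minimal Python sketch. The record does not specify any of the concrete choices, so everything below is an assumption made for illustration only: absolute Pearson correlation as the relevance scoring function, "top-k by rank" prefixes as the subset sequence, a nearest-centroid classifier as the ML model, holdout accuracy as the fitness score, and all function names. It is not the patented method.

```python
import math
import random


def relevance(column, labels):
    """Assumed scoring function: absolute Pearson correlation with the labels."""
    n = len(column)
    mean_x, mean_y = sum(column) / n, sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(column, labels))
    var_x = sum((x - mean_x) ** 2 for x in column)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return abs(cov) / math.sqrt(var_x * var_y)


def rank_features(rows, labels):
    """Return feature indices ordered from most to least relevant."""
    n_features = len(rows[0])
    scores = [relevance([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def subset_sequence(ranking):
    """Assumed sequence: the top-1, top-2, ..., top-n ranked features."""
    for k in range(1, len(ranking) + 1):
        yield ranking[:k]


def fitness(rows, labels, subset, train_fraction=0.7):
    """Train a nearest-centroid model on the subset; return holdout accuracy."""
    split = int(len(rows) * train_fraction)

    def project(row):
        return [row[j] for j in subset]

    centroids = {}
    for label in set(labels[:split]):
        members = [project(r) for r, y in zip(rows[:split], labels[:split]) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(row):
        p = project(row)
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

    held_out = list(zip(rows[split:], labels[split:]))
    return sum(predict(r) == y for r, y in held_out) / len(held_out)


def select_features(rows, labels):
    """Score every subset in the sequence; prefer the smaller subset on ties."""
    ranking = rank_features(rows, labels)
    scored = [(fitness(rows, labels, s), s) for s in subset_sequence(ranking)]
    best_fitness, best_subset = max(scored, key=lambda t: (t[0], -len(t[1])))
    return best_subset, best_fitness


# Synthetic data: feature 0 tracks the label, features 1 and 2 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [[y + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]

best_subset, best_fitness = select_features(rows, labels)
print("ranking:", rank_features(rows, labels))
print("best subset:", best_subset, "fitness:", round(best_fitness, 3))
```

Because each candidate subset is a prefix of the ranking, this sketch trains only as many models as there are features instead of searching all 2^n combinations; other rank-based subset sequences would fit the summary's wording equally well.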