Method for detecting outlier data from large-scale high dimensional data based on genetic algorithm


Bibliographic Details
Main Authors FU XINGWANG, WU NAN, WEI PENG
Format Patent
Language English
Published 11.03.2015

Summary: The invention discloses a method for detecting outlier data in large-scale, high-dimensional data based on a genetic algorithm, and belongs to the technical field of outlier data mining. The method comprises three steps: (1) sample discretization and encoding, in which the high-dimensional data are encoded so that each individual corresponds to one character string, and a sparse coefficient is selected as the fitness function and used as the criterion for judging how good each individual is; (2) loop iteration, in which a population of individuals is maintained and continuously updated through crossover, mutation and selection according to the principle of survival of the fittest; and (3) decoding to obtain the outlier data, in which the final population is decoded by mapping each individual back to its corresponding sample data, revealing the outlier data hidden in the samples. The method is described as capable of finding hidden outlier data in large-scale, high-dimensional data effectively and quickly.
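The three steps in the summary could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The record does not give the formula for the sparse coefficient, so the sketch assumes the form often used in evolutionary subspace outlier search, S = (n(D) - N f^k) / sqrt(N f^k (1 - f^k)) with f = 1/phi, where n(D) is the number of samples in a k-dimensional grid cube D. Equi-depth discretization, truncation selection, the repair step, and every function and parameter name (`phi`, `k`, `pop_size`, `top`, and so on) are likewise hypothetical choices made for the example.

```python
import math
import random

def discretize(data, phi):
    """Step 1a: equi-depth binning; each value becomes a bin index 0..phi-1."""
    n, d = len(data), len(data[0])
    codes = [[0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order):
            codes[i][j] = rank * phi // n
    return codes

def covered(ind, codes):
    """Indices of samples inside the cube `ind` (None means "any bin")."""
    return [i for i, row in enumerate(codes)
            if all(g is None or g == row[j] for j, g in enumerate(ind))]

def sparsity(ind, codes, phi):
    """Assumed sparse coefficient; strongly negative means an unusually empty cube."""
    n = len(codes)
    k = sum(g is not None for g in ind)
    p = (1.0 / phi) ** k
    return (len(covered(ind, codes)) - n * p) / math.sqrt(n * p * (1 - p))

def fitness(ind, codes, phi):
    """Lower is fitter; empty cubes hold no samples, so they are rejected."""
    return sparsity(ind, codes, phi) if covered(ind, codes) else math.inf

def repair(ind, k, phi, rng):
    """Force exactly k fixed positions so every individual stays k-dimensional."""
    fixed = [j for j, g in enumerate(ind) if g is not None]
    free = [j for j, g in enumerate(ind) if g is None]
    rng.shuffle(fixed)
    rng.shuffle(free)
    while len(fixed) > k:
        ind[fixed.pop()] = None
    while len(fixed) < k:
        j = free.pop()
        ind[j] = rng.randrange(phi)
        fixed.append(j)
    return ind

def crossover(a, b, k, phi, rng):
    """Uniform crossover of two parent strings, followed by repair."""
    return repair([rng.choice(pair) for pair in zip(a, b)], k, phi, rng)

def mutate(ind, k, phi, rng, rate=0.2):
    """With probability `rate`, drop one fixed position and re-draw it."""
    if rng.random() < rate:
        j = rng.choice([j for j, g in enumerate(ind) if g is not None])
        ind[j] = None
        repair(ind, k, phi, rng)
    return ind

def detect_outliers(data, phi=4, k=2, pop_size=40, generations=50, top=5, seed=0):
    rng = random.Random(seed)
    codes = discretize(data, phi)
    d = len(codes[0])
    key = lambda ind: fitness(ind, codes, phi)
    # Step 1b: encode a random initial population of k-dimensional cubes.
    pop = [repair([None] * d, k, phi, rng) for _ in range(pop_size)]
    # Step 2: loop iteration with selection, crossover and mutation.
    for _ in range(generations):
        pop.sort(key=key)
        parents = pop[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents),
                                     k, phi, rng), k, phi, rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    # Step 3: decode the sparsest cubes back to sample indices.
    pop.sort(key=key)
    outliers = set()
    for ind in pop[:top]:
        if key(ind) < 0:
            outliers.update(covered(ind, codes))
    return sorted(outliers)
```

Calling `detect_outliers(data)` on a list of equal-length numeric rows returns the indices of samples that fall in the sparsest cubes found. Rejecting empty cubes in `fitness` is a design choice of this sketch: an empty cube has the lowest possible coefficient but contains no sample to report.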
Bibliography: Application Number: CN201410689745