COMPARATIVE ANALYSIS OF CLUSTER CONCENTRIC CIRCLE BASED UNDER SAMPLING OVER LOW VERSUS HIGH DIMENSIONAL IMBALANCED DATASETS

Bibliographic Details
Published in: International Journal of Advanced Research in Computer Science, Vol. 8, no. 8, pp. 433-437
Main Author: Srividhya, S.
Format: Journal Article
Language: English
Published: Udaipur: International Journal of Advanced Research in Computer Science, 01.09.2017
Summary: An imbalanced dataset affects the performance of a supervised learning model. Most existing real-world datasets are imbalanced and often high dimensional. Existing classification methods tend to perform extremely well on the majority class while giving little importance to the minority class. Most of the solutions proposed for imbalanced datasets are not suited to high dimensional imbalanced datasets. This paper compares the performance of an existing balancing method, cluster concentric circle based under sampling (C3BUS), on low dimensional versus high dimensional imbalanced datasets. This work shows that C3BUS works quite well on low dimensional imbalanced datasets compared with high dimensional ones, and concludes that class imbalance and high dimensionality are two of the main issues in the supervised learning process.
ISSN: 0976-5697
DOI: 10.26483/ijarcs.v8i8.4783
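This record does not describe how C3BUS decides which majority-class samples to discard, so the sketch below does not implement it. As a hedged illustration of what under sampling means in general, here is plain random undersampling in Python, a common baseline that shrinks every class to the size of the smallest one. The function name `random_undersample`, the fixed `seed`, and the toy data are assumptions made for this example only.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Generic random undersampling baseline (hypothetical helper, NOT C3BUS).

    Keeps a random subset of each class so that every class ends up with
    as many samples as the smallest (minority) class.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    target = min(Counter(y).values())  # size of the smallest class

    # Group sample indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)

    # Draw `target` indices per class without replacement.
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))
    keep.sort()  # retain the original ordering of the kept samples

    return [X[i] for i in keep], [y[i] for i in keep]

# Toy imbalanced data (assumed): 8 majority (label 0) vs 2 minority (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
Xb, yb = random_undersample(X, y)
print(Counter(yb))  # Counter({0: 2, 1: 2})
```

Random removal ignores the structure of the majority class; as its name suggests, a cluster based method such as C3BUS instead uses clustering to guide which majority samples are dropped.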