Optimal Categorical Attribute Transformation for Granularity Change in Relational Databases for Binary Decision Problems in Educational Data Mining
This paper presents an approach for transforming data granularity in hierarchical databases for binary decision problems by applying regression to categorical attributes at the lower grain levels. Attributes from a lower hierarchy entity in the relational database have their information content opti...
Saved in:
Published in | arXiv.org |
---|---|
Main Authors | , , |
Format | Paper |
Language | English |
Published |
Ithaca
Cornell University Library, arXiv.org
28.02.2017
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | This paper presents an approach for transforming data granularity in hierarchical databases for binary decision problems by applying regression to categorical attributes at the lower grain levels. Attributes from a lower hierarchy entity in the relational database have their information content optimized through regression on the categories histogram trained on a small exclusive labelled sample, instead of the usual mode category of the distribution. The paper validates the approach on a binary decision task for assessing the quality of secondary schools focusing on how logistic regression transforms the students and teachers attributes into school attributes. Experiments were carried out on Brazilian schools public datasets via 10-fold cross-validation comparison of the ranking score produced also by logistic regression. The proposed approach achieved higher performance than the usual distribution mode transformation and equal to the expert weighing approach measured by the maximum Kolmogorov-Smirnov distance and the area under the ROC curve at 0.01 significance level. |
---|---|
ISSN: | 2331-8422 |