Enabling advanced analytics with large data sets

Bibliographic Details
Main Authors: Pallath, Paul; Razavi, Rouzbeh
Format: Patent
Language: English
Published: 24.01.2023
Summary: The present disclosure describes methods, systems, and computer program products for enabling advanced analytics with large datasets. One computer-implemented method includes receiving, by operation of a computer system, a dataset of multiple data records, each of the data records comprising one or more features and a target variable; selecting key features among the one or more features based at least on relevance measures of the one or more features with respect to the target variable; dividing the dataset into multiple subsets; for each of the multiple subsets, identifying a number of clusters and respective centroids of the number of clusters based on the key features; identifying a number of final centroids based on the respective centroids of the number of clusters for each of the multiple subsets, the number of final centroids being respective centroids of a number of final clusters; and for each data record in the multiple subsets, assigning the data record to one of the number of final clusters based on distances between the data record and the number of final centroids.
Bibliography: Application Number: US201916256519
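
The steps listed in the summary can be illustrated with a minimal Python sketch. This is not the patented implementation: the record names neither a relevance measure nor a clustering algorithm, so the sketch assumes absolute Pearson correlation for relevance and plain Lloyd's k-means for both the per-subset and the final clustering. All function and parameter names (select_key_features, cluster_large_dataset, top_n, n_subsets, k_local, k_final) are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_key_features(records, targets, top_n):
    """Assumed relevance measure: |correlation| between a feature and the target."""
    n_features = len(records[0])
    scores = [abs(pearson([r[j] for r in records], targets))
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_n])


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (an assumed choice); returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_large_dataset(records, targets, top_n, n_subsets,
                          k_local, k_final, seed=0):
    rng = random.Random(seed)
    # 1. Select key features by relevance to the target variable.
    key = select_key_features(records, targets, top_n)
    projected = [[r[j] for j in key] for r in records]
    # 2. Divide the dataset into subsets (round-robin split, an assumption).
    subsets = [projected[i::n_subsets] for i in range(n_subsets)]
    # 3. Cluster each subset independently and pool the local centroids.
    local_centroids = []
    for subset in subsets:
        local_centroids.extend(kmeans(subset, k_local, rng))
    # 4. Cluster the pooled local centroids to obtain the final centroids.
    final_centroids = kmeans(local_centroids, k_final, rng)
    # 5. Assign every record to the nearest final centroid.
    labels = [nearest(p, final_centroids) for p in projected]
    return key, final_centroids, labels
```

Under these assumptions, only one subset needs to be clustered at a time, and the final clustering runs over n_subsets * k_local centroids rather than over every record, which is what makes the approach attractive for large datasets. The per-subset step could also be run in parallel, since the subsets do not depend on each other.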