Data Science Solutions with Python - Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn

Bibliographic Details
Main Author: Nokeri, Tshepo Chris
Format: eBook
Language: English
Published: Berkeley, CA: Apress, an imprint of Springer Nature, 2021
Apress
Apress L. P
Edition: 1
ISBN: 9781484277614
1484277619
9781484277621
1484277627
DOI: 10.1007/978-1-4842-7762-1

Summary: Apply supervised and unsupervised learning to solve practical, real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. It covers PySpark, an in-memory, distributed cluster computing framework; the ML frameworks scikit-learn, PySpark MLlib, H2O, and XGBoost; and Keras, a deep learning (DL) framework. The book starts by presenting supervised and unsupervised ML and DL models, and then examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model, along with Accelerated Failure Time (AFT). Also presented are a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier, and it covers cluster analysis using the K-Means model.