Data Science Solutions with Python - Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn

Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distribute...

Bibliographic Details
Main Author Nokeri, Tshepo Chris
Format eBook
Language English
Published Berkeley, CA: Apress, an imprint of Springer Nature, 2021
Apress
Apress L. P
Edition 1
ISBN 9781484277614
1484277619
9781484277621
1484277627
DOI 10.1007/978-1-4842-7762-1

Abstract Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distributed cluster computing framework known as PySpark, machine learning framework platforms known as scikit-learn, PySpark MLlib, H2O, and XGBoost, and a deep learning (DL) framework known as Keras. The book starts off presenting supervised and unsupervised ML and DL models, and then it examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model along with Accelerated Failure Time (AFT). Also presented is a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. A way of performing cluster analysis using the K-Means model is covered.
AbstractList Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distributed cluster computing framework known as PySpark, machine learning framework platforms known as scikit-learn, PySpark MLlib, H2O, and XGBoost, and a deep learning (DL) framework known as Keras. The book starts off presenting supervised and unsupervised ML and DL models, and then it examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model along with Accelerated Failure Time (AFT). Also presented is a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. A way of performing cluster analysis using the K-Means model is covered. Dimension reduction techniques such as Principal Components Analysis and Linear Discriminant Analysis are explored. And automated machine learning is unpacked. This book is for intermediate-level data scientists and machine learning engineers who want to learn how to apply key big data frameworks and ML and DL frameworks. You will need prior knowledge of the basics of statistics, Python programming, probability theories, and predictive analytics.
What You Will Learn
* Understand widespread supervised and unsupervised learning, including key dimension reduction techniques
* Know the big data analytics layers such as data visualization, advanced statistics, predictive analytics, machine learning, and deep learning
* Integrate big data frameworks with a hybrid of machine learning frameworks and deep learning frameworks
* Design, build, test, and validate skilled machine models and deep learning models
* Optimize model performance using data transformation, regularization, outlier remedying, hyperparameter optimization, and data split ratio alteration
Who This Book Is For
Data scientists and machine learning engineers with basic knowledge and understanding of Python programming, probability theories, and predictive analytics
Author Tshepo Chris Nokeri
Author_xml – sequence: 1
  fullname: Nokeri, Tshepo Chris
ContentType eBook
Copyright 2022
Tshepo Chris Nokeri 2022
Copyright_xml – notice: 2022
– notice: Tshepo Chris Nokeri 2022
DBID YSPEL
OHILO
OODEK
DEWEY 006.31
DOI 10.1007/978-1-4842-7762-1
DatabaseName Perlego
O'Reilly Online Learning: Corporate Edition
O'Reilly Online Learning: Academic/Public Library Edition
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
Statistics
EISBN 152315098X
9781523150984
9781484277621
1484277627
Edition 1
ExternalDocumentID 9781484277621
517508
EBC6792405
4513676
book_kpDSSPFSM1
Genre Electronic books
GroupedDBID 38.
AABBV
AADCS
AAJNZ
AALIM
AAQFW
AAZWU
ABSVR
ABTHU
ACPMC
ACXXF
ADNVS
AEKFX
AIYYB
ALMA_UNASSIGNED_HOLDINGS
BBABE
CMZ
CZZ
IEZ
K-E
OHILO
OODEK
SBO
TD3
TPJZQ
WZT
YSPEL
AJIEK
ACBYE
ID FETCH-LOGICAL-a22629-a1d33eb5118c4fbc07d9f2ced91e6394cd539be65ea4dd1b99101376db97b2953
IEDL.DBID CMZ
ISBN 9781484277614
1484277619
9781484277621
1484277627
IngestDate Thu Aug 21 09:44:15 EDT 2025
Tue Jul 29 20:28:26 EDT 2025
Sat Sep 06 01:59:03 EDT 2025
Fri May 30 22:53:48 EDT 2025
Wed Sep 03 00:13:14 EDT 2025
Sat Nov 23 14:06:46 EST 2024
IsPeerReviewed false
IsScholarly false
LCCallNum_Ident QA76.9.Q36
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a22629-a1d33eb5118c4fbc07d9f2ced91e6394cd539be65ea4dd1b99101376db97b2953
OCLC 1283854399
PQID EBC6792405
PageCount 128
ParticipantIDs askewsholts_vlebooks_9781484277621
springer_books_10_1007_978_1_4842_7762_1
safari_books_v2_9781484277621
proquest_ebookcentral_EBC6792405
perlego_books_4513676
knovel_primary_book_kpDSSPFSM1
PublicationCentury 2000
PublicationDate 2022
2021
2021-10-25T00:00:00
20211026
2021-10-25
PublicationDateYYYYMMDD 2022-01-01
2021-01-01
2021-10-25
2021-10-26
PublicationDate_xml – year: 2021
  text: 2021
PublicationDecade 2020
PublicationPlace Berkeley, CA
PublicationPlace_xml – name: Berkeley, CA
PublicationYear 2022
2021
Publisher Apress, an imprint of Springer Nature
Apress
Apress L. P
Publisher_xml – name: Apress, an imprint of Springer Nature
– name: Apress
– name: Apress L. P
SSID ssj0002721142
Score 2.2797024
Snippet Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize...
SourceID askewsholts
springer
safari
proquest
perlego
knovel
SourceType Aggregation Database
Publisher
SubjectTerms Artificial Intelligence
COMPUTERS
General References
Machine Learning
Professional and Applied Computing
Python
Python (Computer program language)
Software Engineering
Statistics
Statistics, general
TableOfContents Title Page
Introduction
Table of Contents
1. Exploring Machine Learning
2. Big Data, Machine Learning, and Deep Learning Frameworks
3. Linear Modeling with Scikit-Learn, PySpark, and H2O
4. Survival Analysis with PySpark and Lifelines
5. Nonlinear Modeling with Scikit-Learn, PySpark, and H2O
6. Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O
7. Neural Networks with Scikit-Learn, Keras, and H2O
8. Cluster Analysis with Scikit-Learn, PySpark, and H2O
9. Principal Component Analysis with Scikit-Learn, PySpark, and H2O
10. Automating the Machine Learning Process with H2O
Index
Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Acknowledgments -- Introduction
Chapter 1: Exploring Machine Learning -- Exploring Supervised Methods -- Exploring Nonlinear Models -- Exploring Ensemble Methods -- Exploring Unsupervised Methods -- Exploring Cluster Methods -- Exploring Dimension Reduction -- Exploring Deep Learning -- Conclusion
Chapter 2: Big Data, Machine Learning, and Deep Learning Frameworks -- Big Data -- Big Data Features -- Impact of Big Data on Business and People -- Better Customer Relationships -- Refined Product Development -- Improved Decision-Making -- Big Data Warehousing -- Big Data ETL -- Big Data Frameworks -- Apache Spark -- Resilient Distributed Data Sets -- Spark Configuration -- Spark Frameworks -- SparkSQL -- Spark Streaming -- Spark MLlib -- GraphX -- ML Frameworks -- Scikit-Learn -- H2O -- XGBoost -- DL Frameworks -- Keras
Chapter 3: Linear Modeling with Scikit-Learn, PySpark, and H2O -- Exploring the Ordinary Least-Squares Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 4: Survival Analysis with PySpark and Lifelines -- Exploring Survival Analysis -- Exploring Cox Proportional Hazards Method -- Lifeline in Action -- Exploring the Accelerated Failure Time Method -- PySpark in Action -- Conclusion
Chapter 5: Nonlinear Modeling With Scikit-Learn, PySpark, and H2O -- Exploring the Logistic Regression Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 6: Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O -- Decision Trees -- Preprocessing Features -- Scikit-Learn in Action -- Gradient Boosting -- XGBoost in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 7: Neural Networks with Scikit-Learn, Keras, and H2O -- Exploring Deep Learning -- Multilayer Perceptron Neural Network -- Preprocessing Features -- Scikit-Learn in Action -- Keras in Action -- Deep Belief Networks -- H2O in Action -- Conclusion
Chapter 8: Cluster Analysis with Scikit-Learn, PySpark, and H2O -- Exploring the K-Means Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 9: Principal Component Analysis with Scikit-Learn, PySpark, and H2O -- Exploring the Principal Component Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 10: Automating the Machine Learning Process with H2O -- Exploring Automated Machine Learning -- Preprocessing Features -- H2O AutoML in Action -- Conclusion
Index
Title Data Science Solutions with Python - Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn
URI https://app.knovel.com/hotlink/toc/id:kpDSSPFSM1/data-science-solutions/data-science-solutions?kpromoter=Summon
https://www.perlego.com/book/4513676/data-science-solutions-with-python-fast-and-scalable-models-using-keras-pyspark-mllib-h2o-xgboost-and-scikitlearn-pdf
https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=6792405
https://learning.oreilly.com/library/view/~/9781484277621/?ar
http://link.springer.com/10.1007/978-1-4842-7762-1
https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781484277621
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider Knovel
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.title=Data+Science+Solutions+with+Python&rft.au=Tshepo+Chris+Nokeri&rft.date=2021-01-01&rft.pub=Apress&rft.isbn=9781484277621&rft_id=info:doi/10.1007%2F978-1-4842-7762-1&rft.externalDBID=YSPEL&rft.externalDocID=4513676
thumbnail_l http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fwww.perlego.com%2Fbooks%2FRM_Books%2Fingram_csplus_gexhsuob%2F9781484277621.jpg
thumbnail_m http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fwww.safaribooksonline.com%2Flibrary%2Fcover%2F9781484277621
http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fvle.dmmserver.com%2Fmedia%2F640%2F97814842%2F9781484277621.jpg
thumbnail_s http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fcontent.knovel.com%2Fcontent%2FThumbs%2Fthumb14877.gif
http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fmedia.springernature.com%2Fw306%2Fspringer-static%2Fcover-hires%2Fbook%2F978-1-4842-7762-1