Data Science Solutions with Python - Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn

Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distribute...

Bibliographic Details
Main Author	Nokeri, Tshepo Chris
Format	eBook
Language	English
Published	Berkeley, CA: Apress, an imprint of Springer Nature, 2021; Apress; Apress L. P
Edition	1
Subjects	Artificial Intelligence; COMPUTERS; General References; Machine Learning; Professional and Applied Computing; Python; Python (Computer program language); Software Engineering; Statistics; Statistics, general
ISBN	9781484277614 1484277619 9781484277621 1484277627
DOI	10.1007/978-1-4842-7762-1

Abstract	Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distributed cluster computing framework known as PySpark, machine learning framework platforms known as scikit-learn, PySpark MLlib, H2O, and XGBoost, and a deep learning (DL) framework known as Keras. The book starts off presenting supervised and unsupervised ML and DL models, and then it examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model along with Accelerated Failure Time (AFT). Also presented is a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. A way of performing cluster analysis using the K-Means model is covered.
AbstractList	Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distributed cluster computing framework known as PySpark, machine learning framework platforms known as scikit-learn, PySpark MLlib, H2O, and XGBoost, and a deep learning (DL) framework known as Keras. The book starts off presenting supervised and unsupervised ML and DL models, and then it examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model along with Accelerated Failure Time (AFT). Also presented is a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. A way of performing cluster analysis using the K-Means model is covered. Dimension reduction techniques such as Principal Components Analysis and Linear Discriminant Analysis are explored. And automated machine learning is unpacked. This book is for intermediate-level data scientists and machine learning engineers who want to learn how to apply key big data frameworks and ML and DL frameworks. You will need prior knowledge of the basics of statistics, Python programming, probability theories, and predictive analytics.
What You Will Learn
* Understand widespread supervised and unsupervised learning, including key dimension reduction techniques
* Know the big data analytics layers such as data visualization, advanced statistics, predictive analytics, machine learning, and deep learning
* Integrate big data frameworks with a hybrid of machine learning frameworks and deep learning frameworks
* Design, build, test, and validate skilled machine models and deep learning models
* Optimize model performance using data transformation, regularization, outlier remedying, hyperparameter optimization, and data split ratio alteration
Who This Book Is For
Data scientists and machine learning engineers with basic knowledge and understanding of Python programming, probability theories, and predictive analytics
Author	Tshepo Chris Nokeri
Author_xml	– sequence: 1 fullname: Nokeri, Tshepo Chris
ContentType	eBook
Copyright	2022 Tshepo Chris Nokeri 2022
Copyright_xml	– notice: 2022 – notice: Tshepo Chris Nokeri 2022
DBID	YSPEL OHILO OODEK
DEWEY	006.31
DatabaseName	Perlego; O'Reilly Online Learning: Corporate Edition; O'Reilly Online Learning: Academic/Public Library Edition
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science Statistics
EISBN	152315098X 9781523150984 9781484277621 1484277627
ExternalDocumentID	9781484277621 517508 EBC6792405 4513676 book_kpDSSPFSM1
Genre	Electronic books
GroupedDBID	38. AABBV AADCS AAJNZ AALIM AAQFW AAZWU ABSVR ABTHU ACPMC ACXXF ADNVS AEKFX AIYYB ALMA_UNASSIGNED_HOLDINGS BBABE CMZ CZZ IEZ K-E OHILO OODEK SBO TD3 TPJZQ WZT YSPEL AJIEK ACBYE
ID	FETCH-LOGICAL-a22629-a1d33eb5118c4fbc07d9f2ced91e6394cd539be65ea4dd1b99101376db97b2953
IEDL.DBID	CMZ
IngestDate	Thu Aug 21 09:44:15 EDT 2025 Tue Jul 29 20:28:26 EDT 2025 Sat Sep 06 01:59:03 EDT 2025 Fri May 30 22:53:48 EDT 2025 Wed Sep 03 00:13:14 EDT 2025 Sat Nov 23 14:06:46 EST 2024
IsPeerReviewed	false
IsScholarly	false
LCCallNum_Ident	QA76.9.Q36
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a22629-a1d33eb5118c4fbc07d9f2ced91e6394cd539be65ea4dd1b99101376db97b2953
OCLC	1283854399
PQID	EBC6792405
PageCount	128
ParticipantIDs	askewsholts_vlebooks_9781484277621 springer_books_10_1007_978_1_4842_7762_1 safari_books_v2_9781484277621 proquest_ebookcentral_EBC6792405 perlego_books_4513676 knovel_primary_book_kpDSSPFSM1
PublicationCentury	2000
PublicationDate	2022 2021 2021-10-25T00:00:00 20211026 2021-10-25
PublicationDateYYYYMMDD	2022-01-01 2021-01-01 2021-10-25 2021-10-26
PublicationDate_xml	– year: 2021 text: 2021
PublicationDecade	2020
PublicationPlace	Berkeley, CA
PublicationPlace_xml	– name: Berkeley, CA
PublicationYear	2022 2021
Publisher	Apress, an imprint of Springer Nature; Apress; Apress L. P
Publisher_xml	– name: Apress, an imprint of Springer Nature – name: Apress – name: Apress L. P
SSID	ssj0002721142
Score	2.2797024
Snippet	Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize...
SourceID	askewsholts springer safari proquest perlego knovel
SourceType	Aggregation Database; Publisher
SubjectTerms	Artificial Intelligence; COMPUTERS; General References; Machine Learning; Professional and Applied Computing; Python; Python (Computer program language); Software Engineering; Statistics; Statistics, general
TableOfContents	Title Page -- Introduction -- Table of Contents -- 1. Exploring Machine Learning -- 2. Big Data, Machine Learning, and Deep Learning Frameworks -- 3. Linear Modeling with Scikit-Learn, PySpark, and H2O -- 4. Survival Analysis with PySpark and Lifelines -- 5. Nonlinear Modeling with Scikit-Learn, PySpark, and H2O -- 6. Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O -- 7. Neural Networks with Scikit-Learn, Keras, and H2O -- 8. Cluster Analysis with Scikit-Learn, PySpark, and H2O -- 9. Principal Component Analysis with Scikit-Learn, PySpark, and H2O -- 10. Automating the Machine Learning Process with H2O -- Index
Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Acknowledgments -- Introduction
Chapter 1: Exploring Machine Learning -- Exploring Supervised Methods -- Exploring Nonlinear Models -- Exploring Ensemble Methods -- Exploring Unsupervised Methods -- Exploring Cluster Methods -- Exploring Dimension Reduction -- Exploring Deep Learning -- Conclusion
Chapter 2: Big Data, Machine Learning, and Deep Learning Frameworks -- Big Data -- Big Data Features -- Impact of Big Data on Business and People -- Better Customer Relationships -- Refined Product Development -- Improved Decision-Making -- Big Data Warehousing -- Big Data ETL -- Big Data Frameworks -- Apache Spark -- Resilient Distributed Data Sets -- Spark Configuration -- Spark Frameworks -- SparkSQL -- Spark Streaming -- Spark MLlib -- GraphX -- ML Frameworks -- Scikit-Learn -- H2O -- XGBoost -- DL Frameworks -- Keras
Chapter 3: Linear Modeling with Scikit-Learn, PySpark, and H2O -- Exploring the Ordinary Least-Squares Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 4: Survival Analysis with PySpark and Lifelines -- Exploring Survival Analysis -- Exploring Cox Proportional Hazards Method -- Lifeline in Action -- Exploring the Accelerated Failure Time Method -- PySpark in Action -- Conclusion
Chapter 5: Nonlinear Modeling with Scikit-Learn, PySpark, and H2O -- Exploring the Logistic Regression Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 6: Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O -- Decision Trees -- Preprocessing Features -- Scikit-Learn in Action -- Gradient Boosting -- XGBoost in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 7: Neural Networks with Scikit-Learn, Keras, and H2O -- Exploring Deep Learning -- Multilayer Perceptron Neural Network -- Preprocessing Features -- Scikit-Learn in Action -- Keras in Action -- Deep Belief Networks -- H2O in Action -- Conclusion
Chapter 8: Cluster Analysis with Scikit-Learn, PySpark, and H2O -- Exploring the K-Means Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 9: Principal Component Analysis with Scikit-Learn, PySpark, and H2O -- Exploring the Principal Component Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 10: Automating the Machine Learning Process with H2O -- Exploring Automated Machine Learning -- Preprocessing Features -- H2O AutoML in Action -- Conclusion
Index
Title	Data Science Solutions with Python - Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn
URI	https://app.knovel.com/hotlink/toc/id:kpDSSPFSM1/data-science-solutions/data-science-solutions?kpromoter=Summon https://www.perlego.com/book/4513676/data-science-solutions-with-python-fast-and-scalable-models-using-keras-pyspark-mllib-h2o-xgboost-and-scikitlearn-pdf https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=6792405 https://learning.oreilly.com/library/view/~/9781484277621/?ar http://link.springer.com/10.1007/978-1-4842-7762-1 https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781484277621
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	Knovel
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.title=Data+Science+Solutions+with+Python&rft.au=Tshepo+Chris+Nokeri&rft.date=2021-01-01&rft.pub=Apress&rft.isbn=9781484277621&rft_id=info:doi/10.1007%2F978-1-4842-7762-1&rft.externalDBID=YSPEL&rft.externalDocID=4513676
thumbnail_l	http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fwww.perlego.com%2Fbooks%2FRM_Books%2Fingram_csplus_gexhsuob%2F9781484277621.jpg
thumbnail_m	http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fwww.safaribooksonline.com%2Flibrary%2Fcover%2F9781484277621 http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fvle.dmmserver.com%2Fmedia%2F640%2F97814842%2F9781484277621.jpg
thumbnail_s	http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fcontent.knovel.com%2Fcontent%2FThumbs%2Fthumb14877.gif http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fmedia.springernature.com%2Fw306%2Fspringer-static%2Fcover-hires%2Fbook%2F978-1-4842-7762-1