Data Science Solutions with Python - Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn
Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distribute...
Main Author | Tshepo Chris Nokeri |
---|---|
Format | eBook |
Language | English |
Published | Berkeley, CA: Apress, an imprint of Springer Nature; Apress; Apress L. P, 2021 |
Edition | 1 |
Subjects | |
ISBN | 9781484277614 1484277619 9781484277621 1484277627 |
DOI | 10.1007/978-1-4842-7762-1 |
Abstract | Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distributed cluster computing framework known as PySpark, machine learning framework platforms known as scikit-learn, PySpark MLlib, H2O, and XGBoost, and a deep learning (DL) framework known as Keras. The book starts off presenting supervised and unsupervised ML and DL models, and then it examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model along with Accelerated Failure Time (AFT). Also presented is a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. A way of performing cluster analysis using the K-Means model is covered. |
---|---|
AbstractList | Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers an in-memory, distributed cluster computing framework known as PySpark, machine learning framework platforms known as scikit-learn, PySpark MLlib, H2O, and XGBoost, and a deep learning (DL) framework known as Keras.
The book starts off presenting supervised and unsupervised ML and DL models, and then it examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model along with Accelerated Failure Time (AFT). Also presented is a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. A way of performing cluster analysis using the K-Means model is covered. Dimension reduction techniques such as Principal Components Analysis and Linear Discriminant Analysis are explored. And automated machine learning is unpacked.
This book is for intermediate-level data scientists and machine learning engineers who want to learn how to apply key big data frameworks and ML and DL frameworks. You will need prior knowledge of the basics of statistics, Python programming, probability theories, and predictive analytics.
What You Will Learn
* Understand widespread supervised and unsupervised learning, including key dimension reduction techniques
* Know the big data analytics layers such as data visualization, advanced statistics, predictive analytics, machine learning, and deep learning
* Integrate big data frameworks with a hybrid of machine learning frameworks and deep learning frameworks
* Design, build, test, and validate skilled machine models and deep learning models
* Optimize model performance using data transformation, regularization, outlier remedying, hyperparameter optimization, and data split ratio alteration
Who This Book Is For
Data scientists and machine learning engineers with basic knowledge and understanding of Python programming, probability theories, and predictive analytics |
Author | Tshepo Chris Nokeri |
Author_xml | – sequence: 1 fullname: Nokeri, Tshepo Chris |
ContentType | eBook |
Copyright | 2022 Tshepo Chris Nokeri 2022 |
Copyright_xml | – notice: 2022 – notice: Tshepo Chris Nokeri 2022 |
DBID | YSPEL OHILO OODEK |
DEWEY | 006.31 |
DOI | 10.1007/978-1-4842-7762-1 |
DatabaseName | Perlego O'Reilly Online Learning: Corporate Edition O'Reilly Online Learning: Academic/Public Library Edition |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science Statistics |
EISBN | 152315098X 9781523150984 9781484277621 1484277627 |
Edition | 1 |
ExternalDocumentID | 9781484277621 517508 EBC6792405 4513676 book_kpDSSPFSM1 |
Genre | Electronic books |
GroupedDBID | 38. AABBV AADCS AAJNZ AALIM AAQFW AAZWU ABSVR ABTHU ACPMC ACXXF ADNVS AEKFX AIYYB ALMA_UNASSIGNED_HOLDINGS BBABE CMZ CZZ IEZ K-E OHILO OODEK SBO TD3 TPJZQ WZT YSPEL AJIEK ACBYE |
ID | FETCH-LOGICAL-a22629-a1d33eb5118c4fbc07d9f2ced91e6394cd539be65ea4dd1b99101376db97b2953 |
IEDL.DBID | CMZ |
ISBN | 9781484277614 1484277619 9781484277621 1484277627 |
IngestDate | Thu Aug 21 09:44:15 EDT 2025 Tue Jul 29 20:28:26 EDT 2025 Sat Sep 06 01:59:03 EDT 2025 Fri May 30 22:53:48 EDT 2025 Wed Sep 03 00:13:14 EDT 2025 Sat Nov 23 14:06:46 EST 2024 |
IsPeerReviewed | false |
IsScholarly | false |
LCCallNum_Ident | QA76.9.Q36 |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-a22629-a1d33eb5118c4fbc07d9f2ced91e6394cd539be65ea4dd1b99101376db97b2953 |
OCLC | 1283854399 |
PQID | EBC6792405 |
PageCount | 128 |
ParticipantIDs | askewsholts_vlebooks_9781484277621 springer_books_10_1007_978_1_4842_7762_1 safari_books_v2_9781484277621 proquest_ebookcentral_EBC6792405 perlego_books_4513676 knovel_primary_book_kpDSSPFSM1 |
PublicationCentury | 2000 |
PublicationDate | 2022 2021 2021-10-25T00:00:00 20211026 2021-10-25 |
PublicationDateYYYYMMDD | 2022-01-01 2021-01-01 2021-10-25 2021-10-26 |
PublicationDate_xml | – year: 2021 text: 2021 |
PublicationDecade | 2020 |
PublicationPlace | Berkeley, CA |
PublicationPlace_xml | – name: Berkeley, CA |
PublicationYear | 2022 2021 |
Publisher | Apress, an imprint of Springer Nature Apress Apress L. P |
Publisher_xml | – name: Apress, an imprint of Springer Nature – name: Apress – name: Apress L. P |
SSID | ssj0002721142 |
Score | 2.2797024 |
Snippet | Apply supervised and unsupervised learning to solve practical and real-world big data problems. This book teaches you how to engineer features, optimize... |
SourceID | askewsholts springer safari proquest perlego knovel |
SourceType | Aggregation Database Publisher |
SubjectTerms | Artificial Intelligence COMPUTERS General References Machine Learning Professional and Applied Computing Python Python (Computer program language) Software Engineering Statistics Statistics, general |
TableOfContents | Title Page
Introduction
Table of Contents
1. Exploring Machine Learning
2. Big Data, Machine Learning, and Deep Learning Frameworks
3. Linear Modeling with Scikit-Learn, PySpark, and H2O
4. Survival Analysis with PySpark and Lifelines
5. Nonlinear Modeling with Scikit-Learn, PySpark, and H2O
6. Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O
7. Neural Networks with Scikit-Learn, Keras, and H2O
8. Cluster Analysis with Scikit-Learn, PySpark, and H2O
9. Principal Component Analysis with Scikit-Learn, PySpark, and H2O
10. Automating the Machine Learning Process with H2O
Index
Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Acknowledgments -- Introduction
Chapter 1: Exploring Machine Learning -- Exploring Supervised Methods -- Exploring Nonlinear Models -- Exploring Ensemble Methods -- Exploring Unsupervised Methods -- Exploring Cluster Methods -- Exploring Dimension Reduction -- Exploring Deep Learning -- Conclusion
Chapter 2: Big Data, Machine Learning, and Deep Learning Frameworks -- Big Data -- Big Data Features -- Impact of Big Data on Business and People -- Better Customer Relationships -- Refined Product Development -- Improved Decision-Making -- Big Data Warehousing -- Big Data ETL -- Big Data Frameworks -- Apache Spark -- Resilient Distributed Data Sets -- Spark Configuration -- Spark Frameworks -- SparkSQL -- Spark Streaming -- Spark MLlib -- GraphX -- ML Frameworks -- Scikit-Learn -- H2O -- XGBoost -- DL Frameworks -- Keras
Chapter 3: Linear Modeling with Scikit-Learn, PySpark, and H2O -- Exploring the Ordinary Least-Squares Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 4: Survival Analysis with PySpark and Lifelines -- Exploring Survival Analysis -- Exploring Cox Proportional Hazards Method -- Lifeline in Action -- Exploring the Accelerated Failure Time Method -- PySpark in Action -- Conclusion
Chapter 5: Nonlinear Modeling with Scikit-Learn, PySpark, and H2O -- Exploring the Logistic Regression Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 6: Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O -- Decision Trees -- Preprocessing Features -- Scikit-Learn in Action -- Gradient Boosting -- XGBoost in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 7: Neural Networks with Scikit-Learn, Keras, and H2O -- Exploring Deep Learning -- Multilayer Perceptron Neural Network -- Preprocessing Features -- Scikit-Learn in Action -- Keras in Action -- Deep Belief Networks -- H2O in Action -- Conclusion
Chapter 8: Cluster Analysis with Scikit-Learn, PySpark, and H2O -- Exploring the K-Means Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 9: Principal Component Analysis with Scikit-Learn, PySpark, and H2O -- Exploring the Principal Component Method -- Scikit-Learn in Action -- PySpark in Action -- H2O in Action -- Conclusion
Chapter 10: Automating the Machine Learning Process with H2O -- Exploring Automated Machine Learning -- Preprocessing Features -- H2O AutoML in Action -- Conclusion -- Index |
Title | Data Science Solutions with Python - Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn |
URI | https://app.knovel.com/hotlink/toc/id:kpDSSPFSM1/data-science-solutions/data-science-solutions?kpromoter=Summon https://www.perlego.com/book/4513676/data-science-solutions-with-python-fast-and-scalable-models-using-keras-pyspark-mllib-h2o-xgboost-and-scikitlearn-pdf https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=6792405 https://learning.oreilly.com/library/view/~/9781484277621/?ar http://link.springer.com/10.1007/978-1-4842-7762-1 https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781484277621 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwvV1Lb9QwELaqcqGX8ihiC60sxIHDurt2nKTmgkS7y4qyUGkBrbhYju3AKlESbdIV8Cv4yYydZClFiBuXSEnGeXky_r6Z8RihpzRMY0upJnHAOeFGgB00NCWJSG0UA5VLfTLm_G00-8BfL8PlDir7uTBucausKDc292b6S9m4QOaoKfVoZZ5n1flicTldzOnIJVCSbpQg2276y-EXWeWT20A5WueSs9kwOLrg7vzT1ifDHB3y7B5IAmexo_h9Wahun_eR0a44LRAwOAMYNWKE7qE9VWdgnMBwNTWMae2bALSu7Dq3n8vfYWytUuDDf4Rg_cg23Uc_-m_SJrRkJ1dNcqK_3ygX-R8_2h10y7oZGHfRji3uof1-3QncmaH76Os5XKnfxVvHHnY-ZXz5zVVDwARPVd1gVRgQVLmbIIbdum95jX2WBL6wa1UPQXxRqXWG52_yVTLEM_ZuiJev4P51M-xar7JVQ3wd2gP0cTp5fzYj3QoSRAGsZIIoaoLAJo5GaZ4mehwbkTJtjaAWsBnXJgxEYqPQKm4MTQAtuxqMkUlEnDARBg_QblEW9iHCdswEYEUdqCDmWml1KgBNxVEcaaOEPh2gJ9c6X25yH-2u5TXtYXSAjtuOklVbTEQ6IfmriwbooNMV2TbnoS-qN0C41xzpL9zl7srJy7MoBhI9DgfoqNWoruWG3bz3s17ROom-ZDWISSqdoHSSkh7-6zEfodvMzQPxvqjHaLdZX9kjQGdNcux_KdhekMlPlEU6pg |
linkProvider | Knovel |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.title=Data+Science+Solutions+with+Python&rft.au=Tshepo+Chris+Nokeri&rft.date=2021-01-01&rft.pub=Apress&rft.isbn=9781484277621&rft_id=info:doi/10.1007%2F978-1-4842-7762-1&rft.externalDBID=YSPEL&rft.externalDocID=4513676 |
thumbnail_l | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fwww.perlego.com%2Fbooks%2FRM_Books%2Fingram_csplus_gexhsuob%2F9781484277621.jpg |
thumbnail_m | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fwww.safaribooksonline.com%2Flibrary%2Fcover%2F9781484277621 http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fvle.dmmserver.com%2Fmedia%2F640%2F97814842%2F9781484277621.jpg |
thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fcontent.knovel.com%2Fcontent%2FThumbs%2Fthumb14877.gif http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fmedia.springernature.com%2Fw306%2Fspringer-static%2Fcover-hires%2Fbook%2F978-1-4842-7762-1 |