Python Data Analytics - With Pandas, NumPy, and Matplotlib (3rd Edition)

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This third edition is fully updated for the latest version of Pytho...

Bibliographic Details
Main Author Nelli, Fabio
Format eBook
Language English
Published Berkeley, CA Apress, an imprint of Springer Nature 2023
Apress
Apress L. P
Edition 3
ISBN 1484295315
9781484295311
1484295323
9781484295328
DOI 10.1007/978-1-4842-9532-8

Table of Contents:
  • Title Page -- Preface -- Table of Contents -- 1. An Introduction to Data Analysis -- 2. Introduction to the Python World -- 3. The NumPy Library -- 4. The pandas Library - An Introduction -- 5. Pandas: Reading and Writing Data -- 6. Pandas in Depth: Data Manipulation -- 7. Data Visualization with matplotlib and Seaborn -- 8. Machine Learning with scikit-learn -- 9. Deep Learning with TensorFlow -- 10. An Example - Meteorological Data -- 11. Embedding the JavaScript D3 Library in the IPython Notebook -- 12. Recognizing Handwritten Digits -- 13. Textual Data Analysis with NLTK -- 14. Image Analysis and Computer Vision with OpenCV -- Appendices -- Index
  • Data Visualization with Jupyter Notebook -- Set the Properties of the Plot -- matplotlib and NumPy -- Using kwargs -- Working with Multiple Figures and Axes -- Adding Elements to the Chart -- Adding Text -- Adding a Grid -- Adding a Legend -- Saving Your Charts -- Saving the Code -- Saving Your Notebook as an HTML File or as Other File Formats -- Saving Your Chart Directly as an Image -- Handling Date Values -- Chart Typology -- Line Charts -- Line Charts with pandas -- Histograms -- Bar Charts -- Horizontal Bar Charts -- Multiserial Bar Charts -- Multiseries Bar Charts with a pandas Dataframe -- Multiseries Stacked Bar Charts -- Stacked Bar Charts with a pandas Dataframe -- Other Bar Chart Representations -- Pie Charts -- Pie Charts with a pandas Dataframe -- Advanced Charts -- Contour Plots -- Polar Charts -- The mplot3d Toolkit -- 3D Surfaces -- Scatter Plots in 3D -- Bar Charts in 3D -- Multipanel Plots -- Display Subplots Within Other Subplots -- Grids of Subplots -- The Seaborn Library -- Conclusions -- Chapter 8: Machine Learning with scikit-learn -- The scikit-learn Library -- Machine Learning -- Supervised and Unsupervised Learning -- Supervised Learning -- Unsupervised Learning -- Training Set and Testing Set -- Supervised Learning with scikit-learn -- The Iris Flower Dataset -- The PCA Decomposition -- K-Nearest Neighbors Classifier -- Diabetes Dataset -- Linear Regression: The Least Square Regression -- Support Vector Machines (SVMs) -- Support Vector Classification (SVC) -- Nonlinear SVC -- Plotting Different SVM Classifiers Using the Iris Dataset -- Support Vector Regression (SVR) -- Conclusions -- Chapter 9: Deep Learning with TensorFlow -- Artificial Intelligence, Machine Learning, and Deep Learning -- Artificial Intelligence -- Machine Learning Is a Branch of Artificial Intelligence -- Deep Learning Is a Branch of Machine Learning
  • Increment and Decrement Operators -- Universal Functions (ufunc) -- Aggregate Functions -- Indexing, Slicing, and Iterating -- Indexing -- Slicing -- Iterating an Array -- Conditions and Boolean Arrays -- Shape Manipulation -- Array Manipulation -- Joining Arrays -- Splitting Arrays -- General Concepts -- Copies or Views of Objects -- Vectorization -- Broadcasting -- Structured Arrays -- Reading and Writing Array Data on Files -- Loading and Saving Data in Binary Files -- Reading Files with Tabular Data -- Conclusions -- Chapter 4: The pandas Library-An Introduction -- pandas: The Python Data Analysis Library -- Installation of pandas -- Installation from Anaconda -- Installation from PyPI -- Getting Started with pandas -- Introduction to pandas Data Structures -- The Series -- Declaring a Series -- Selecting the Internal Elements -- Assigning Values to the Elements -- Defining a Series from NumPy Arrays and Other Series -- Filtering Values -- Operations and Mathematical Functions -- Evaluating Values -- NaN Values -- Series as Dictionaries -- Operations Between Series -- The Dataframe -- Defining a Dataframe -- Selecting Elements -- Assigning Values -- Membership of a Value -- Deleting a Column -- Filtering -- Dataframe from a Nested dict -- Transposition of a Dataframe -- The Index Objects -- Methods on Index -- Index with Duplicate Labels -- Other Functionalities on Indexes -- Reindexing -- Dropping -- Arithmetic and Data Alignment -- Operations Between Data Structures -- Flexible Arithmetic Methods -- Operations Between Dataframes and Series -- Function Application and Mapping -- Functions by Element -- Functions by Row or Column -- Statistics Functions -- Sorting and Ranking -- Correlation and Covariance -- "Not a Number" Data -- Assigning a NaN Value -- Filtering Out NaN Values -- Filling in NaN Occurrences -- Hierarchical Indexing and Leveling
  • Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Preface -- Chapter 1: An Introduction to Data Analysis -- Data Analysis -- Knowledge Domains of the Data Analyst -- Computer Science -- Mathematics and Statistics -- Machine Learning and Artificial Intelligence -- Professional Fields of Application -- Understanding the Nature of the Data -- When the Data Become Information -- When the Information Becomes Knowledge -- Types of Data -- The Data Analysis Process -- Problem Definition -- Data Extraction -- Data Preparation -- Data Exploration/Visualization -- Predictive Modeling -- Model Validation -- Deployment -- Quantitative and Qualitative Data Analysis -- Open Data -- Python and Data Analysis -- Conclusions -- Chapter 2: Introduction to the Python World -- Python-The Programming Language -- The Interpreter and the Execution Phases of the Code -- CPython -- Cython -- Pyston -- Jython -- IronPython -- PyPy -- RustPython -- Installing Python -- Python Distributions -- Anaconda -- Anaconda Navigator -- Using Python -- Python Shell -- Run an Entire Program -- Implement the Code Using an IDE -- Interact with Python -- Writing Python Code -- Make Calculations -- Import New Libraries and Functions -- Data Structure -- Functional Programming -- Indentation -- IPython -- IPython Shell -- The Jupyter Project -- Jupyter QtConsole -- Jupyter Notebook -- Jupyter Lab -- PyPI-The Python Package Index -- The IDEs for Python -- Spyder -- Eclipse (pyDev) -- Sublime -- Liclipse -- NinjaIDE -- Komodo IDE -- SciPy -- NumPy -- Pandas -- matplotlib -- Conclusions -- Chapter 3: The NumPy Library -- NumPy: A Little History -- The NumPy Installation -- ndarray: The Heart of the Library -- Create an Array -- Types of Data -- The dtype Option -- Intrinsic Creation of an Array -- Basic Operations -- Arithmetic Operators -- The Matrix Product
  • Chapter 12: Recognizing Handwritten Digits
  • Reordering and Sorting Levels -- Summary Statistics with groupby Instead of with Level -- Conclusions -- Chapter 5: pandas: Reading and Writing Data -- I/O API Tools -- CSV and Textual Files -- Reading Data in CSV or Text Files -- Using Regexp to Parse TXT Files -- Reading TXT Files Into Parts -- Writing Data in CSV -- Reading and Writing HTML Files -- Writing Data in HTML -- Reading Data from an HTML File -- Reading Data from XML -- Reading and Writing Data on Microsoft Excel Files -- JSON Data -- The HDF5 Format -- Pickle-Python Object Serialization -- Serialize a Python Object with cPickle -- Pickling with pandas -- Interacting with Databases -- Loading and Writing Data with SQLite3 -- Loading and Writing Data with PostgreSQL in a Docker Container -- Reading and Writing Data with a NoSQL Database: MongoDB -- Conclusions -- Chapter 6: pandas in Depth: Data Manipulation -- Data Preparation -- Merging -- Merging on an Index -- Concatenating -- Combining -- Pivoting -- Pivoting with Hierarchical Indexing -- Pivoting from "Long" to "Wide" Format -- Removing -- Data Transformation -- Removing Duplicates -- Mapping -- Replacing Values via Mapping -- Adding Values via Mapping -- Rename the Indexes of the Axes -- Discretization and Binning -- Detecting and Filtering Outliers -- Permutation -- Random Sampling -- String Manipulation -- Built-in Methods for String Manipulation -- Regular Expressions -- Data Aggregation -- GroupBy -- A Practical Example -- Hierarchical Grouping -- Group Iteration -- Chain of Transformations -- Functions on Groups -- Advanced Data Aggregation -- Conclusions -- Chapter 7: Data Visualization with matplotlib and Seaborn -- The matplotlib Library -- Installation -- The matplotlib Architecture -- Backend Layer -- Artist Layer -- Scripting Layer (pyplot) -- pylab and pyplot -- pyplot -- The Plotting Window
  • The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning -- Deep Learning -- Neural Networks and GPUs -- Data Availability: Open Data Source, Internet of Things, and Big Data -- Python -- Deep Learning Python Frameworks -- Artificial Neural Networks -- How Artificial Neural Networks Are Structured -- Single Layer Perceptron (SLP) -- Multilayer Perceptron (MLP) -- Correspondence Between Artificial and Biological Neural Networks -- TensorFlow -- TensorFlow: Google's Framework -- TensorFlow: Data Flow Graph -- Start Programming with TensorFlow -- TensorFlow 2.x vs TensorFlow 1.x -- Installing TensorFlow -- Programming with the Jupyter Notebook -- Tensors -- Loading Data Into a Tensor from a pandas Dataframe -- Loading Data in a Tensor from a CSV File -- Operation on Tensors -- Developing a Deep Learning Model with TensorFlow -- Model Building -- Model Compiling -- Model Training and Testing -- Prediction Making -- Practical Examples with TensorFlow 2.x -- Single Layer Perceptron with TensorFlow -- Before Starting -- Data To Be Analyzed -- Multilayer Perceptron (with One Hidden Layer) with TensorFlow -- Multilayer Perceptron (with Two Hidden Layers) with TensorFlow -- Conclusions -- Chapter 10: An Example-Meteorological Data -- A Hypothesis to Be Tested: The Influence of the Proximity of the Sea -- The System in the Study: The Adriatic Sea and the Po Valley -- Finding the Data Source -- Data Analysis on Jupyter Notebook -- Analysis of Processed Meteorological Data -- The RoseWind -- Calculating the Mean Distribution of the Wind Speed -- Conclusions -- Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook -- The Open Data Source for Demographics -- The JavaScript D3 Library -- Drawing a Clustered Bar Chart -- The Choropleth Maps -- The Choropleth Map of the U.S. Population in 2022 -- Conclusions