Support Vector Machines for Pattern Classification

A guide to the use of support vector machines (SVMs) in pattern classification, including a rigorous performance comparison of classifiers and regressors. The book presents architectures for multiclass classification and function approximation problems, as well as evaluation criteria for classifiers and regressors. Features:...

Bibliographic Details
Main Author: Abe, Shigeo
Format: eBook
Language: English, German
Published: London: Springer, 2010
Edition: 2nd ed.
Series: Advances in Pattern Recognition

Table of Contents:
  • Intro -- Preface -- Acknowledgments -- Symbols -- 1 Introduction -- 1.1 Decision Functions -- 1.1.1 Decision Functions for Two-Class Problems -- 1.1.2 Decision Functions for Multiclass Problems -- 1.2 Determination of Decision Functions -- 1.3 Data Sets Used in the Book -- 1.4 Classifier Evaluation -- References -- 2 Two-Class Support Vector Machines -- 2.1 Hard-Margin Support Vector Machines -- 2.2 L1 Soft-Margin Support Vector Machines -- 2.3 Mapping to a High-Dimensional Space -- 2.3.1 Kernel Tricks -- 2.3.2 Kernels -- 2.3.3 Normalizing Kernels -- 2.3.4 Properties of Mapping Functions Associated with Kernels -- 2.3.5 Implicit Bias Terms -- 2.3.6 Empirical Feature Space -- 2.4 L2 Soft-Margin Support Vector Machines -- 2.5 Advantages and Disadvantages -- 2.5.1 Advantages -- 2.5.2 Disadvantages -- 2.6 Characteristics of Solutions -- 2.6.1 Hessian Matrix -- 2.6.2 Dependence of Solutions on C -- 2.6.3 Equivalence of L1 and L2 Support Vector Machines -- 2.6.4 Nonunique Solutions -- 2.6.5 Reducing the Number of Support Vectors -- 2.6.6 Degenerate Solutions -- 2.6.7 Duplicate Copies of Data -- 2.6.8 Imbalanced Data -- 2.6.9 Classification for the Blood Cell Data -- 2.7 Class Boundaries for Different Kernels -- 2.8 Developing Classifiers -- 2.8.1 Model Selection -- 2.8.2 Estimating Generalization Errors -- 2.8.3 Sophistication of Model Selection -- 2.8.4 Effect of Model Selection by Cross-Validation -- 2.9 Invariance for Linear Transformation -- References -- 3 Multiclass Support Vector Machines -- 3.1 One-Against-All Support Vector Machines -- 3.1.1 Conventional Support Vector Machines -- 3.1.2 Fuzzy Support Vector Machines -- 3.1.3 Equivalence of Fuzzy Support Vector Machines and Support Vector Machines with Continuous Decision Functions -- 3.1.4 Decision-Tree-Based Support Vector Machines -- 3.2 Pairwise Support Vector Machines
  • 3.2.1 Conventional Support Vector Machines -- 3.2.2 Fuzzy Support Vector Machines -- 3.2.3 Performance Comparison of Fuzzy Support Vector Machines -- 3.2.4 Cluster-Based Support Vector Machines -- 3.2.5 Decision-Tree-Based Support Vector Machines -- 3.2.6 Pairwise Classification with Correcting Classifiers -- 3.3 Error-Correcting Output Codes -- 3.3.1 Output Coding by Error-Correcting Codes -- 3.3.2 Unified Scheme for Output Coding -- 3.3.3 Equivalence of ECOC with Membership Functions -- 3.3.4 Performance Evaluation -- 3.4 All-at-Once Support Vector Machines -- 3.5 Comparisons of Architectures -- 3.5.1 One-Against-All Support Vector Machines -- 3.5.2 Pairwise Support Vector Machines -- 3.5.3 ECOC Support Vector Machines -- 3.5.4 All-at-Once Support Vector Machines -- 3.5.5 Training Difficulty -- 3.5.6 Training Time Comparison -- References -- 4 Variants of Support Vector Machines -- 4.1 Least-Squares Support Vector Machines -- 4.1.1 Two-Class Least-Squares Support Vector Machines -- 4.1.2 One-Against-All Least-Squares Support Vector Machines -- 4.1.3 Pairwise Least-Squares Support Vector Machines -- 4.1.4 All-at-Once Least-Squares Support Vector Machines -- 4.1.5 Performance Comparison -- 4.2 Linear Programming Support Vector Machines -- 4.2.1 Architecture -- 4.2.2 Performance Evaluation -- 4.3 Sparse Support Vector Machines -- 4.3.1 Several Approaches for Sparse Support Vector Machines -- 4.3.2 Idea -- 4.3.3 Support Vector Machines Trained in the Empirical Feature Space -- 4.3.4 Selection of Linearly Independent Data -- 4.3.5 Performance Evaluation -- 4.4 Performance Comparison of Different Classifiers -- 4.5 Robust Support Vector Machines -- 4.6 Bayesian Support Vector Machines -- 4.6.1 One-Dimensional Bayesian Decision Functions -- 4.6.2 Parallel Displacement of a Hyperplane -- 4.6.3 Normal Test -- 4.7 Incremental Training -- 4.7.1 Overview
  • 4.7.2 Incremental Training Using Hyperspheres -- 4.8 Learning Using Privileged Information -- 4.9 Semi-Supervised Learning -- 4.10 Multiple Classifier Systems -- 4.11 Multiple Kernel Learning -- 4.12 Confidence Level -- 4.13 Visualization -- References -- 5 Training Methods -- 5.1 Preselecting Support Vector Candidates -- 5.1.1 Approximation of Boundary Data -- 5.1.2 Performance Evaluation -- 5.2 Decomposition Techniques -- 5.3 KKT Conditions Revisited -- 5.4 Overview of Training Methods -- 5.5 Primal-Dual Interior-Point Methods -- 5.5.1 Primal-Dual Interior-Point Methods for Linear Programming -- 5.5.2 Primal-Dual Interior-Point Methods for Quadratic Programming -- 5.5.3 Performance Evaluation -- 5.6 Steepest Ascent Methods and Newton's Methods -- 5.6.1 Solving Quadratic Programming Problems Without Constraints -- 5.6.2 Training of L1 Soft-Margin Support Vector Machines -- 5.6.3 Sequential Minimal Optimization -- 5.6.4 Training of L2 Soft-Margin Support Vector Machines -- 5.6.5 Performance Evaluation -- 5.7 Batch Training by Exact Incremental Training -- 5.7.1 KKT Conditions -- 5.7.2 Training by Solving a Set of Linear Equations -- 5.7.3 Performance Evaluation -- 5.8 Active Set Training in Primal and Dual -- 5.8.1 Training Support Vector Machines in the Primal -- 5.8.2 Comparison of Training Support Vector Machines in the Primal and the Dual -- 5.8.3 Performance Evaluation -- 5.9 Training of Linear Programming Support Vector Machines -- 5.9.1 Decomposition Techniques -- 5.9.2 Decomposition Techniques for Linear Programming Support Vector Machines -- 5.9.3 Computer Experiments -- References -- 6 Kernel-Based Methods -- 6.1 Kernel Least Squares -- 6.1.1 Algorithm -- 6.1.2 Performance Evaluation -- 6.2 Kernel Principal Component Analysis -- 6.3 Kernel Mahalanobis Distance -- 6.3.1 SVD-Based Kernel Mahalanobis Distance
  • 6.3.2 KPCA-Based Mahalanobis Distance -- 6.4 Principal Component Analysis in the Empirical Feature Space -- 6.5 Kernel Discriminant Analysis -- 6.5.1 Kernel Discriminant Analysis for Two-Class Problems -- 6.5.2 Linear Discriminant Analysis for Two-Class Problems in the Empirical Feature Space -- 6.5.3 Kernel Discriminant Analysis for Multiclass Problems -- References -- 7 Feature Selection and Extraction -- 7.1 Selecting an Initial Set of Features -- 7.2 Procedure for Feature Selection -- 7.3 Feature Selection Using Support Vector Machines -- 7.3.1 Backward or Forward Feature Selection -- 7.3.2 Support Vector Machine-Based Feature Selection -- 7.3.3 Feature Selection by Cross-Validation -- 7.4 Feature Extraction -- References -- 8 Clustering -- 8.1 Domain Description -- 8.2 Extension to Clustering -- References -- 9 Maximum-Margin Multilayer Neural Networks -- 9.1 Approach -- 9.2 Three-Layer Neural Networks -- 9.3 CARVE Algorithm -- 9.4 Determination of Hidden-Layer Hyperplanes -- 9.4.1 Rotation of Hyperplanes -- 9.4.2 Training Algorithm -- 9.5 Determination of Output-Layer Hyperplanes -- 9.6 Determination of Parameter Values -- 9.7 Performance Evaluation -- References -- 10 Maximum-Margin Fuzzy Classifiers -- 10.1 Kernel Fuzzy Classifiers with Ellipsoidal Regions -- 10.1.1 Conventional Fuzzy Classifiers with Ellipsoidal Regions -- 10.1.2 Extension to a Feature Space -- 10.1.3 Transductive Training -- 10.1.4 Maximizing Margins -- 10.1.5 Performance Evaluation -- 10.2 Fuzzy Classifiers with Polyhedral Regions -- 10.2.1 Training Methods -- 10.2.2 Performance Evaluation -- References -- 11 Function Approximation -- 11.1 Optimal Hyperplanes -- 11.2 L1 Soft-Margin Support Vector Regressors -- 11.3 L2 Soft-Margin Support Vector Regressors -- 11.4 Model Selection -- 11.5 Training Methods -- 11.5.1 Overview -- 11.5.2 Newton's Methods
  • 11.5.3 Active Set Training -- 11.6 Variants of Support Vector Regressors -- 11.6.1 Linear Programming Support Vector Regressors -- 11.6.2 ν-Support Vector Regressors -- 11.6.3 Least-Squares Support Vector Regressors -- 11.7 Variable Selection -- 11.7.1 Overview -- 11.7.2 Variable Selection by Block Deletion -- 11.7.3 Performance Evaluation -- References -- A Conventional Classifiers -- A.1 Bayesian Classifiers -- A.2 Nearest-Neighbor Classifiers -- References -- B Matrices -- B.1 Matrix Properties -- B.2 Least-Squares Methods and Singular Value Decomposition -- B.3 Covariance Matrices -- References -- C Quadratic Programming -- C.1 Optimality Conditions -- C.2 Properties of Solutions -- D Positive Semidefinite Kernels and Reproducing Kernel Hilbert Space -- D.1 Positive Semidefinite Kernels -- D.2 Reproducing Kernel Hilbert Space -- References -- Index