Block coordinate descent algorithms for large-scale sparse multiclass classification

Bibliographic Details
Published in: Machine Learning, Vol. 93, No. 1, pp. 31–52
Main Authors: Blondel, Mathieu; Seki, Kazuhiro; Uehara, Kuniaki
Format: Journal Article; Conference Proceeding
Language: English
Published: Boston: Springer US, 01.10.2013

Summary: Over the past decade, ℓ1 regularization has emerged as a powerful way to learn classifiers with implicit feature selection. More recently, mixed-norm (e.g., ℓ1/ℓ2) regularization has been utilized as a way to select entire groups of features. In this paper, we propose a novel direct multiclass formulation specifically designed for large-scale and high-dimensional problems such as document classification. Based on a multiclass extension of the squared hinge loss, our formulation employs ℓ1/ℓ2 regularization so as to force weights corresponding to the same features to be zero across all classes, resulting in compact and fast-to-evaluate multiclass models. For optimization, we employ two globally-convergent variants of block coordinate descent, one with line search (Tseng and Yun in Math. Program. 117:387–423, 2009) and the other without (Richtárik and Takáč in Math. Program. 1–38, 2012a; Tech. Rep. arXiv:1212.0873, 2012b). We present the two variants in a unified manner and develop the core components needed to efficiently solve our formulation. The end result is a couple of block coordinate descent algorithms specifically tailored to our multiclass formulation. Experimentally, we show that block coordinate descent performs favorably compared to other solvers such as FOBOS, FISTA and SpaRSA. Furthermore, we show that our formulation obtains very compact multiclass models and outperforms ℓ1/ℓ2-regularized multiclass logistic regression in terms of training speed, while achieving comparable test accuracy.
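The abstract names the two ingredients of the formulation: a multiclass extension of the squared hinge loss and an ℓ1/ℓ2 (group lasso) penalty that ties each feature's weights together across all classes. A plausible way to write the objective, shown here only to make the grouping concrete (the exact formulation is given in the paper), is

```latex
\min_{W \in \mathbb{R}^{d \times k}} \;
\sum_{i=1}^{n} \sum_{r \neq y_i}
\max\!\bigl(0,\; 1 - (W^{\top} x_i)_{y_i} + (W^{\top} x_i)_{r}\bigr)^{2}
\;+\; \lambda \sum_{j=1}^{d} \lVert W_{j:} \rVert_{2}
```

where W_{j:} is the row of weights that feature j contributes to all k classes; penalizing the ℓ2 norm of whole rows is what drives a feature's weights to zero across every class simultaneously.

Below is a minimal NumPy sketch of cyclic block coordinate descent on such an objective, in the spirit of the line-search-free variant: one block per feature row, a fixed step per block, and a group soft-thresholding (proximal) update. The step-size bound, epoch count, and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bcd_group_l1(X, y, n_classes, lam=0.1, n_epochs=20):
    """Cyclic block coordinate descent with a fixed step per block.

    Objective (sketch): multiclass squared hinge loss plus
    lam * sum_j ||W[j]||_2, where W[j] is feature j's weight row.
    The group soft-thresholding step zeros whole rows, so a feature
    is discarded for all classes at once.
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    scores = np.zeros((n, n_classes))   # maintained incrementally as X @ W
    rows = np.arange(n)
    # Crude per-block Lipschitz-style bounds used as fixed step sizes.
    L = 4.0 * n_classes * (X ** 2).sum(axis=0) + 1e-12
    for _ in range(n_epochs):
        for j in range(d):
            # Partial gradient of the loss with respect to the block W[j].
            margins = np.maximum(0.0, 1.0 - scores[rows, y][:, None] + scores)
            margins[rows, y] = 0.0
            G = 2.0 * margins
            G[rows, y] = -2.0 * margins.sum(axis=1)
            g_j = X[:, j] @ G                     # shape (n_classes,)
            # Gradient step on the block, then group soft-thresholding
            # (the proximal operator of the ell_1/ell_2 penalty).
            v = W[j] - g_j / L[j]
            shrink = max(0.0, 1.0 - lam / (L[j] * max(np.linalg.norm(v), 1e-12)))
            new_row = shrink * v
            scores += np.outer(X[:, j], new_row - W[j])  # keep scores in sync
            W[j] = new_row
    return W

# Toy usage: a 3-class problem where only the first three features matter.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1.0)
    W = bcd_group_l1(X, y, n_classes=3, lam=5.0, n_epochs=30)
    print("nonzero feature rows:", np.flatnonzero(np.linalg.norm(W, axis=1)))
```

With a sufficiently large λ, entire rows of W are zeroed out, which is the compact, fast-to-evaluate model behavior the abstract describes.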
ISSN: 0885-6125
eISSN: 1573-0565
DOI: 10.1007/s10994-013-5367-2