Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering

Bibliographic Details
Published in IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 11, pp. 2244-2254
Main Authors Chin, Gillian M., Nocedal, Jorge, Olsen, Peder A., Rennie, Steven J.
Format Journal Article
Language English
Published IEEE 01.11.2013
Subjects
Online Access Get full text

Abstract A variety of first-order methods have recently been proposed for solving matrix optimization problems arising in machine learning. The premise for utilizing such algorithms is that second-order information is too expensive to employ, and so simple first-order iterations are likely to be optimal. In this paper, we argue that second-order information is in fact efficiently accessible in many matrix optimization problems, and can be effectively incorporated into optimization algorithms. We begin by reviewing how certain Hessian operations can be conveniently represented in a wide class of matrix optimization problems, and provide the first proofs for these results. Next we consider a concrete problem, namely the minimization of the ℓ1-regularized Jeffreys divergence, and derive formulae for computing Hessians and Hessian-vector products. This allows us to propose various second-order methods for solving the Jeffreys divergence problem. We present extensive numerical results illustrating the behavior of the algorithms, and apply the methods to a speech recognition problem: compressing the full-covariance Gaussian mixture models used as acoustic models in automatic speech recognition. By discovering clusters of (sparse inverse) covariance matrices, we can compress the number of covariance parameters by a factor exceeding 200, while still achieving a better word error rate (WER) than a diagonal covariance model that has 20 times fewer covariance parameters than the original acoustic model.
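To make the objective named in the abstract concrete, the following is a minimal sketch, assuming the standard definition of the Jeffreys divergence as the symmetrized Kullback-Leibler divergence and restricting to zero-mean Gaussians. The sample covariance S, the precision variable P, and the function names are illustrative, not taken from the paper. It shows the smooth part of an ℓ1-regularized objective of the form J(S, P) + λ||P||_1 and a Hessian-vector product for that smooth part, the kind of operation the abstract refers to.

```python
# Illustrative sketch (not the paper's code or exact formulation):
# Jeffreys divergence between two zero-mean Gaussians, one given by its
# covariance S and one by its precision (inverse covariance) P, together
# with a Hessian-vector product of the smooth term with respect to P.
import numpy as np

def jeffreys(S, P):
    """J = 0.5 * (tr(S P) + tr(S^{-1} P^{-1})) - d for N(0, S) vs. N(0, P^{-1})."""
    d = S.shape[0]
    return 0.5 * (np.trace(S @ P) + np.trace(np.linalg.inv(S) @ np.linalg.inv(P))) - d

def jeffreys_hvp(S, P, V):
    """Hessian-vector product H[V] of f(P) = jeffreys(S, P) along a symmetric direction V.

    grad f(P) = 0.5 * (S - P^{-1} S^{-1} P^{-1}); differentiating the gradient
    in the direction V gives the expression returned below.
    """
    P_inv = np.linalg.inv(P)
    S_inv = np.linalg.inv(S)
    return 0.5 * (P_inv @ V @ P_inv @ S_inv @ P_inv
                  + P_inv @ S_inv @ P_inv @ V @ P_inv)
```

A second-order method of the kind discussed in the abstract would combine such Hessian-vector products with the nonsmooth penalty λ||P||_1 via, for example, a proximal or orthant-based step; the paper's own derivation and formulation should be consulted for the exact algorithmic details.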
Author Rennie, Steven J.
Nocedal, Jorge
Chin, Gillian M.
Olsen, Peder A.
Author_xml – sequence: 1
  givenname: Gillian M.
  surname: Chin
  fullname: Chin, Gillian M.
  email: gillian.chin@u.northwestern.edu
  organization: Dept. of Ind. Eng. & Manage. Sci., Northwestern Univ., Evanston, IL, USA
– sequence: 2
  givenname: Jorge
  surname: Nocedal
  fullname: Nocedal, Jorge
  email: nocedal@eecs.northwestern.edu
  organization: Dept. of Ind. Eng. & Manage. Sci., Northwestern Univ., Evanston, IL, USA
– sequence: 3
  givenname: Peder A.
  surname: Olsen
  fullname: Olsen, Peder A.
  email: pederao@us.ibm.com
  organization: T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
– sequence: 4
  givenname: Steven J.
  surname: Rennie
  fullname: Rennie, Steven J.
  email: sjrennie@us.ibm.com
  organization: T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
CODEN ITASD8
ContentType Journal Article
DOI 10.1109/TASL.2013.2263142
Discipline Engineering
EISSN 1558-7924
EndPage 2254
Genre orig-research
ISSN 1558-7916
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
PageCount 11
PublicationDate 2013-11-01
PublicationTitle IEEE Transactions on Audio, Speech, and Language Processing
PublicationTitleAbbrev TASL
PublicationYear 2013
Publisher IEEE
StartPage 2244
SubjectTerms clustering
Convexity
FISTA
Hessian structure
Jeffreys divergence
Kullback-Leibler divergence
Large scale systems
LASSO
Newton's method
Optimization
Pattern recognition
Sparse matrices
Title Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering
URI https://ieeexplore.ieee.org/document/6516015
Volume 21