Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering

Bibliographic Details
Published in IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 11, pp. 2244-2254
Main Authors Chin, Gillian M., Nocedal, Jorge, Olsen, Peder A., Rennie, Steven J.
Format Journal Article
Language English
Published IEEE 01.11.2013
Subjects
Online Access Get full text

Abstract A variety of first-order methods have recently been proposed for solving matrix optimization problems arising in machine learning. The premise for utilizing such algorithms is that second-order information is too expensive to employ, and so simple first-order iterations are likely to be optimal. In this paper, we argue that second-order information is in fact efficiently accessible in many matrix optimization problems, and can be effectively incorporated into optimization algorithms. We begin by reviewing how certain Hessian operations can be conveniently represented in a wide class of matrix optimization problems, and provide the first proofs for these results. Next we consider a concrete problem, namely the minimization of the ℓ1-regularized Jeffreys divergence, and derive formulae for computing Hessians and Hessian-vector products. This allows us to propose various second-order methods for solving the Jeffreys divergence problem. We present extensive numerical results illustrating the behavior of the algorithms, and apply the methods to a speech recognition problem: compressing the full-covariance Gaussian mixture models used as acoustic models in automatic speech recognition. By discovering clusters of (sparse inverse) covariance matrices, we can compress the number of covariance parameters by a factor exceeding 200, while still achieving a better word error rate (WER) than a diagonal covariance model that has 20 times fewer covariance parameters than the original acoustic model.
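To make the objective named in the abstract concrete, the following is a minimal sketch, assuming the standard definition of the Jeffreys divergence as the symmetrized Kullback-Leibler divergence and restricting to zero-mean Gaussians. The sample covariance S, the precision variable P, and the function names are illustrative, not taken from the paper. It shows the smooth part of an ℓ1-regularized objective of the form J(S, P) + λ||P||_1 and a Hessian-vector product for that smooth part, the kind of operation the abstract refers to.

```python
# Illustrative sketch (not the paper's code or exact formulation):
# Jeffreys divergence between two zero-mean Gaussians, one given by its
# covariance S and one by its precision (inverse covariance) P, together
# with a Hessian-vector product of the smooth term with respect to P.
import numpy as np

def jeffreys(S, P):
    """J = 0.5 * (tr(S P) + tr(S^{-1} P^{-1})) - d for N(0, S) vs. N(0, P^{-1})."""
    d = S.shape[0]
    return 0.5 * (np.trace(S @ P) + np.trace(np.linalg.inv(S) @ np.linalg.inv(P))) - d

def jeffreys_hvp(S, P, V):
    """Hessian-vector product H[V] of f(P) = jeffreys(S, P) along a symmetric direction V.

    grad f(P) = 0.5 * (S - P^{-1} S^{-1} P^{-1}); differentiating the gradient
    in the direction V gives the expression returned below.
    """
    P_inv = np.linalg.inv(P)
    S_inv = np.linalg.inv(S)
    return 0.5 * (P_inv @ V @ P_inv @ S_inv @ P_inv
                  + P_inv @ S_inv @ P_inv @ V @ P_inv)
```

A second-order method of the kind discussed in the abstract would combine such Hessian-vector products with the nonsmooth penalty λ||P||_1 via, for example, a proximal or orthant-based step; the paper's own derivation and formulation should be consulted for the exact algorithmic details.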
Author Rennie, Steven J.
Nocedal, Jorge
Chin, Gillian M.
Olsen, Peder A.
Author_xml – sequence: 1
  givenname: Gillian M.
  surname: Chin
  fullname: Chin, Gillian M.
  email: gillian.chin@u.northwestern.edu
  organization: Dept. of Ind. Eng. & Manage. Sci., Northwestern Univ., Evanston, IL, USA
– sequence: 2
  givenname: Jorge
  surname: Nocedal
  fullname: Nocedal, Jorge
  email: nocedal@eecs.northwestern.edu
  organization: Dept. of Ind. Eng. & Manage. Sci., Northwestern Univ., Evanston, IL, USA
– sequence: 3
  givenname: Peder A.
  surname: Olsen
  fullname: Olsen, Peder A.
  email: pederao@us.ibm.com
  organization: T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
– sequence: 4
  givenname: Steven J.
  surname: Rennie
  fullname: Rennie, Steven J.
  email: sjrennie@us.ibm.com
  organization: T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
CODEN ITASD8
ContentType Journal Article
DOI 10.1109/TASL.2013.2263142
Discipline Engineering
EISSN 1558-7924
EndPage 2254
Genre orig-research
ISSN 1558-7916
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
PageCount 11
PublicationDate 2013-11-01
PublicationTitle IEEE Transactions on Audio, Speech, and Language Processing
PublicationTitleAbbrev TASL
PublicationYear 2013
Publisher IEEE
StartPage 2244
SubjectTerms clustering
Convexity
FISTA
Hessian structure
Jeffreys divergence
Kullback-Leibler divergence
Large scale systems
LASSO
Newton's method
Optimization
Pattern recognition
Sparse matrices
Title Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering
URI https://ieeexplore.ieee.org/document/6516015
Volume 21