Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering
Published in | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 11, pp. 2244-2254 |
Main Authors | Chin, Gillian M.; Nocedal, Jorge; Olsen, Peder A.; Rennie, Steven J. |
Format | Journal Article |
Language | English |
Published | IEEE, 01.11.2013 |
Abstract | A variety of first-order methods have recently been proposed for solving matrix optimization problems arising in machine learning. The premise for utilizing such algorithms is that second-order information is too expensive to employ, and so simple first-order iterations are likely to be optimal. In this paper, we argue that second-order information is in fact efficiently accessible in many matrix optimization problems, and can be effectively incorporated into optimization algorithms. We begin by reviewing how certain Hessian operations can be conveniently represented in a wide class of matrix optimization problems, and provide the first proofs for these results. Next we consider a concrete problem, namely the minimization of the ℓ1-regularized Jeffreys divergence, and derive formulae for computing Hessians and Hessian-vector products. This allows us to propose various second-order methods for solving the Jeffreys divergence problem. We present extensive numerical results illustrating the behavior of the algorithms and apply the methods to a speech recognition problem. We compress full-covariance Gaussian mixture models used as acoustic models in automatic speech recognition. By discovering clusters of (sparse inverse) covariance matrices, we can compress the number of covariance parameters by a factor exceeding 200, while still achieving a better word error rate (WER) than a diagonal covariance model that has 20 times fewer covariance parameters than the original acoustic model. |
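For reference, the Jeffreys divergence named in the abstract is the symmetrized Kullback-Leibler divergence; for two Gaussians the log-determinant terms of the two KL directions cancel, leaving a trace expression. The second formula below is an assumed form of the ℓ1-regularized problem, written over the precision matrix P with sample covariance S and weight λ (the symbols S, P, λ are this sketch's notation; the paper's exact constants and parameterization are not reproduced in this record and may differ).

```latex
% Jeffreys (symmetrized KL) divergence between N(\mu_1,\Sigma_1) and
% N(\mu_2,\Sigma_2) in d dimensions; the log-det terms cancel:
J = \tfrac{1}{2}\Big[\operatorname{tr}\!\big(\Sigma_2^{-1}\Sigma_1\big)
    + \operatorname{tr}\!\big(\Sigma_1^{-1}\Sigma_2\big) - 2d
    + (\mu_1-\mu_2)^{\top}\big(\Sigma_1^{-1}+\Sigma_2^{-1}\big)(\mu_1-\mu_2)\Big]

% Assumed l1-regularized objective over the precision matrix P
% (S = sample covariance, \lambda = regularization weight; constants dropped):
\min_{P \succ 0}\;\tfrac{1}{2}\Big[\operatorname{tr}(SP)
    + \operatorname{tr}\!\big(S^{-1}P^{-1}\big)\Big] + \lambda\,\|P\|_{1}
```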
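The short NumPy sketch below evaluates these two quantities for zero-mean Gaussians; it is only an illustration of the objective being minimized, not the authors' algorithm, and the helper names (`jeffreys_zero_mean`, `regularized_objective`) and variables `S`, `P`, `lam` are hypothetical.

```python
# Minimal sketch (not the paper's implementation): evaluate the Jeffreys
# divergence between two zero-mean Gaussians and an assumed l1-regularized
# objective over the precision matrix P.
import numpy as np


def jeffreys_zero_mean(sigma1, sigma2):
    """Symmetrized KL divergence between N(0, sigma1) and N(0, sigma2)."""
    d = sigma1.shape[0]
    t1 = np.trace(np.linalg.solve(sigma2, sigma1))  # tr(Sigma2^{-1} Sigma1)
    t2 = np.trace(np.linalg.solve(sigma1, sigma2))  # tr(Sigma1^{-1} Sigma2)
    return 0.5 * (t1 + t2 - 2 * d)


def regularized_objective(S, P, lam):
    """Jeffreys divergence between N(0, S) and N(0, P^{-1}) plus an l1 penalty:
    0.5*[tr(S P) + tr(S^{-1} P^{-1})] - d + lam * ||P||_1."""
    d = S.shape[0]
    fit = 0.5 * (np.trace(S @ P) + np.trace(np.linalg.solve(S, np.linalg.inv(P)))) - d
    return fit + lam * np.abs(P).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    S = A @ A.T + 4.0 * np.eye(4)             # synthetic "sample covariance"
    P = np.linalg.inv(S + 0.1 * np.eye(4))    # a candidate (dense) precision matrix
    print(jeffreys_zero_mean(S, np.linalg.inv(P)))   # equals the fit term above
    print(regularized_objective(S, P, lam=0.05))
```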
Author | Rennie, Steven J.; Nocedal, Jorge; Chin, Gillian M.; Olsen, Peder A. |
Author_xml | 1. Chin, Gillian M. (gillian.chin@u.northwestern.edu), Dept. of Ind. Eng. & Manage. Sci., Northwestern Univ., Evanston, IL, USA; 2. Nocedal, Jorge (nocedal@eecs.northwestern.edu), Dept. of Ind. Eng. & Manage. Sci., Northwestern Univ., Evanston, IL, USA; 3. Olsen, Peder A. (pederao@us.ibm.com), T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA; 4. Rennie, Steven J. (sjrennie@us.ibm.com), T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA |
CODEN | ITASD8 |
ContentType | Journal Article |
DOI | 10.1109/TASL.2013.2263142 |
Discipline | Engineering |
EISSN | 1558-7924 |
EndPage | 2254 |
Genre | orig-research |
ISSN | 1558-7916 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Language | English |
PageCount | 11 |
PublicationDate | 2013-11-01 |
PublicationTitle | IEEE transactions on audio, speech, and language processing |
PublicationTitleAbbrev | TASL |
PublicationYear | 2013 |
Publisher | IEEE |
StartPage | 2244 |
SubjectTerms | clustering; Convexity; FISTA; Hessian structure; Jeffreys divergence; Kullback-Leibler divergence; Large scale systems; LASSO; Newton's method; Optimization; Pattern recognition; Sparse matrices |
Title | Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering |
URI | https://ieeexplore.ieee.org/document/6516015 |
Volume | 21 |