The Global Optimization Geometry of Low-Rank Matrix Optimization

Bibliographic Details
Published in	IEEE Transactions on Information Theory, Vol. 67, no. 2, pp. 1308-1331
Main Authors	Zhu, Zhihui; Li, Qiuwei; Tang, Gongguo; Wakin, Michael B.
Format	Journal Article
Language	English
Published	New York IEEE 01.02.2021 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Constraints; Convergence; Convexity; Factorization; Geometry; Global optimization; Iterative methods; Linear programming; Low-rank optimization; Matrix decomposition; matrix factorization; matrix sensing; Minimization; nonconvex optimization; Optimization; optimization geometry; Parameterization; Polynomials; Robustness; Search algorithms; Sensors; Signal processing algorithms; Smoothness
Summary:	This paper considers general rank-constrained optimization problems that minimize a general objective function $f(X)$ over the set of rectangular $n \times m$ matrices that have rank at most $r$. To tackle the rank constraint and also to reduce the computational burden, we factorize $X$ into $UV^{\mathrm{T}}$, where $U$ and $V$ are $n \times r$ and $m \times r$ matrices, respectively, and then optimize over the small matrices $U$ and $V$. We characterize the global optimization geometry of the nonconvex factored problem and show that the corresponding objective function satisfies the robust strict saddle property as long as the original objective function $f$ satisfies restricted strong convexity and smoothness properties, ensuring global convergence of many local search algorithms (such as noisy gradient descent) in polynomial time for solving the factored problem.
We also provide a comprehensive analysis for the optimization geometry of a matrix factorization problem where we aim to find $n \times r$ and $m \times r$ matrices $U$ and $V$ such that $UV^{\mathrm{T}}$ approximates a given matrix $X^\star$. Aside from the robust strict saddle property, we show that the objective function of the matrix factorization problem has no spurious local minima and obeys the strict saddle property not only for the exact-parameterization case where $\mathrm{rank}(X^\star) = r$, but also for the over-parameterization case where $\mathrm{rank}(X^\star) < r$ and the under-parameterization case where $\mathrm{rank}(X^\star) > r$. These geometric properties imply that a number of iterative optimization algorithms (such as gradient descent) converge to a global solution with random initialization.
ISSN:	0018-9448 1557-9654
DOI:	10.1109/TIT.2021.3049171
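
Example (illustrative, not taken from the article): the matrix factorization problem described in the summary can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the objective is taken to be the plain least-squares fit 0.5 * ||U V^T - X_star||_F^2, the solver is ordinary gradient descent from a small random initialization, and the function names, step size, iteration count, and test matrix are all hypothetical choices made for this example. The article's own formulation may include terms or conditions that the summary does not spell out.

```python
import numpy as np


def factored_gradient_descent(X_star, r, step=0.2, iters=2000,
                              init_scale=0.1, seed=0):
    """Gradient descent on 0.5 * ||U V^T - X_star||_F^2 over U (n x r), V (m x r).

    All default parameter values are assumptions chosen for this small demo.
    """
    rng = np.random.default_rng(seed)
    n, m = X_star.shape
    # Small random initialization of both factors.
    U = init_scale * rng.standard_normal((n, r))
    V = init_scale * rng.standard_normal((m, r))
    for _ in range(iters):
        residual = U @ V.T - X_star      # n x m
        grad_U = residual @ V            # gradient with respect to U
        grad_V = residual.T @ U          # gradient with respect to V
        U, V = U - step * grad_U, V - step * grad_V
    return U, V


def make_low_rank(n, m, singular_values, seed=1):
    """Build an n x m matrix with the given nonzero singular values."""
    rng = np.random.default_rng(seed)
    k = len(singular_values)
    Q_left, _ = np.linalg.qr(rng.standard_normal((n, k)))
    Q_right, _ = np.linalg.qr(rng.standard_normal((m, k)))
    return Q_left @ np.diag(singular_values) @ Q_right.T


# Exact-parameterization case: rank(X_star) = r = 2.
X_star = make_low_rank(8, 6, [1.0, 0.5])
U, V = factored_gradient_descent(X_star, r=2)
rel_err = np.linalg.norm(U @ V.T - X_star) / np.linalg.norm(X_star)
print(f"relative error: {rel_err:.2e}")
```

Changing `r` relative to the rank of `X_star` gives a quick way to experiment with the over- and under-parameterized cases mentioned in the summary, although the step size and iteration count here were only chosen with the exact-rank case in mind.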