Behnam Neyshabur1 and Rina Panigrahy2
1 Toyota Technological Institute at Chicago, bneyshabur@ttic.edu
2 Microsoft Research, rina@microsoft.com

Abstract. We investigate the problem of factoring a matrix into several sparse matrices and propose an algorithm for this under randomness and sparsity assumptions. This problem can be viewed as a simplification of the deep learning problem, where finding a factorization corresponds to finding edges in different layers and also values of hidden units. We prove that under certain assumptions on a sparse linear deep network with n nodes in each layer, our algorithm is able to recover the structure of the network and values of top layer hidden units for depths up to Õ(n^{1/6}).
We further discuss the relation among sparse matrix factorization, deep learning, sparse recovery and dictionary learning.
Keywords: Sparse Matrix Factorization, Dictionary Learning, Sparse Encoding, Deep Learning
1 Introduction
In this paper we study the following matrix factorization problem. The sparsity π(X) of a matrix X is the number of non-zero entries in X.
Problem 1 (Sparse Matrix-Factorization). Given an input matrix Y, factorize it as Y = X1 X2 · · · Xs so as to minimize the total sparsity ∑_{i=1}^{s} π(Xi).
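As a concrete illustration, here is a minimal sketch of the objective in Problem 1, assuming NumPy; the helper names sparsity, total_sparsity, and is_factorization are ours, not from the paper. The quantity being minimized is the total number of non-zero entries across all factors, subject to the factors multiplying back to Y.

```python
import numpy as np

def sparsity(X):
    # pi(X): number of non-zero entries of X
    return int(np.count_nonzero(X))

def total_sparsity(factors):
    # Objective of Problem 1: sum_{i=1}^{s} pi(X_i)
    return sum(sparsity(X) for X in factors)

def is_factorization(Y, factors, tol=1e-8):
    # Feasibility check: X_1 X_2 ... X_s should reproduce Y
    product = factors[0]
    for X in factors[1:]:
        product = product @ X
    return np.allclose(product, Y, atol=tol)
```

A brute-force search over candidate factors would use such helpers only as the evaluation step; the algorithm proposed in this paper instead exploits the randomness and sparsity assumptions to recover the factors directly.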
The above problem is a simplification of the non-linear version of the problem that is directly related to learning using deep networks.
Problem 2 (Non-linear Sparse Matrix-Factorization). Given a matrix Y, minimize ∑_{i=1}^{s} π(Xi) such that σ(X1 · σ(X2 · σ(· · · Xs))) = Y, where σ(x) is the sign function (+1 if x > 0, −1 if x < 0, and 0 otherwise) and σ applied to a matrix is simply applying the sign function to each entry. Here the entries of Y are 0, ±1.
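The non-linear map in Problem 2 can be sketched in the same way (again assuming NumPy; the function names are ours, and we take one reading of the nested expression in which the innermost product is thresholded first and σ is applied after each remaining multiplication):

```python
import numpy as np

def forward(factors):
    # Evaluate sigma(X_1 . sigma(X_2 . sigma(... X_s))) from the inside out.
    # np.sign matches the sign function in Problem 2:
    # +1 for positive entries, -1 for negative entries, 0 otherwise.
    out = factors[-1]
    for X in reversed(factors[:-1]):
        out = np.sign(X @ out)
    return out

def is_nonlinear_factorization(Y, factors):
    # Feasibility check for Problem 2: the thresholded product must equal Y,
    # whose entries are 0 or +/-1.
    return np.array_equal(forward(factors), Y)
```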
Connection to Deep Learning and Compression: The above problem is related to learning using deep networks (see [3]), which are generalizations of neural networks. They are layered networks of nodes connected by edges between successive
References:
3. Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2009.
13. R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. Journal of Machine Learning Research, 5:448–455, 2009.
15. L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. Regularization of neural networks using DropConnect. ICML, 2013.
16. P. M. Wood. Universality and the circular law for sparse random matrices. The Annals of Applied Probability, 22(3):1266–1300, 2012.