Ruslan Salakhutdinov  rsalakhu@cs.toronto.edu
Andriy Mnih  amnih@cs.toronto.edu
Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada
Abstract
Low-rank matrix approximation methods provide one of the simplest and most effective approaches to collaborative filtering. Such models are usually fitted to data by finding a MAP estimate of the model parameters, a procedure that can be performed efficiently even on very large datasets. However, unless the regularization parameters are tuned carefully, this approach is prone to overfitting because it finds a single point estimate of the parameters. In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. We show that Bayesian PMF models can be efficiently trained using Markov chain Monte Carlo methods by applying them to the Netflix dataset, which consists of over 100 million movie ratings. The resulting models achieve significantly higher prediction accuracy than PMF models trained using MAP estimation.
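To make the MCMC approach concrete, the following is a minimal Python sketch, not the authors' implementation, of a Gibbs sampler for a simplified Bayesian PMF model. Here the rating precision alpha and the factor prior precision lam are held fixed, whereas the full model in the paper also places Gaussian-Wishart priors on the factor means and covariances and samples those hyperparameters as well. Each user and movie factor vector has an exact Gaussian conditional posterior, so it can be resampled in closed form, and predictions are averaged over posterior samples.

    import numpy as np

    def gibbs_bpmf(R_obs, N, M, D=10, alpha=2.0, lam=2.0, n_samples=50, seed=0):
        """Simplified Gibbs sampler for Bayesian PMF.

        R_obs: iterable of (user i, movie j, rating r) triples.
        alpha: precision of the Gaussian rating noise (held fixed here).
        lam:   precision of the spherical Gaussian prior on each factor vector.
        """
        rng = np.random.default_rng(seed)
        U = 0.1 * rng.standard_normal((N, D))   # user factor vectors
        V = 0.1 * rng.standard_normal((M, D))   # movie factor vectors

        # Index the observed ratings by user and by movie.
        by_user = [[] for _ in range(N)]
        by_movie = [[] for _ in range(M)]
        for i, j, r in R_obs:
            by_user[i].append((j, r))
            by_movie[j].append((i, r))

        pred_sum = np.zeros((N, M))
        for _ in range(n_samples):
            # Resample each user factor from its Gaussian conditional posterior.
            for i in range(N):
                js = [j for j, _ in by_user[i]]
                rs = np.array([r for _, r in by_user[i]])
                Vj = V[js]                                   # (n_i, D)
                prec = lam * np.eye(D) + alpha * Vj.T @ Vj   # posterior precision
                cov = np.linalg.inv(prec)
                mean = cov @ (alpha * Vj.T @ rs)
                U[i] = rng.multivariate_normal(mean, cov)
            # Resample each movie factor symmetrically.
            for j in range(M):
                is_ = [i for i, _ in by_movie[j]]
                rs = np.array([r for _, r in by_movie[j]])
                Ui = U[is_]
                prec = lam * np.eye(D) + alpha * Ui.T @ Ui
                cov = np.linalg.inv(prec)
                mean = cov @ (alpha * Ui.T @ rs)
                V[j] = rng.multivariate_normal(mean, cov)
            # Monte Carlo average of the predicted rating matrix over samples.
            pred_sum += U @ V.T
        return pred_sum / n_samples

Because every conditional is an exact Gaussian, no step sizes or accept/reject tests are required, which is part of what makes Gibbs sampling practical for this model even at large scale (the dense N × M prediction matrix above is only for illustration; at Netflix scale one would accumulate predictions for the test entries only).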
(Srebro & Jaakkola, 2003). Training such a model amounts to finding the best rank-D approximation to the observed N × M target matrix R under the given loss function.
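Under the common choice of sum-squared error over the observed entries, for example, this objective can be written as follows, where U_i and V_j denote the D-dimensional factor vectors for user i and movie j (the notation is an assumption here, since the surrounding fragment does not define U and V):

    \min_{U, V} \; \sum_{(i,j)\,:\,R_{ij}\ \mathrm{observed}} \bigl( R_{ij} - U_i^{\top} V_j \bigr)^2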
A variety of probabilistic factor-based models has been proposed (Hofmann, 1999; Marlin, 2004; Marlin & Zemel, 2004; Salakhutdinov & Mnih, 2008). In these models, factor variables are assumed to be marginally independent, while rating variables are assumed to be conditionally independent given the factor variables. The main drawback of such models is that inferring the posterior distribution over the factors given the ratings is intractable. Many of the existing methods therefore resort to performing MAP estimation of the model parameters; training such models amounts to maximizing the log-posterior over the parameters.
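As an illustration of this MAP approach, the sketch below (again an illustrative example, not the authors' code) trains a PMF-style model by stochastic gradient descent. Under spherical Gaussian priors on the factors and a Gaussian rating likelihood, maximizing the log-posterior is equivalent to minimizing the L2-regularized sum of squared errors over the observed entries, which is what the updates implement. The hyperparameters lam and lr are illustrative values, not those used in the paper.

    import numpy as np

    def pmf_map_sgd(R_obs, N, M, D=10, lam=0.05, lr=0.005, epochs=30, seed=0):
        """MAP training of a PMF-style model by stochastic gradient descent."""
        rng = np.random.default_rng(seed)
        U = 0.1 * rng.standard_normal((N, D))  # user factor vectors
        V = 0.1 * rng.standard_normal((M, D))  # movie factor vectors
        R_obs = list(R_obs)
        for _ in range(epochs):
            # Visit the observed ratings in a random order each epoch.
            for idx in rng.permutation(len(R_obs)):
                i, j, r = R_obs[idx]
                err = r - U[i] @ V[j]
                # Gradient steps on the regularized squared error
                # (equivalently, gradient ascent on the log-posterior).
                grad_u = err * V[j] - lam * U[i]
                grad_v = err * U[i] - lam * V[j]
                U[i] += lr * grad_u
                V[j] += lr * grad_v
        return U, V

The resulting point estimate (U, V) predicts a rating as U[i] @ V[j]; as the abstract notes, its quality depends heavily on how carefully the regularization parameter lam is tuned, which is the motivation for the fully Bayesian treatment.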
References

Hofmann, T. (1999). Probabilistic latent semantic analysis.
Lim, Y. J., & Teh, Y. W. (2007). Variational Bayesian approach to movie rating prediction.
Marlin, B. (2004). Modeling user rating profiles for collaborative filtering.
Marlin, B., & Zemel, R. S. (2004). The multiple multiplicative factor model for collaborative filtering. In Machine Learning, Proceedings of the Twenty-First International Conference (ICML 2004), Banff, Alberta, Canada.
Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods (Technical Report CRG-TR-93-1).
Nowlan, S. J., & Hinton, G. E. (1992). Simplifying neural networks by soft weight-sharing. Neural Computation, 4, 473–493.
Raiko, T., Ilin, A., & Karhunen, J. (2007). Principal component analysis for large scale problems with lots of missing values.
Rennie, J. D. M., & Srebro, N. (2005). Fast maximum margin matrix factorization for collaborative prediction. In Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005).
Salakhutdinov, R., & Mnih, A. (2008). Probabilistic matrix factorization.
Srebro, N., & Jaakkola, T. (2003). Weighted low-rank approximations.