Stochastic Systems

Distributed user profiling via spectral methods

Dan-Cristian Tomozei and Laurent Massoulié

Abstract

User profiling is a useful primitive for constructing personalised services, such as content recommendation. In the present paper we investigate the feasibility of user profiling in a distributed setting, with no central authority and only local information exchanges between users. We compute a profile vector for each user (i.e., a low-dimensional vector that characterises her taste) via spectral transformation of observed user-produced ratings for items. Our two main contributions follow:

(i) We consider a low-rank probabilistic model of user taste. More specifically, we assume that users and items are partitioned into a constant number of classes, such that users and items within the same class are statistically identical. We prove that, without prior knowledge of the composition of the classes and based solely on a few randomly observed ratings (namely $O(N\log N)$ such ratings for $N$ users), we can predict user preference for unrated items with high probability by running a local vote among users with similar profile vectors. In addition, we provide empirical evaluations characterising how spectral profiling performance depends on the dimension of the profile space. These evaluations are performed on a data set of real user ratings provided by Netflix.

(ii) We develop distributed algorithms, based on spectral transformation, that embed users into a low-dimensional space. These involve only simple message passing among users and provably converge to the desired embedding. Our method essentially relies on a novel combination of gossiping and the algorithm proposed by Oja and Karhunen.
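
As a rough illustration of contribution (i), the sketch below simulates a two-class version of the model, builds profile vectors from the singular value decomposition of the observed rating matrix, and predicts unrated items by a vote among the users with the closest profiles. It is a centralised toy, not the paper's procedure: the function names, the class-to-class like probabilities, the observation rate, the profile dimension and the neighbourhood size are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_ratings(n_users=200, n_items=200, p_obs=0.3, seed=0):
    """Draw +1/-1 ratings from a two-class block model (illustrative values)."""
    rng = np.random.default_rng(seed)
    user_cls = rng.integers(0, 2, size=n_users)
    item_cls = rng.integers(0, 2, size=n_items)
    # Assumed probability that a user of class a likes an item of class b.
    like_prob = np.array([[0.9, 0.1], [0.1, 0.9]])
    probs = like_prob[user_cls][:, item_cls]
    ratings = np.where(rng.random(probs.shape) < probs, 1.0, -1.0)
    mask = rng.random(probs.shape) < p_obs      # which ratings are observed
    truth = np.where(probs > 0.5, 1.0, -1.0)    # majority preference per pair
    return ratings * mask, mask, truth          # unobserved entries are 0


def spectral_profiles(observed, dim=2):
    """Profile vector of each user: top left singular vectors, scaled."""
    u, s, _ = np.linalg.svd(observed, full_matrices=False)
    return u[:, :dim] * s[:dim]


def predict_by_local_vote(observed, profiles, n_neighbours=30):
    """Predict each rating as the sign of the summed ratings of near users."""
    sq = np.sum(profiles ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * profiles @ profiles.T
    np.fill_diagonal(dist, np.inf)              # a user does not vote for herself
    nearest = np.argsort(dist, axis=1)[:, :n_neighbours]
    votes = observed[nearest].sum(axis=1)       # shape (n_users, n_items)
    return np.sign(votes)                       # 0 means a tie or no votes
```

Running `simulate_ratings()`, then `spectral_profiles(observed)` and `predict_by_local_vote(observed, profiles)`, and comparing the prediction with `truth` on the entries where `mask` is false gives a quick accuracy check on this synthetic data.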

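For contribution (ii), the following is a minimal centralised sketch of a stochastic-approximation iteration in the spirit of Oja and Karhunen: an orthonormal basis is nudged by noisy samples of a matrix and re-orthonormalised at every step, so that it drifts towards the dominant eigenspace of the expected matrix. The gossip and message-passing layer of the paper is not reproduced here; the function names, the step-size schedule `1 / (t + 10)`, the QR-based orthonormalisation and the noise model are assumptions made for the example.

```python
import numpy as np


def oja_subspace(sample_matrix, n, dim, n_steps=2000, seed=0):
    """Estimate the top-`dim` eigenspace of E[A_t] from noisy samples A_t."""
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis (n x dim).
    w, _ = np.linalg.qr(rng.standard_normal((n, dim)))
    for t in range(n_steps):
        step = 1.0 / (t + 10)                    # assumed decreasing step size
        a_t = sample_matrix(rng)                 # one noisy observation of A
        # Move along A_t w, then re-orthonormalise the columns.
        w, _ = np.linalg.qr(w + step * a_t @ w)
    return w


def make_noisy_sampler(mean_matrix, noise=0.1):
    """Hypothetical sampler: the mean matrix plus symmetric Gaussian noise."""
    def sample(rng):
        g = rng.standard_normal(mean_matrix.shape) * noise
        return mean_matrix + (g + g.T) / 2.0
    return sample
```

For a symmetric `mean_matrix` with a clear gap after its `dim` largest eigenvalues, the columns of the returned basis should span approximately the same subspace as the corresponding eigenvectors. In a distributed variant each user would hold one row of the basis, and the matrix-vector products and normalisation would have to be computed through local exchanges.
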
Article information

Source
Stoch. Syst., Volume 4, Number 1 (2014), 1-43.

Dates
First available in Project Euclid: 18 September 2014

Permanent link to this document
https://projecteuclid.org/euclid.ssy/1411044991

Digital Object Identifier
doi:10.1214/11-SSY036

Mathematical Reviews number (MathSciNet)
MR3353213

Zentralblatt MATH identifier
1315.68024

Keywords
Spectral decomposition; random matrix; message passing; distributed spectral embedding; distributed recommendation system

Citation

Tomozei, Dan-Cristian; Massoulié, Laurent. Distributed user profiling via spectral methods. Stoch. Syst. 4 (2014), no. 1, 1–43. doi:10.1214/11-SSY036. https://projecteuclid.org/euclid.ssy/1411044991


References

  • [1] Amatriain, X., Pujol, J., and Oliver, N., I like it… I like it not: Evaluating user ratings noise in recommender systems. Volume 5535 of Lecture Notes in Computer Science, pages 247–258. Springer Berlin / Heidelberg, 2009. doi:10.1007/978-3-642-02247-0_24.
  • [2] Borkar, V. and Meyn, S. P., Oja’s algorithm for graph clustering and Markov spectral decomposition. In ValueTools’08: Proceedings of the 3rd International Conference on Performance Evaluation Methodologies and Tools, pages 1–7, Brussels, Belgium, 2008. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).
  • [3] Boyd, S., Ghosh, A., Prabhakar, B., and Shah, D., Randomized gossip algorithms. IEEE/ACM Transactions on Networking, 14(SI):2508–2530, 2006.
  • [4] Chaudhuri, K. and Rao, S., Learning Mixtures of Product Distributions Using Correlations and Independence. In R. A. Servedio and T. Zhang, editors, COLT, pages 9–20. Omnipress, 2008.
  • [5] Coja-Oghlan, A., A spectral heuristic for bisecting random graphs. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’05, pages 850–859, Philadelphia, PA, USA, 2005. Society for Industrial and Applied Mathematics.
  • [6] Dasgupta, A., Hopcroft, J., Kannan, R., and Mitra, P., Spectral clustering by recursive partitioning. In ESA’06: Proceedings of the 14th Conference on Annual European Symposium, pages 256–267, London, UK, 2006. Springer-Verlag.
  • [7] Dasgupta, A., Kannan, R., Hopcroft, J., and Mitra, P., Spectral Clustering with Limited Independence, 2005.
  • [8] Duflo, M., Méthodes récursives aléatoires. Techniques stochastiques. Masson, Paris, Milan, Barcelone, 1990.
  • [9] Feige, U. and Ofek, E., Spectral techniques applied to sparse random graphs. Random Struct. Algorithms, 27(2):251–275, 2005.
  • [10] Kempe, D. and McSherry, F., A decentralized algorithm for spectral analysis. In STOC, pages 561–568, 2004.
  • [11] Keshavan, R. H., Montanari, A., and Oh, S., Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057–2078, August 2010.
  • [12] Korada, S. B., Montanari, A., and Oh, S., Gossip PCA. In Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS’11, pages 209–220, New York, NY, USA, 2011. ACM.
  • [13] Koren, Y., The BellKor Solution to the Netflix Grand Prize, 2009.
  • [14] Kumar, A. and Kannan, R., Clustering with Spectral Norm and the k-Means Algorithm. In FOCS, pages 299–308. IEEE Computer Society, 2010.
  • [15] McSherry, F., Spectral partitioning of random graphs. Proceedings FOCS, pages 529–537, 2001.
  • [16] Müller, A. and Stoyan, D., Comparison Methods for Stochastic Models and Risks. J. Wiley and Sons, 2002.
  • [17] Netflix, Netflix prize. http://www.netflixprize.com.
  • [18] Ng, A. Y., Jordan, M. I., and Weiss, Y., On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, pages 849–856. MIT Press, 2001.
  • [19] Oja, E. and Karhunen, J., On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106(1), 1985.
  • [20] Shi, T., Belkin, M., and Yu, B., Data spectroscopy: Learning mixture models using eigenspaces of convolution operators. In ICML, pages 936–943, 2008.
  • [21] Srebro, N., Rennie, J. D. M., and Jaakkola, T. S., Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems 17, pages 1329–1336. MIT Press, 2005.
  • [22] Stewart, G. W., On the early history of the singular value decomposition. SIAM Review, 35(4):551–566, 1993.
  • [23] Tomozei, D.-C. and Massoulié, L., Distributed user profiling via spectral methods. SIGMETRICS Perform. Eval. Rev., 38:383–384, June 2010.