Open Access
A Two-Stage Algorithm for Computing PageRank and Multistage Generalizations
Chris P. Lee, Gene H. Golub, Stefanos A. Zenios
Internet Math. 4(4): 299-328 (2007).

Abstract

The PageRank model pioneered by Google is the most common approach for generating web search results. We present a two-stage algorithm for computing the PageRank vector that exploits the lumpability of the underlying Markov chain. We make three contributions. First, the algorithm speeds up the PageRank calculation significantly: on web graphs with millions of webpages, the speed-up is typically two- to three-fold. The algorithm can also embed other acceleration methods, such as quadratic extrapolation, the Gauss-Seidel method, or the biconjugate gradient stabilized (BiCGSTAB) method, for an even greater speed-up; the cumulative speed-up is as high as 7 to 14 times. Second, we address the handling of dangling nodes. Conventionally, dangling nodes are included only towards the end of the computation. While this approach works reasonably well, it can fail in extreme cases involving aggressive personalization. Using probabilistic arguments, we prove that our algorithm is the generally correct way of handling dangling nodes. Third, we discuss variants of our algorithm, including a multistage extension for calculating a generalized version of the PageRank model in which different personalization vectors are used for webpages of different classes. The ability to form class associations may be useful for building more refined models of web traffic.
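
The lumpability idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified variant that assumes dangling nodes jump according to the personalization vector `v`, uses dense NumPy matrices, and recovers the dangling-node ranks with a single matrix-vector product instead of a second lumped chain. The function names (`stationary`, `google_matrix`, `lumped_pagerank`) and the toy graph are hypothetical. Because every dangling node has the same row in the Google matrix, the dangling nodes can be merged into one state without changing the ranks of the other nodes.

```python
import numpy as np


def stationary(P, tol=1e-12, max_iter=10_000):
    """Power iteration for the stationary row vector of a row-stochastic matrix."""
    x = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        x_next = x @ P
        x_next /= x_next.sum()
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x


def google_matrix(A, alpha=0.85, v=None):
    """Dense Google matrix; assumption: dangling nodes jump according to v."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    out = A.sum(axis=1)
    dangling = out == 0
    G = np.empty((n, n))
    G[~dangling] = alpha * A[~dangling] / out[~dangling, None] + (1 - alpha) * v
    G[dangling] = v  # alpha * v + (1 - alpha) * v = v, identical for all dangling rows
    return G, dangling


def lumped_pagerank(A, alpha=0.85, v=None):
    """PageRank via a chain in which all dangling nodes form one lumped state."""
    G, dangling = google_matrix(A, alpha, v)
    nd = np.flatnonzero(~dangling)
    d = np.flatnonzero(dangling)
    if d.size == 0:
        return stationary(G)
    k = nd.size
    G12 = G[np.ix_(nd, d)]           # nondangling -> dangling block
    u = G[d[0]]                      # the row shared by every dangling node

    # Stage 1: stationary vector of the (k + 1)-state lumped chain.
    L = np.empty((k + 1, k + 1))
    L[:k, :k] = G[np.ix_(nd, nd)]
    L[:k, k] = G12.sum(axis=1)       # probability of entering the dangling block
    L[k, :k] = u[nd]
    L[k, k] = u[d].sum()
    sigma = stationary(L)

    # Stage 2 (simplified): split the lumped mass back over the dangling nodes.
    pi = np.empty(G.shape[0])
    pi[nd] = sigma[:k]
    pi[d] = sigma[:k] @ G12 + sigma[k] * u[d]
    return pi


# Toy graph: nodes 3 and 4 have no out-links (dangling).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
], dtype=float)

pi_full = stationary(google_matrix(A)[0])
pi_lumped = lumped_pagerank(A)
```

In this sketch the iterative work is done on a chain with one state per nondangling node plus a single lumped state, which is where a saving would come from when dangling nodes are numerous. The two vectors `pi_full` and `pi_lumped` should agree up to the iteration tolerance.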

Citation

Chris P. Lee, Gene H. Golub, Stefanos A. Zenios. "A Two-Stage Algorithm for Computing PageRank and Multistage Generalizations." Internet Math. 4(4): 299-328, 2007.

Information

Published: 2007
First available in Project Euclid: 27 May 2009

zbMATH: 1206.68050
MathSciNet: MR2522947

Rights: Copyright © 2007 A K Peters, Ltd.
