Numerical Algorithms for Personalized Search in Self-organizing Information Networks

Sep Kamvar
Copyright Date: 2010
Pages: 160
    Book Description:

    This book lays out the theoretical groundwork for personalized search and reputation management, both on the Web and in peer-to-peer and social networks. Representing much of the foundational research in this field, the book develops scalable algorithms that exploit the graphlike properties underlying personalized search and reputation management, and delves into realistic scenarios regarding Web-scale data.

    Sep Kamvar focuses on eigenvector-based techniques in Web search, introducing a personalized variant of Google's PageRank algorithm, and he outlines algorithms--such as the now-famous quadratic extrapolation technique--that speed up computation, making personalized PageRank feasible. Kamvar suggests that Power Method-related techniques ultimately should be the basis for improving the PageRank algorithm, and he presents algorithms that exploit the convergence behavior of individual components of the PageRank vector. Kamvar then extends the ideas of reputation management and personalized search to distributed networks like peer-to-peer and social networks. He highlights locality and computational considerations related to the structure of the network, and considers such unique issues as malicious peers. He describes the EigenTrust algorithm and applies various PageRank concepts to P2P settings. Discussion chapters summarizing results conclude the book's two main sections.

    Clear and thorough, this book provides an authoritative look at central innovations in search for all of those interested in the subject.

    eISBN: 978-1-4008-3706-9
    Subjects: Technology, Mathematics

Table of Contents

  1. Front Matter
    (pp. i-iv)
  2. Table of Contents
    (pp. v-viii)
  3. Tables
    (pp. ix-x)
  4. Figures
    (pp. xi-xiv)
  5. Acknowledgments
    (pp. xv-xvi)
  6. Chapter One Introduction
    (pp. 1-4)

    Distributed, self-organizing networks such as the World Wide Web and peer-to-peer networks allow for fast access to vast quantities of diverse information for a large number of users. However, with such large scale and data diversity comes the challenge of finding relevant data from reputable sources in an efficient manner.

    This book addresses the issues of relevance and reputation by exploiting user preference information to perform reputation management and personalized search. The issues of personalization and reputation management are highly intertwined, in terms of both the basic ideas and the underlying technologies. Personalization exploits the preferences of an individual to...

    • Chapter Two PageRank
      (pp. 7-14)

      The PageRank algorithm for determining the reputation of Web pages has become a central technique in Web search [56]. The core of the PageRank algorithm involves computing the principal eigenvector of the Markov matrix representing the hyperlink structure of the Web. As the Web graph is very large, containing several billion nodes, the PageRank vector is generally computed offline, during the preprocessing of the Web crawl, before any queries have been issued. As discussed in Chapter 1, personalization requires significant advances to the standard PageRank algorithm.

      This chapter reviews the standard PageRank algorithm [56] and some of the mathematical tools...
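
The eigenvector computation described above can be sketched with a few lines of NumPy. This is a minimal illustration under stated assumptions, not the book's implementation: the 4-page graph, the damping value `c = 0.85`, the function name `pagerank_power`, and the uniform default for the personalization vector `v` are all hypothetical choices made here. Passing a non-uniform `v` gives a personalized ranking.

```python
import numpy as np

def pagerank_power(P, c=0.85, v=None, tol=1e-10, max_iter=1000):
    """Power iteration x <- c * P^T x + (1 - c) * v for a row-stochastic P."""
    n = P.shape[0]
    # Assumption: a uniform personalization vector when none is supplied.
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    x = v.copy()
    for _ in range(max_iter):
        x_next = c * (P.T @ x) + (1.0 - c) * v
        # Stop once successive iterates agree in the L1 norm.
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

# Hypothetical 4-page graph: 0->1, 0->2, 1->2, 2->0, 3->2 (no dangling pages).
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

ranks = pagerank_power(P)
# Personalized variant: all teleportation mass goes to page 3.
personalized = pagerank_power(P, v=np.array([0.0, 0.0, 0.0, 1.0]))
```

In this toy graph page 3 has no in-links, so its score comes entirely from the teleportation term; biasing `v` toward it raises its score accordingly.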

    • Chapter Three The Second Eigenvalue of the Google Matrix
      (pp. 15-19)

      Before attempting to accelerate the computation of PageRank, it is useful first to prove some results regarding the convergence rate of the standard PageRank algorithm.

      In this chapter, we determine analytically the modulus of the second eigenvalue for the Web hyperlink matrix used by Google for computing PageRank.

      This has implications for the convergence rate of the standard PageRank algorithm as the Web scales, for the stability of PageRank to perturbations to the link structure of the Web, for the detection of Google spammers, and for the design of algorithms to speed up PageRank.

      For the purposes of accelerating PageRank,...
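
A quick numerical check can make the role of the second eigenvalue concrete. The bound used below, that the second eigenvalue of the Google matrix has modulus at most c, is the commonly cited form of this result and is stated here as background, not quoted from the truncated abstract. The 4-page graph, `c = 0.85`, and the uniform vector `v` are hypothetical choices for illustration.

```python
import numpy as np

c = 0.85  # assumed damping value for this sketch

# Hypothetical row-stochastic link matrix: 0->1, 0->2, 1->2, 2->0, 3->2.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)            # assumed uniform teleportation vector
E = np.outer(np.ones(n), v)        # rank-one matrix E = e v^T
A = (c * P + (1.0 - c) * E).T      # Google matrix in column-stochastic form

# Eigenvalue moduli, largest first.
moduli = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
print("largest modulus:", moduli[0])
print("second modulus: ", moduli[1], "(compare with c =", c, ")")
```

Because the Power Method's error shrinks roughly like the second modulus raised to the iteration count, a bound in terms of c gives a convergence estimate that does not depend on the size of the graph.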

    • Chapter Four The Condition Number of the PageRank Problem
      (pp. 20-22)

      In the previous chapter, we showed convergence properties of the PageRank problem. In this chapter, we focus on stability. In particular, the following shows that the PageRank problem is well-conditioned for values of c that are not very close to 1.

      Theorem 6. Let P be an $n \times n$ row-stochastic matrix whose diagonal elements $P_{ii} = 0$. Let c be a real number such that $0 \le c \le 1$. Let E be the $n \times n$ rank-one row-stochastic matrix $E = \vec{e}\vec{v}^T$, where $\vec{e}$ is the n-vector whose elements are all $e_i = 1$, and $\vec{v}$ is an n-vector that represents a probability distribution.

      Define the matrix $A = [cP + (1 - c)E]^T$. The problem $A\vec{x} = \vec{x}$ has...
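
The definitions in Theorem 6 can be unpacked with a short worked derivation. It assumes the solution is normalized so that its entries sum to one, i.e. $\vec{e}^T\vec{x} = 1$, and that $c < 1$; both are assumptions made here for illustration, since the excerpt is cut off before the statement is complete.

```latex
\begin{align*}
A &= [cP + (1-c)E]^T = cP^T + (1-c)\,\vec{v}\,\vec{e}^T, \\
A\vec{x} &= cP^T\vec{x} + (1-c)\,\vec{v}\,(\vec{e}^T\vec{x})
          = cP^T\vec{x} + (1-c)\,\vec{v}
          \quad\text{when } \vec{e}^T\vec{x} = 1, \\
A\vec{x} = \vec{x}
  &\;\Longleftrightarrow\; (I - cP^T)\,\vec{x} = (1-c)\,\vec{v}
  \;\Longleftrightarrow\; \vec{x} = (1-c)\,(I - cP^T)^{-1}\vec{v}.
\end{align*}
```

The inverse exists for $c < 1$ because every eigenvalue of $cP^T$ has modulus at most $c$. Written this way, the eigenvector problem is a linear system whose matrix $I - cP^T$ approaches a singular matrix as $c \to 1$, which is consistent with the remark that the problem is well-conditioned only when c is not very close to 1.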

    • Chapter Five Extrapolation Algorithms
      (pp. 23-41)

      The standard PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix A representing the Web link graph. Since it was shown in the previous chapter that the Power Method converges quickly, one approach to accelerating PageRank would be to directly modify the Power Method to exploit our knowledge about the matrix A.

      In this chapter, we present several algorithms that accelerate the convergence of PageRank by using successive iterates of the Power Method to estimate the nonprincipal eigenvectors of A, and periodically subtracting these estimates from the current iterate...
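
The general idea of extrapolating from successive Power Method iterates can be sketched with the classical componentwise Aitken delta-squared formula. This is a hedged illustration, not the book's quadratic extrapolation algorithm: the function names, the 4-page graph, the choice to extrapolate once at iteration 10, and the clipping and renormalization safeguards are all assumptions made for this example.

```python
import numpy as np

def power_step(P, x, c, v):
    """One Power Method step for the PageRank problem."""
    return c * (P.T @ x) + (1.0 - c) * v

def aitken_extrapolate(x0, x1, x2):
    """Componentwise Aitken delta-squared estimate from three iterates."""
    num = (x1 - x0) ** 2
    den = x2 - 2.0 * x1 + x0
    safe = np.abs(den) > 1e-14
    est = x2.copy()                      # keep the latest value where den ~ 0
    est[safe] = x0[safe] - num[safe] / den[safe]
    est = np.clip(est, 0.0, None)        # safeguard: keep a nonnegative vector
    total = est.sum()
    return est / total if total > 0 else x2

def pagerank_aitken(P, c=0.85, tol=1e-10, extrapolate_at=(10,), max_iter=1000):
    n = P.shape[0]
    v = np.full(n, 1.0 / n)              # assumed uniform teleportation vector
    x_prev2 = x_prev = None
    x = v.copy()
    for k in range(1, max_iter + 1):
        x_prev2, x_prev, x = x_prev, x, power_step(P, x, c, v)
        if np.abs(x - x_prev).sum() < tol:
            return x
        if k in extrapolate_at and x_prev2 is not None:
            # Replace the current iterate by the extrapolated estimate,
            # then continue with ordinary Power Method steps.
            x = aitken_extrapolate(x_prev2, x_prev, x)
    return x

# Hypothetical 4-page graph: 0->1, 0->2, 1->2, 2->0, 3->2.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
x_acc = pagerank_aitken(P)
```

Because ordinary Power Method steps resume after the extrapolation, the result still converges to the same PageRank vector; whether the extrapolation saves iterations depends on the graph, and this sketch makes no claim about speedup.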

    • Chapter Six Adaptive PageRank
      (pp. 42-50)

      In the previous chapter, we exploited certain properties about the matrix A (namely, its known eigenvalues) to accelerate PageRank. In this chapter, we exploit certain properties of the convergence of the Power Method on the Web matrix to accelerate PageRank. Namely, we make the following simple observation: the convergence rates of the PageRank values of individual pages during application of the Power Method are nonuniform. That is, many pages converge quickly, with a few pages taking much longer. Furthermore, the pages that converge slowly are generally those pages with high PageRank.

      We devise a simple algorithm that exploits this observation...
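
One way to act on the nonuniform-convergence observation is to stop recomputing pages whose values have stopped changing. The sketch below is an illustration under assumptions and is not taken from the book: the function name, the relative-change freezing test, the tolerance, and the random 30-page test matrix are all hypothetical.

```python
import numpy as np

def adaptive_pagerank(P, c=0.85, tol=1e-9, max_iter=500):
    """Power Method that freezes pages once their relative change is below tol."""
    n = P.shape[0]
    v = np.full(n, 1.0 / n)              # assumed uniform teleportation vector
    x = v.copy()
    active = np.ones(n, dtype=bool)      # pages still being updated
    A = c * P.T
    for _ in range(max_iter):
        if not active.any():
            break
        x_next = x.copy()
        # Recompute only the rows of active pages; frozen pages keep their value.
        x_next[active] = A[active] @ x + (1.0 - c) * v[active]
        rel_change = np.abs(x_next - x) / np.maximum(np.abs(x_next), 1e-300)
        active &= rel_change >= tol      # frozen pages have zero change and stay frozen
        x = x_next
    return x / x.sum()

# Hypothetical dense random link matrix, normalized to be row-stochastic.
rng = np.random.default_rng(0)
W = rng.random((30, 30))
P = W / W.sum(axis=1, keepdims=True)

x_adaptive = adaptive_pagerank(P)
```

The result is an approximation: a frozen page keeps whatever small error it had when it was frozen. On very small or highly symmetric graphs a page's change can vanish for a single iteration before the page has actually converged, so a more careful freezing test would be needed there.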

    • Chapter Seven BlockRank
      (pp. 51-72)

      In this chapter, we exploit yet another observation about the Web matrix A to speed up PageRank. We observe that the Web link graph has a nested block structure; that is, the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by BlockRank, a 3-stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link structure of that host, (2)...
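
The block-structure observation itself is easy to measure on a list of links. The sketch below only computes the fraction of hyperlinks that stay on one host; it does not implement BlockRank, whose later stages are cut off in the excerpt. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

def intra_host_fraction(links):
    """Fraction of (source_url, target_url) pairs whose endpoints share a host."""
    if not links:
        return 0.0
    same = sum(urlparse(src).netloc == urlparse(dst).netloc for src, dst in links)
    return same / len(links)

# Hypothetical crawl fragment: three links stay on their host, one crosses hosts.
links = [
    ("http://a.example.edu/index.html", "http://a.example.edu/people.html"),
    ("http://a.example.edu/people.html", "http://a.example.edu/index.html"),
    ("http://a.example.edu/index.html", "http://b.example.edu/home.html"),
    ("http://b.example.edu/home.html", "http://b.example.edu/news.html"),
]
fraction = intra_host_fraction(links)
print(fraction)  # 0.75
```

Grouping pages by `netloc` in the same way would give the per-host blocks on which a local computation such as stage (1) could run independently.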

    • Chapter Eight Query-Cycle Simulator
      (pp. 75-83)

      Due to the decentralized nature and fast growth of today’s P2P networks, testing P2P algorithms in a real-world environment by simply deploying them on an existing P2P network and collecting data on their performance is a daunting task. In some cases, measurements are easier to carry out due to some easily accessible central control entity in the network that manages node joins and departures [71]. Also, some algorithms may be tested by deploying them on one or a few controlled nodes in the network (as in [74]). However, for a wide range of P2P-related algorithms and protocols, simply deploying and...

    • Chapter Nine EigenTrust
      (pp. 84-107)

      In this chapter, we address the issue of reputation in P2P file-sharing networks. The open and anonymous nature of these networks leads to a complete lack of accountability for the content a peer puts on the network, opening the door to abuses of these networks by malicious peers. Like the web, P2P networks achieve scalability and diversity of data at the expense of accountability for the quality of content.

      Attacks by anonymous malicious peers have been observed on today’s popular peer-to-peer networks. For example, malicious users have used these networks to introduce viruses such as the VBS.Gnutella worm, which spreads...

    • Chapter Ten Adaptive P2P Topologies
      (pp. 108-132)

      In Web search, the approach we took to personalization was to bias the ranking vectors based on an individual’s interest. In P2P networks, the approach we take to personalization is to directly connect each peer to peers with similar interests. There are several reasons for this approach. First, in Gnutella-style P2P networks, due to limited time-to-live (TTL) for queries, the quality of a user’s search experience is highly dependent on his or her peer’s local neighborhood. If a user’s peer is a few hops away from peers that carry the content that interests her, then she will be more able...
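
The effect of a limited TTL on what a query can reach can be sketched with a breadth-first traversal of the overlay. This is an illustrative model only: the function name, the five-peer overlay, and the TTL of 2 are assumptions, and real Gnutella-style forwarding has details this sketch ignores.

```python
from collections import deque

def peers_within_ttl(neighbors, start, ttl):
    """Map each peer reachable from `start` in at most `ttl` hops to its hop count."""
    reached = {start: 0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        if reached[peer] == ttl:
            continue                     # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in reached:
                reached[nxt] = reached[peer] + 1
                queue.append(nxt)
    reached.pop(start)
    return reached

# Hypothetical overlay: a chain a - b - c - d - e.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
before = peers_within_ttl(chain, "a", ttl=2)

# Same peers after "a" connects directly to "e" (say, a peer with similar interests).
rewired = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d", "a"]}
after = peers_within_ttl(rewired, "a", ttl=2)
```

With a TTL of 2, peer `a` reaches only `b` and `c` in the chain; after the direct connection it also reaches `e` and `d`, which is the sense in which a peer's local neighborhood determines its search results.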

    • Chapter Eleven Conclusion
      (pp. 133-134)

      As large quantities of data are becoming more accessible via the WWW and P2P file-sharing networks, search is beginning to play a vital role in today’s society. As such, it is important to continue to improve the quality of a user’s search experience by identifying and intelligently exploiting “hidden” information (or signals) in these databases. One such signal is user context information.

      This book examined the scalable use of user context information to improve search quality in both the WWW and P2P networks. We have presented scalable algorithms and detailed mathematical analysis for personalized search in both domains.

      There are...

  9. Bibliography
    (pp. 135-139)