# Who's #1?: The Science of Rating and Ranking

Amy N. Langville
Carl D. Meyer
Pages: 266
https://www.jstor.org/stable/j.ctt7rwdt

1. Front Matter
(pp. i-vi)
2. Table of Contents
(pp. vii-xii)
3. Preface
(pp. xiii-xviii)
4. Chapter One Introduction to Ranking
(pp. 1-8)

It was right around the turn of the millennium when we first became involved in the field of ranking items. The now seminal 1998 paper Anatomy of a Search Engine used Markov chains, our favorite mathematical tool, to rank webpages. Two little-known graduate students had used Markov chains to improve search engine rankings and their method was so successful that it became the foundation for their fledgling company, which was soon to become the extremely well-known search engine Google. The more we read about ranking, the more involved in the field we became. Since those early days, we’ve written...

5. Chapter Two Massey’s Method
(pp. 9-20)

The Bowl Championship Series (BCS) is a rating system for NCAA college football that was designed to determine which teams are invited to play in which bowl games. The BCS has become famous, and perhaps notorious, for the ratings it generates for each team in the NCAA. These ratings are assembled from two sources, humans and computers. Human input comes from the opinions of coaches and media. Computer input comes from six computer and mathematical models—details are given in the aside on page 17. The BCS ratings for the 2001 and 2003 seasons are known as particularly controversial among...

6. Chapter Three Colley’s Method
(pp. 21-28)

In 2001, Dr. Wesley Colley, an astrophysicist by training, wrote a paper about his new method for ranking sports teams [22]. This side project of his became so successful that, like one of Massey’s models, it too is now incorporated in the BCS method of ranking NCAA college football teams. His method, which we call the Colley Rating Method, is a modification of one of the simplest and oldest rating systems, the rating system that uses winning percentage. Winning percentage rates team $i$ with the value $r_i$ according to the rule

$r_i = \frac{w_i}{t_i},$

where $w_i$ and $t_i$ are the number of...
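The winning-percentage rule above can be sketched in a few lines. This is only an illustration: it assumes $w_i$ counts wins and $t_i$ counts total games played (the sentence defining them is truncated in this excerpt), and the team names, records, and the function name `winning_percentage` are hypothetical.

```python
def winning_percentage(wins, games_played):
    """Rate each team by r_i = w_i / t_i.

    Assumption: w_i is the number of wins and t_i the total number of
    games played. A team with no games gets None, because the ratio is
    undefined when t_i = 0.
    """
    ratings = {}
    for team, w in wins.items():
        t = games_played[team]
        ratings[team] = w / t if t > 0 else None
    return ratings


# Hypothetical season records, for illustration only.
wins = {"A": 3, "B": 1, "C": 0}
games_played = {"A": 4, "B": 4, "C": 2}

ratings = winning_percentage(wins, games_played)
print(ratings)
```

The undefined rating for a team that has not yet played is one visible weakness of this simple rule.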

7. Chapter Four Keener’s Method
(pp. 29-52)

James P. Keener proposed his rating method in a 1993 SIAM Review article [42]. Keener’s approach, like many others, utilizes nonnegative statistics that result from contests (games) between competitors (teams) to create a numerical rating for each team. In some circles these are called power ratings. Of course, once a numerical rating for each team is established, then ranking the teams in order of their ratings is a natural consequence.

Keener’s method is to relate the rating for a given team to the absolute strength of the team, which in turn depends on the relative strength of the team, i.e.,...

8. Chapter Five Elo’s System
(pp. 53-66)

Árpád Élő (1903–1992) was a Hungarian-born physics professor at Marquette University in Milwaukee, Wisconsin. In addition, he was an avid (and excellent) chess player, and this led him to create an effective method to rate and rank chess players. His system was approved by the United States Chess Federation (USCF) in 1960, and by Fédération Internationale des Échecs (the World Chess Federation, or FIDE) in 1970. Elo’s idea eventually became popular outside of the chess world, and it has been modified, extended, and adapted to rate other sports and competitive situations. The premise that Elo used was that each...

9. Chapter Six The Markov Method
(pp. 67-78)

This new method for ranking sports teams invokes an old technique from A. A. Markov, and thus, we call it the Markov method.¹ In 1906 Markov invented his chains, which were later labeled as Markov chains, to describe stochastic processes. While Markov first applied his technique to linguistically analyze the sequence of vowels and consonants in Pushkin’s poem Eugene Onegin, his chains have found a plethora of applications since [8, 80]. Very recently graduate students of our respective universities, Anjela Govan (Ph.D. North Carolina State University, 2008) [34] and Luke Ingram (M.S., College of Charleston, 2007) [41] used Markov chains...

10. Chapter Seven The Offense–Defense Rating Method
(pp. 79-96)

A natural approach in the science of ratings is to first rate individual attributes of each team or participant, and then combine these to generate a single number that reflects overall strength. In particular, success in most contests requires a strong offense as well as a strong defense, so it makes sense to try to rate each separately before drawing conclusions about who is #1.

But this is easier said than done, especially with regard to rating offensive strength and defensive strength. The problem is, as everyone knows, playing against a team with an impotent defense can make even the...

11. Chapter Eight Ranking by Reordering Methods
(pp. 97-112)

The philosophy thus far in this book is depicted below in Figure 8.1. We start with input data about the items that we’d like to rank and then run an algorithm that produces a rating vector, which, in turn, produces a ranking vector. The focus has been on moving from left to right in Figure 8.1, i.e., transforming input data into a ranking vector. However, working backwards and instead turning the focus to the final product, the ranking vector itself, we are able to generate some new ideas, two of which appear in this chapter. These two new ranking methods...
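The last step of that pipeline, turning a rating vector into a ranking vector, can be sketched as follows. The conventions here are assumptions made for illustration, not taken from the book: a higher rating means a better rank, rank positions start at 1, and ties are broken by original index. The function name `ranking_from_ratings` and the sample ratings are hypothetical.

```python
def ranking_from_ratings(ratings):
    """Convert a rating vector into a ranking vector.

    ranks[i] is the 1-based position of item i when items are sorted
    from highest rating to lowest. Python's sort is stable, so items
    with equal ratings keep their original relative order.
    """
    order = sorted(range(len(ratings)), key=lambda i: -ratings[i])
    ranks = [0] * len(ratings)
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks


# Hypothetical ratings for four items.
ratings = [0.2, 0.9, 0.5, 0.7]
ranks = ranking_from_ratings(ratings)
print(ranks)  # item 1 has the highest rating, so ranks[1] is 1
```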

12. Chapter Nine Point Spreads
(pp. 113-126)

Something would be amiss in a book about ratings and rankings if there was not a discussion of point spreads because beating the spread is the Holy Grail for those in the betting world. However, the goals and objectives of scientific ratings systems are not generally in line with those of bookmakers or gamblers.

A good scientific rating system tries to accurately reflect relative differences in the overall strength of each team or competitor after a reasonable amount of competition has occurred. Perhaps the most basic goal is to provide reasonable rankings that agree with expert consensus and reflect long-term...

13. Chapter Ten User Preference Ratings
(pp. 127-134)

The “spread rating” ideas that were introduced in (9.4) on page 120 transcend sports ratings and rankings. A major issue that has arisen in recent years in the wake of online commerce is that of rating and ranking items by user preference. While they may not have been the first to employ user rating and ranking systems for product recommendation, companies such as Amazon.com and Netflix have developed highly refined (and proprietary) techniques for online marketing based on product recommendation systems. It would require another book to delve into all of the details surrounding these technologies, but we can nevertheless...

14. Chapter Eleven Handling Ties
(pp. 135-146)

Companies such as Amazon, Netflix, and eBay solicit and collect data on user behavior, which results in enormous databases with records of user ratings of products and services. One common goal of analyzing this data is to create rankings of items, which may then be used as part of a recommendation system. Typically, the first step in creating such a ranking is to transform the user ratings into pair-wise comparisons. There are several methods for doing this. See, for instance, [33] and the material in Chapter 10. Every transformation begins by creating head-to-head contests between pairs of items. For example,...
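One way to picture the transformation into head-to-head contests is sketched below. The rule it uses is an assumption chosen for illustration, since the excerpt does not specify one: whenever a single user has rated two items, the higher-rated item wins that matchup, and equal ratings are recorded as a tie. The user names, items, scores, and the function name `head_to_head` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def head_to_head(user_ratings):
    """Turn per-user ratings into pairwise win and tie counts.

    wins[(a, b)] counts users who rated a strictly above b.
    ties[(a, b)] counts users who rated a and b equally, with a < b
    alphabetically so that each pair has a single key.
    """
    wins = Counter()
    ties = Counter()
    for ratings in user_ratings.values():
        # Only items rated by the same user can meet head to head.
        for a, b in combinations(sorted(ratings), 2):
            if ratings[a] > ratings[b]:
                wins[(a, b)] += 1
            elif ratings[b] > ratings[a]:
                wins[(b, a)] += 1
            else:
                ties[(a, b)] += 1
    return wins, ties


# Hypothetical ratings on a 1-5 scale.
user_ratings = {
    "u1": {"X": 5, "Y": 3},
    "u2": {"X": 4, "Y": 4, "Z": 1},
    "u3": {"Y": 2, "Z": 5},
}

wins, ties = head_to_head(user_ratings)
print(dict(wins))
print(dict(ties))
```

Because coarse rating scales make equal scores common, ties are counted separately here instead of being discarded.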

15. Chapter Twelve Incorporating Weights
(pp. 147-154)

Rearranging the past is what this chapter is about. In fact, this is the first chapter that is more art than science. The art of ranking includes the ability to customize a method based on expert or application-specific information. Here’s a scenario: Team A lost two early-season games, then went undefeated for the remainder of the season, while team B was undefeated save for two late-season games. In your opinion, which team should be ranked higher? Most people say team A, since “preseason doesn’t matter.” That intuition sparked the mathematics of this chapter.

In all of the models presented thus...

16. Chapter Thirteen “What If . . .” Scenarios and Sensitivity
(pp. 155-158)

It is common for coaches and fans to conduct “what if” analysis at certain crucial points in a season. For example, with just one game left in the season, a college football coach may wonder what happens to his team’s ranking if his team wins by $\beta$ points in this final contest. Of course, one way to answer the coach’s question is to compute the ranking prior to the game of interest, then assume the coach’s desired outcome for this final game and recompute the ranking. However, for very large (think web-sized) applications, a full recomputation of the ranking for...

17. Chapter Fourteen Rank Aggregation–Part 1
(pp. 159-182)

The dictum “the whole is greater than the sum of its parts” is the idea behind rank aggregation, which is the focus of this chapter and the next. The aim is to somehow merge several ranked lists in order to build a single new superior ranked list.

The need for aggregating several ranked lists into one “super” list is common and has many applications. Consider, for instance, meta-search engines such as Excite, Hotbot, and Clusty that combine the ranked results from the major search engines into one (hopefully) superior ranked list. Figure 14.1 gives a pictorial representation of the concept of...
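As a rough illustration of merging several ranked lists into one, the sketch below uses the Borda count, a well-known aggregation heuristic chosen here as an assumption; the excerpt does not say which methods the chapter covers. It also assumes every list ranks the same set of items, and the function name `borda_aggregate` and the sample lists are hypothetical.

```python
def borda_aggregate(ranked_lists):
    """Merge ranked lists with a Borda count.

    In a list of n items, the item in first place earns n - 1 points,
    the next earns n - 2, and so on down to 0. Items are returned in
    decreasing order of total points, with ties broken alphabetically.
    Assumes each list ranks the same items.
    """
    scores = {}
    for ranked in ranked_lists:
        n = len(ranked)
        for position, item in enumerate(ranked):
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three hypothetical ranked lists, e.g. from three search engines.
lists = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["a", "c", "b", "d"],
]

aggregated = borda_aggregate(lists)
print(aggregated)
```

Handling lists that rank different subsets of items, as real meta-search results would, needs an extra convention for the unranked items that this sketch does not attempt.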

18. Chapter Fifteen Rank Aggregation–Part 2
(pp. 183-200)

The rank-aggregation methods of the previous chapter are heuristic methods, meaning that they come with no guarantees that the aggregated ranking is optimal. On the other hand, the great advantage of these heuristic methods is that they are fast, very fast compared to the optimization method of rank aggregation described in this chapter. Of course, the extra time required by this optimization method is often justified when accuracy is essential.

We now describe one optimal rank-aggregation method, which was created by Dr. Yoshitsugu Yamamoto of the University of Tsukuba in Japan [50]. This method produces an aggregated ranking that optimizes...

19. Chapter Sixteen Methods of Comparison
(pp. 201-216)

Chapters 2–8 presented a growing but finite set of methods for ranking items. When the weighting methods of Chapter 12 and the rank-aggregation methods of Chapters 14 and 15 are employed, the set of methods expands to a seemingly infinite set as the number of ways of combining these methods grows exponentially. Given this long list of possible methods for ranking items, a natural question arises: which ranking method (including the rank-aggregated methods) is best? In other words, how do we compare these methods for ranking items? As usual we begin our investigation by consulting the literature, which takes...

20. Chapter Seventeen Data
(pp. 217-222)

Every sports rating model in this book requires that data on certain game statistics be used as input to the model. Thanks to the Web, finding this data is not hard. However, entering it into a format that is friendly to computer algorithms is. At the heart of each rating model presented here is a matrix which, once built, is then analyzed. Even for tiny examples, building this matrix by manual data entry quickly becomes tedious. Thus, one must either (a) create a tool such as a Perl-scripted web scraper that automatically converts the data available on webpages into an...

21. Chapter Eighteen Epilogue
(pp. 223-230)

In writing this book we had to make some decisions. One decision we faced often was: should Method X appear in the book or not? Ultimately, we had to stop writing at some point, which meant omitting some interesting methods. As a partial remedy, we have written this epilogue. So while the following methods did not appear as chapters in the book, we recommend them for those readers who are eager to learn more and wish the book hadn’t ended just yet. However, there is always the hope of a second edition, so we welcome reader feedback and suggestions.

In...

22. Glossary
(pp. 231-234)
23. Bibliography
(pp. 235-240)
24. Index
(pp. 241-247)