How to sort rated content

If you are a web developer, or at least a webmaster, and you have rateable content on your website, you have probably faced this common problem at least once: how should I sort my content using the collected rating data?

As a common problem, it has common but different solutions, most of which are wrong. We will analyse two cases: the up/down rating, and the 5-star rating.

Up/Down rating case

This is the simplest case. Everyone can rate the content positively or negatively. So, every item will have:

  • Np = number of positive ratings
  • Nn = number of negative ratings

Solution #1 (Definitely wrong)

The first attempt is: let’s use the raw difference between positive and negative ratings.

rating = Np - Nn

This is a fairly common approach, but it is still wrong. Suppose you have two items:

  1. Item 1 has 500 positive ratings and 200 negative ratings
  2. Item 2 has 150 positive ratings and 10 negative ratings

Looking at the data, Item 2 is clearly better appreciated by the public: about 94% of its ratings are positive, against about 71% for Item 1. But this algorithm will compute:

  1. Item 1 rating = 300
  2. Item 2 rating = 140

So, Item 1 is ranked higher than Item 2. This is not what we expected.
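To make the problem concrete, here is a minimal Ruby sketch of this first approach applied to the two items above. The method name `difference_score` and the hash layout are only illustrative assumptions, not part of any library.

```ruby
# Solution #1: raw difference between positive and negative ratings.
# difference_score is a hypothetical helper name used for illustration.
def difference_score(np, nn)
  np - nn
end

items = {
  "Item 1" => { np: 500, nn: 200 },
  "Item 2" => { np: 150, nn: 10 }
}

# Sort from the highest score to the lowest.
ranked = items.sort_by { |_name, r| -difference_score(r[:np], r[:nn]) }

ranked.each do |name, r|
  puts "#{name}: #{difference_score(r[:np], r[:nn])}"
end
```

Item 1 ends up first simply because it collected more votes overall, not because a larger share of them is positive.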

Solution #2 (Quite wrong)

This solution is widespread across the web. It consists of computing the arithmetic mean score for each item.

score = Np / (Np + Nn) = Am (Arithmetic mean)

If we consider the same previous situation, we’ll have:

  1. Item 1 score = 500/700 = 71.43%
  2. Item 2 score = 150/160 = 93.75%

This result seems a little bit fairer. But consider this other situation:

  1. Item 1: Np = 90, Nn = 10, score = 90%
  2. Item 2: Np = 9, Nn = 1, score = 90%

The score is the same for the two items, even though we probably all agree that the first item, with ten times as many ratings, has a more reliable score than the second.

Sadly, even with this second solution the results are too biased to be acceptable.
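A short Ruby sketch of Solution #2 shows both behaviours: it ranks the first pair of items sensibly, but cannot tell the 90/10 item from the 9/1 item. The name `positive_fraction` is an illustrative assumption.

```ruby
# Solution #2: fraction of positive ratings (arithmetic mean of 0/1 votes).
# positive_fraction is a hypothetical helper name used for illustration.
def positive_fraction(np, nn)
  total = np + nn
  return 0.0 if total.zero?   # no ratings yet: avoid dividing by zero
  np.to_f / total
end

puts positive_fraction(500, 200)  # Item 1 of the first example
puts positive_fraction(150, 10)   # Item 2 of the first example

# The second example: same score, very different amounts of evidence.
puts positive_fraction(90, 10) == positive_fraction(9, 1)
```

The last line prints `true`: the score alone carries no information about how many ratings it is based on.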

Solution #3 (best?)

If we want to compute an unbiased score, taking into account just the item’s ratings, we have to face the probabilistic side of the question.

What we are trying to do is the following: we want to compute the best estimate of the item’s real score, considering that our measurements carry a variable uncertainty. The uncertainty decreases as the number of ratings grows. Each rating contributes to the definition of the real score of the item, which is our unknown value, and with each new rating the estimated score gets closer to the real score.

The fundamental question is: “Given the amount of data that I have, what is the lowest score that the item’s real score is unlikely to fall below, at a 95% confidence level?”

So, enter the lower bound of the Wilson score confidence interval for a Bernoulli parameter, that is to say:

The Wilson score confidence interval:

$$ s = \frac{p + \frac{z^2}{2n} \pm z \sqrt{\frac{p(1-p) + \frac{z^2}{4n}}{n}}}{1 + \frac{z^2}{n}} $$

In this formula, we have:

  • s = estimate of the real scoring
  • p = Np/(Np+Nn) = fraction of the positive ratings
  • z = z_(1−α/2) = the (1 − α/2) quantile of the standard normal distribution
  • n = Np + Nn = number of samples (ratings)

To get the lower bound of the estimate, we have to use the minus sign before the square root. For a 95% confidence level (α = 0.05), z is approximately 1.96.

This formula gives good results even with very few ratings, as well as for heavily rated items. Its implementation is very simple too. A Ruby implementation could be this one (warning: it requires the abscondment-statistics2 gem):

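require 'statistics2'

# Lower bound of the Wilson score confidence interval.
#   np    = number of positive ratings
#   n     = total number of ratings
#   power = α, e.g. 0.05 for a 95% confidence level
def lower_bound_estimate(np, n, power)
    return 0 if n == 0

    z = Statistics2.pnormaldist(1 - power / 2.0)
    p = np.to_f / n   # convert before dividing to avoid integer division
    (p + z*z/(2*n) - z * Math.sqrt((p*(1-p) + z*z/(4*n))/n)) / (1 + z*z/n)
end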
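To see the effect on the 90/10 versus 9/1 example from Solution #2, here is a self-contained sketch that avoids the gem by hard-coding z = 1.96 (the 95% value mentioned above). The names `Z_95` and `wilson_lower_bound` are assumptions made for this illustration.

```ruby
# Assumption: z is fixed at 1.96 (95% confidence) instead of being
# computed by the statistics2 gem.
Z_95 = 1.96

# Lower bound of the Wilson score interval for np positives out of n ratings.
def wilson_lower_bound(np, n, z = Z_95)
  return 0.0 if n.zero?
  phat = np.to_f / n
  (phat + z * z / (2 * n) -
    z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
end

items = {
  "90 up / 10 down" => [90, 100],
  "9 up / 1 down"   => [9, 10]
}

# Sort by the lower bound, best first.
ranked = items.sort_by { |_name, (np, n)| -wilson_lower_bound(np, n) }
ranked.each do |name, (np, n)|
  puts format("%-16s %.4f", name, wilson_lower_bound(np, n))
end
```

Both items have 90% positive ratings, but the lower bound comes out at roughly 0.83 for the item with 100 ratings and roughly 0.60 for the item with 10 ratings, so the better-supported item is ranked first.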

 

5 star rating – General case

The previous three solutions are useful for up/down ratings, but adapt poorly to the more common 5 star rating approach. In this latter case we should find a solution that doesn’t cost too much in terms of computational time, is not too biased, and is rating-scale independent. That is, we have to find something that adapts to the up/down case as well as to the 5 star case.

Solution #4 (a fairly good compromise)

This is the solution currently used by IMDB to sort the top 250 rated titles. It is a Bayesian estimate of the score of the item. We have to define:

  • Am = arithmetic mean for the item
  • N = total number of votes
  • m = minimum number of votes for the item to be taken into account
  • ATm = arithmetic mean of the ratings over the whole collection of items (the global mean)

The Am parameter, for the simplest case, has to be computed as shown for Solution #2. For the 5 star case, we will use this formula:

Am = Σ(ratings) / N

E.g. if an item has 3 ratings, let’s say 1, 4 and 5, Am will be: (1+4+5) / 3 ≈ 3.33.

Using these parameters we define the weighted score as follows:

Ws = (N / (N + m)) × Am + (m / (N + m)) × ATm

We are answering this question: “What is the score of the item, given all the ratings I collected till now, for this item and for the others?” (= posterior expected value)

How does this work? Well, if we set m = 0 we get Ws = Am, which is what we just analysed in Solution #2. If instead m >> N, or N ≈ 0, we get Ws ≈ ATm, that is to say that every item’s score is equal to the global mean score.

With a sensible value for m, which surely depends on the average number of ratings per item (IMDB currently uses something around 3000), every item’s score is pulled toward the global mean rating. Items that have few ratings will have a weighted score very close to ATm, while items with lots of ratings will tend to have Ws ≈ Am.

This is actually an acceptable solution, since items with a low number of ratings get a score that is coherent with the ratings of the whole collection, avoiding the problem seen in Solution #2, where a handful of ratings is enough to produce an extreme score. But, still, this is a biased solution. The best amongst the biased solutions, I’d say.
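The weighted score can be sketched in a few lines of Ruby. The method names, the sample ratings, and the values m = 10 and ATm = 3.5 are all assumptions chosen for illustration; in a real site m and ATm would come from your own data.

```ruby
# Am: arithmetic mean of one item's ratings (0.0 when there are none).
def arithmetic_mean(ratings)
  return 0.0 if ratings.empty?
  ratings.sum.to_f / ratings.size
end

# Ws = (N / (N + m)) * Am + (m / (N + m)) * ATm
def weighted_score(ratings, m, atm)
  n = ratings.size
  return atm.to_f if n + m == 0   # no votes and m = 0: fall back to ATm
  am = arithmetic_mean(ratings)
  (n.to_f / (n + m)) * am + (m.to_f / (n + m)) * atm
end

atm = 3.5   # hypothetical global mean over all items
m   = 10    # hypothetical minimum number of votes

few  = [5, 5]                  # two 5-star ratings, Am = 5.0
many = [5] * 80 + [4] * 20     # one hundred ratings, Am = 4.8

puts weighted_score(few, m, atm)
puts weighted_score(many, m, atm)
```

With these numbers the item with only two perfect ratings is pulled down to 3.75, close to the global mean, while the item with one hundred ratings keeps a score near its own mean and is ranked above it.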

Conclusions

Every problem has its own best solution. If you have the simplest up/down rating implementation, Solution #3 will probably be your best choice. For every other case I’d suggest using an implementation of Solution #4, since it’s the one that performs best across various rating systems, and for low or high numbers of ratings.
