We have been using a Bayesian average on one of our websites for a couple of years now, and it seems to have been working great. But when we used the same formula with other (good) data on another website, the output was a bit disturbing.
We use this PHP code:
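<?php
// Site-wide averages (filled in manually here).
$avg_num_reviews = 6264.3636363636;
$avg_rating = 4.527272727272727;

// Values for the company being scored.
$this_num_reviews = 4;
$this_average_rating = 4;

// Weighted mix of the site-wide average and this company's own average.
$bayesian_average = (($avg_num_reviews * $avg_rating) + ($this_num_reviews * $this_average_rating)) / ($avg_num_reviews + $this_num_reviews);
echo $bayesian_average;
?>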
Normally these values are filled in dynamically, but to give you an idea we have put in the data manually.
So this data:
$avg_num_reviews = 6264.3636363636;
$avg_rating = 4.527272727272727;
is correct.
If we then calculate the Bayesian average, then in my opinion something is going wrong. It would help us enormously if someone could say something meaningful about this.
$this_num_reviews = total number of reviews for this company;
$this_average_rating = the average rating of this company;
Now we get to the weird part:
Company A
$avg_num_reviews = 6264.3636363636;
$avg_rating = 4.527272727272727;
$this_num_reviews = 4;
$this_average_rating = 4;
$bayesian_average = (($avg_num_reviews * $avg_rating) + ($this_num_reviews * $this_average_rating)) / ($avg_num_reviews + $this_num_reviews);
$bayesian_average = 4.5269362613254;
Company B
$avg_num_reviews = 6264.3636363636;
$avg_rating = 4.527272727272727;
$this_num_reviews = 111;
$this_average_rating = 4.2;
$bayesian_average = (($avg_num_reviews * $avg_rating) + ($this_num_reviews * $this_average_rating)) / ($avg_num_reviews + $this_num_reviews);
$bayesian_average = 4.5215746565744;
Company C
$avg_num_reviews = 6264.3636363636;
$avg_rating = 4.527272727272727;
$this_num_reviews = 15468;
$this_average_rating = 4.4;
$bayesian_average = (($avg_num_reviews * $avg_rating) + ($this_num_reviews * $this_average_rating)) / ($avg_num_reviews + $this_num_reviews);
$bayesian_average = 4.4366864211353;
If we sort these by Bayesian average (highest first), we get this order:
- Company A
- Company B
- Company C
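For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the three calculations and the resulting sort order. The function name `bayesian_average` and the `$companies` / `$scores` arrays are just names made up for this example; the numbers are the ones listed above.

```php
<?php
// Hypothetical helper for this example: the same formula as in our code above.
function bayesian_average(
    float $avg_num_reviews,
    float $avg_rating,
    float $this_num_reviews,
    float $this_average_rating
): float {
    return (($avg_num_reviews * $avg_rating) + ($this_num_reviews * $this_average_rating))
        / ($avg_num_reviews + $this_num_reviews);
}

// Site-wide averages, entered manually as in the question.
$avg_num_reviews = 6264.3636363636;
$avg_rating = 4.527272727272727;

// [number of reviews, average rating] for each company.
$companies = [
    'Company A' => [4, 4.0],
    'Company B' => [111, 4.2],
    'Company C' => [15468, 4.4],
];

$scores = [];
foreach ($companies as $name => [$num_reviews, $rating]) {
    $scores[$name] = bayesian_average($avg_num_reviews, $avg_rating, $num_reviews, $rating);
}

// Sort by score, highest first, keeping the company names as keys.
arsort($scores);

foreach ($scores as $name => $score) {
    echo $name, ': ', round($score, 4), PHP_EOL;
}
```

Running this lists Company A first, then Company B, then Company C, with scores of about 4.5269, 4.5216 and 4.4367.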
But how can it happen that a company (Company C) with 15468 reviews and an average of 4.4 has a lower Bayesian average than a company (Company A) with 4 reviews and an average of 4?
The Bayesian average should ensure that these kinds of things are prevented.
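In case it helps whoever answers: the formula can also be written as a weighted mix of the company's own rating and the site-wide average, where the company's own rating gets the weight `$this_num_reviews / ($avg_num_reviews + $this_num_reviews)`. The sketch below only prints that weight for each company; the helper name `own_weight` is made up for this example.

```php
<?php
// Hypothetical helper: the share of the result that comes from the company's
// own average rating. The remaining share comes from $avg_rating.
function own_weight(float $avg_num_reviews, float $this_num_reviews): float
{
    return $this_num_reviews / ($avg_num_reviews + $this_num_reviews);
}

$avg_num_reviews = 6264.3636363636;

// Number of reviews per company, as listed in the question.
$review_counts = [
    'Company A' => 4,
    'Company B' => 111,
    'Company C' => 15468,
];

foreach ($review_counts as $name => $num_reviews) {
    $weight = own_weight($avg_num_reviews, $num_reviews);
    printf("%s: own rating counts for %.1f%%\n", $name, 100 * $weight);
}
```

With our numbers the own rating counts for well under 1% for Company A, just under 2% for Company B and roughly 71% for Company C, so A and B stay very close to the site-wide 4.527 while C is pulled most of the way towards its own 4.4.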
The logical sequence should be as follows:
- Company C
- Company B
- Company A
Can someone explain to me why this happens and whether there might be something wrong in the formula?
Am I missing something or should I use a different formula to do this better?
If anyone can say something meaningful about this, I would be very grateful to that person because I am a bit off track at the moment.