JenniferMurphy
Wellknown Member
 Joined
 Jul 23, 2011
 Messages
 2,293
 Office Version

 365
 Platform

 Windows
The product ratings on Amazon can be helpful, but they can also be misleading. For anyone unfamiliar with the Amazon rating system, it works like this. Any Amazon customer can submit a rating on any product on a scale of 1 to 5 stars. Amazon then displays an average rating for each product. If, for example, a product has 3 5star ratings, 2 3star ratings, and 1 1star rating (3 0 2 0 1), it's overall (average) rating would be 3.67 (= (3*5 + 2*3 + 1*1) / 6 = 22 / 6).
It is common for a product with just a few ratings to have a higher rating than a similar product with 10x or even 100x as many ratings. Clearly, if the ratings are equal, say 4.5, and one has just 5 ratings and the other has 500, I will have much more confidence in the one with 500 ratings. But what if the one with 5 ratings has an average rating of 4.8 and the one with 500 has an average rating of 4.6. Now which one should I feel more confident about? One has a higher average rating, but it is based on 1/100th of the data.
There ought to be a statistical measure that will help me compare them and decide.
Here is some sample data that I have made up for 4 hypothetical products. Amazon doesn't show the actual number of ratings in each category, just the percentages. So I have 2 rows for each of the 4 products. The first row contains the hypothetical counts of the actual ratings. The second row shows the percentages, which is what Amazon displays.
Column D has the product name (A, B, C, D). Columns EI contain the actual tallies. Col J has the number of ratings, Col K has the average rating, and Col L has the Amazon rating (which is rounded to 1 decimal place). Col M has the Statistical Rating that I would like help calculating.
Product A, for example, is on rows 4 & 5. It only has 3 ratings so far and they are all 5s (3 0 0 0 0). Possibly this is a brand new product and these ratings are from three members of the manufacturer's family. In any case, the mean is 5.00 (K4) and the rating, which Amazon rounds to 1 decimal place, is 5.0 (L4).
Product B is a little more interesting. It has 30 ratings  28 5s, 1 4, and 1 1  for a mean of 4.83 (K6), which Amazon rounds to 4.8 (L6). Products C has 10 times as many ratings as B and D 10 times as many as C. Since the manufacturer has a limited number of family members, the average ratings are dropping.
<tbody>
</tbody>
I would like help calculating some measure of the ratings of these products that I can use to determine which one really is the highest rated. I was thinking that some sort of confidence interval on the mean might work. I could take the lower bound or perhaps the value that is 20% from the lower to the upper bound or something like that.
Can someone help me with the correct statistical tool to compare them?
It is common for a product with just a few ratings to have a higher rating than a similar product with 10x or even 100x as many ratings. Clearly, if the ratings are equal, say 4.5, and one has just 5 ratings and the other has 500, I will have much more confidence in the one with 500 ratings. But what if the one with 5 ratings has an average rating of 4.8 and the one with 500 has an average rating of 4.6. Now which one should I feel more confident about? One has a higher average rating, but it is based on 1/100th of the data.
There ought to be a statistical measure that will help me compare them and decide.
Here is some sample data that I have made up for 4 hypothetical products. Amazon doesn't show the actual number of ratings in each category, just the percentages. So I have 2 rows for each of the 4 products. The first row contains the hypothetical counts of the actual ratings. The second row shows the percentages, which is what Amazon displays.
Column D has the product name (A, B, C, D). Columns EI contain the actual tallies. Col J has the number of ratings, Col K has the average rating, and Col L has the Amazon rating (which is rounded to 1 decimal place). Col M has the Statistical Rating that I would like help calculating.
Product A, for example, is on rows 4 & 5. It only has 3 ratings so far and they are all 5s (3 0 0 0 0). Possibly this is a brand new product and these ratings are from three members of the manufacturer's family. In any case, the mean is 5.00 (K4) and the rating, which Amazon rounds to 1 decimal place, is 5.0 (L4).
Product B is a little more interesting. It has 30 ratings  28 5s, 1 4, and 1 1  for a mean of 4.83 (K6), which Amazon rounds to 4.8 (L6). Products C has 10 times as many ratings as B and D 10 times as many as C. Since the manufacturer has a limited number of family members, the average ratings are dropping.
C/R  D  E  F  G  H  I  J  K  L  M 
3  Product  5  4  3  2  1  #  Mean  ARtg  SRtg 
4  A  3  0  0  0  0  3  5.00  5.0  ??? 
5  100%  0%  0%  0%  0%  
6  B  28  1  0  0  1  30  4.83  4.8  ??? 
7  93%  6%  1%  0%  3%  
8  C  240  37  8  5  10  300  4.64  4.6  ??? 
9  80%  12%  3%  2%  3%  
10  D  2,010  650  120  80  140  3,000  4.44  4.4  ??? 
11  67%  22%  4%  3%  5% 
<tbody>
</tbody>
I would like help calculating some measure of the ratings of these products that I can use to determine which one really is the highest rated. I was thinking that some sort of confidence interval on the mean might work. I could take the lower bound or perhaps the value that is 20% from the lower to the upper bound or something like that.
Can someone help me with the correct statistical tool to compare them?