Fun with Average

You are probably familiar with the term “average”, but did you know that it can refer to more than one thing? The two most common are the mean and the median.

Mean: the arithmetic average, obtained by summing all the values in a sample and dividing by the number of values.

Median: The median is the value separating the higher half of a data sample from the lower half. The “middle” point.
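To make the two definitions concrete, here is a minimal Python sketch. The sample values are made up for illustration, and averaging the two middle values when the count is even is the usual convention rather than something stated above.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data.

    With an even number of values there is no single middle value,
    so the two middle values are averaged.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical sample, chosen only for illustration.
sample = [2, 3, 5, 8, 12]
sample_mean = mean(sample)      # (2 + 3 + 5 + 8 + 12) / 5
sample_median = median(sample)  # third value of the five sorted values
print("mean:", sample_mean, "median:", sample_median)
```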

The picture below shows the difference between mean and median. (Source: http://slideplayer.com/slide/5005509/)

[Image: mean vs. median]

Using the mean where the median is appropriate, or the other way around, can distort the results.

The mean can be misleading. In the chart, Michael Jordan's salary pulls the 1993 mean salary up considerably. He is an outlier, and outliers matter when calculating a mean: because every value is summed, a single extreme value changes the numerator and can raise or lower the average all by itself. Here, the outlier Michael Jordan pulls the mean far to the right.
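The effect of one extreme value on the numerator can be sketched in a few lines of Python. The salaries below are invented for illustration and are not the figures from the chart.

```python
from statistics import mean, median

# Hypothetical salaries in dollars; NOT the real 1993 figures.
typical_salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
outlier_salary = 30_000_000  # one made-up star-level salary

with_outlier = typical_salaries + [outlier_salary]

mean_without = mean(typical_salaries)
mean_with = mean(with_outlier)
median_without = median(typical_salaries)
median_with = median(with_outlier)

# The single extra value dominates the sum, so the mean jumps,
# while the median only shifts to the next middle position.
print(f"mean:   {mean_without:,.0f} -> {mean_with:,.0f}")
print(f"median: {median_without:,.0f} -> {median_with:,.0f}")
```

With these made-up numbers the mean moves from 50,000 to over 5 million, while the median only moves from 50,000 to 52,500.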

[Chart: 1993 salaries and their mean, with Michael Jordan as an outlier]

(Source: http://blog.minitab.com/blog/michelle-paret/using-the-mean-its-not-always-a-slam-dunk)

So when you hear something like “The average life expectancy in xxx is 46”, do not assume that most people in xxx live to about 46. A low figure can result from some people dying early, especially from infant mortality, which drags the average life expectancy down by a lot. This is why the median is now often used to describe things like income or life expectancy.
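A small Python sketch shows how this can happen. The ages below are entirely hypothetical, picked so the mean lands near 46; they do not describe any real population.

```python
from statistics import mean, median

# Hypothetical ages at death for 100 people, invented for illustration:
# 30 die in infancy (age 1) and 70 live to age 65.
ages_at_death = [1] * 30 + [65] * 70

life_expectancy = mean(ages_at_death)  # (30 * 1 + 70 * 65) / 100
median_age = median(ages_at_death)

# How many people actually die anywhere near the mean age?
deaths_near_mean = sum(40 <= age <= 50 for age in ages_at_death)

print("mean age at death:", life_expectancy)
print("median age at death:", median_age)
print("people dying between 40 and 50:", deaths_near_mean)
```

In this made-up population the mean is about 46, yet nobody dies in their forties: everyone who survives infancy lives to 65.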

Mean can be misleading in other ways. Take this example from How to Lie with Statistics:

In 1949, the Russell Sage Foundation claimed that the average income of an American family was $5,004 (about $49,110 in 2016 dollars). It seems like a legitimate number until you find out how the “average” was calculated. They took the total income of the American people and divided it by the population of 149,000,000, which gives $1,251 per person (about $12,277 in 2016 dollars). You have probably already spotted a flaw: as a mean, this number is affected by extreme outliers who earned significantly more. It also counts children and senior citizens who do not work, so $1,251 does not reflect what a typical earner made. But that's not all. Next, they simply multiplied this number by 4, to represent a family of four. First, there is no evidence beyond intuition that the typical American family is a family of four; there are families with one child, families with three or more children, and single-parent families. Second, young children obviously do not work, so multiplying a per-person income by four treats all four family members as earners, which is not true for most families. So $5,004 is nowhere near an accurate “average” family income.
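The arithmetic behind the claim is easy to reproduce. This short Python sketch uses only the per-person figure and the multiplier quoted above; the alternative family sizes are hypothetical and are there only to show how much the result depends on that one assumption.

```python
# Figures quoted in the text above.
per_person_income = 1251   # dollars per person
assumed_family_size = 4    # the multiplier the estimate relied on

claimed_family_income = per_person_income * assumed_family_size
print("claimed family income:", claimed_family_income)

# Hypothetical alternative family sizes: the "average family income"
# scales directly with whatever size is assumed.
income_by_family_size = {
    size: per_person_income * size for size in (2, 3, 4, 5)
}
for size, income in income_by_family_size.items():
    print(f"family of {size}: ${income:,}")
```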

But in a situation like describing the spending power of a community, the median is not as good as the mean. Because we are looking at total spending power, the outliers who spend significantly more do matter. Let's say there are 50 people in this community and 49 of them spend between 20k and 30k on their cars. The remaining one, John, spends 200k on a brand-new Bentley, which boosts the spending power of the community as a whole. If we want to know what a typical person spends on a car, the median is better, because the exact amount John spends does not change the median as long as he remains the biggest spender. But his lavish spending is still part of his community's spending power.
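Here is a rough Python sketch of the community example. The text only says the 49 others spend between 20k and 30k, so their individual amounts below are invented; only John's 200k comes from the example.

```python
from statistics import mean, median

# Hypothetical amounts for the 49 other residents: 20,000 up to 29,600.
others = [20_000 + 200 * i for i in range(49)]
john = 200_000  # the Bentley from the example
spending = others + [john]

total_spending = sum(spending)
mean_spending = mean(spending)
median_spending = median(spending)
johns_share = john / total_spending

# The median ignores how much the top spender spends: a modest 31,000
# in John's place would give exactly the same median.
median_if_john_modest = median(others + [31_000])

print(f"total:  {total_spending:,}")
print(f"mean:   {mean_spending:,.0f}")
print(f"median: {median_spending:,.0f}")
print(f"John's share of total spending: {johns_share:.1%}")
```

With these assumed numbers John alone accounts for roughly 14% of the community's car spending. The mean reflects that and the median does not, so which one is "better" depends on the question being asked.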
