Variation of Data

One simple way to measure the variation of a data set is its range.

Example:

Consider the set of values $10, 21, 34, 35, 36, 37, 37, 41, 44$, and $67$.

The highest value in the data set is $67$ and the lowest is $10$, so the range of the data set is

$67 - 10 = 57$
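Computing the range takes one subtraction. The minimal Python sketch below does it; the function name `data_range` is our own, chosen so it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Return the range of a data set: highest value minus lowest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


first_set = [10, 21, 34, 35, 36, 37, 37, 41, 44, 67]
print(data_range(first_set))  # 57
```

The result matches the hand calculation of the range.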

But the range alone doesn't tell the whole story. Sometimes, we are also interested in how clustered or spread out the data is.

Consider another set of data: $10, 15, 30, 40, 45, 55, 60, 65, 68$, and $70$.

The two sets have almost the same range ($57$ for the first and $70 - 10 = 60$ for the second), but the distributions have different shapes.

If you draw a line plot of the two, it will look like this:

[Figure: line plots of the two data sets]

In the first data set, the data is clustered around the median, $36.5$.

In the second data set, the data is more spread out, with a little cluster near the top of the range.
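The median quoted for the first set can be checked with a short sketch. With an even number of values there is no single middle value, so the median is taken as the mean of the two middle values, here $(36 + 37)/2 = 36.5$. The function name `median` below is our own; Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Return the middle value of the sorted data, or the mean of the
    two middle values when the count is even."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("the median of an empty data set is undefined")
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


first_set = [10, 21, 34, 35, 36, 37, 37, 41, 44, 67]
second_set = [10, 15, 30, 40, 45, 55, 60, 65, 68, 70]

print(median(first_set))   # 36.5
print(median(second_set))  # 50.0
```

The second set's median is $50$, halfway between its two middle values, $45$ and $55$.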

In a set of data, the quartiles are the values that divide the data into four equal parts. The median of a set of data separates the set in half.

[Figure: the median and the quartiles dividing a set of data into four equal parts]

The median of the lower half of a set of data is the lower quartile (LQ) or $Q_{1}$.

The median of the upper half of a set of data is the upper quartile (UQ) or $Q_{3}$.

In the diagram above, $Q_{1} = 15$ and $Q_{3} = 35$.

The upper and lower quartiles can be used to find another measure of variation, called the interquartile range.

The interquartile range is the range of the middle half of a set of data. It is the difference between the upper quartile and the lower quartile.

Interquartile range = $Q_{3} - Q_{1}$

For the diagram example, the interquartile range is $35 - 15 = 20$.
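As a worked check, the same definitions can be applied to the two data sets from the start of this section. Each set has ten values, so each half contains five values and its median is the third of them.

First set: lower half $10, 21, 34, 35, 36$; upper half $37, 37, 41, 44, 67$.

$$Q_{1} = 34, \qquad Q_{3} = 41, \qquad Q_{3} - Q_{1} = 41 - 34 = 7$$

Second set: lower half $10, 15, 30, 40, 45$; upper half $55, 60, 65, 68, 70$.

$$Q_{1} = 30, \qquad Q_{3} = 65, \qquad Q_{3} - Q_{1} = 65 - 30 = 35$$

The ranges, $57$ and $60$, are nearly equal, but the interquartile range of the second set is five times that of the first, which matches the clustered versus spread-out picture.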

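Data points that lie more than $1.5$ times the interquartile range below $Q_{1}$ or above $Q_{3}$ are called outliers; that is, a value $x$ is an outlier if $x < Q_{1} - 1.5(Q_{3} - Q_{1})$ or $x > Q_{3} + 1.5(Q_{3} - Q_{1})$.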
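The outlier rule can be sketched in Python as follows. This is a minimal sketch under two assumptions: the quartiles are the medians of the lower and upper halves as defined above, and when a data set has an odd number of values the overall median is left out of both halves (conventions differ on this point). The function names `median`, `quartiles`, and `outliers` are our own.

```python
def median(values):
    """Middle value of sorted data, or mean of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: for an odd count, the overall median belongs to neither half.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to form two halves")
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


def outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_cutoff = q1 - k * iqr
    high_cutoff = q3 + k * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]


first_set = [10, 21, 34, 35, 36, 37, 37, 41, 44, 67]
second_set = [10, 15, 30, 40, 45, 55, 60, 65, 68, 70]

print(outliers(first_set))   # [10, 21, 67]
print(outliers(second_set))  # []
```

For the first data set, $Q_{1} = 34$, $Q_{3} = 41$, and the interquartile range is $7$, so any value below $34 - 10.5 = 23.5$ or above $41 + 10.5 = 51.5$ is flagged: $10$, $21$, and $67$. The second set's interquartile range is $35$, which puts both cutoffs ($-22.5$ and $117.5$) outside the data, so it has no outliers.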