Variation of Data
One simple way to measure the variation of a data set is its range: the difference between the highest and lowest values.
Consider the set of values: .
The highest value of the data set is and the lowest is . So, the range of the data set is
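As a minimal sketch of the range calculation, the snippet below subtracts the lowest value from the highest. The list `values` is a hypothetical data set chosen only for illustration; it is not the set from the text, and the function name `data_range` is likewise an assumption.

```python
def data_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)


# Hypothetical data set, for illustration only (not the values from the text).
values = [3, 7, 8, 8, 9, 10, 12]

print("highest:", max(values))
print("lowest:", min(values))
print("range:", data_range(values))
```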
But that doesn't tell the whole story. Sometimes, we are also interested in how clustered or spread out the data is.
Consider another set of data .
The two sets have almost the same range, but the distributions have different shapes.
If you draw a line plot of the two, it will look like this:
In the first data set, the data is clustered around the median, .
In the second data set, the data is more spread out, with a little cluster near the top of the range.
The median of the lower half of a set of data is the lower quartile (LQ) or Q1.
The median of the upper half of a set of data is the upper quartile (UQ) or Q3.
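The quartile definitions above can be sketched in code roughly as follows. This assumes one common convention: when the data set has an odd number of values, the middle value is left out of both halves. The sample list and the function name `quartiles` are hypothetical, and the sketch expects at least two values.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumption: for an odd number of values, the middle value is
    excluded from both the lower and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]       # values below the middle
    upper_half = ordered[n - half:]   # values above the middle
    return median(lower_half), median(ordered), median(upper_half)


# Hypothetical data set, for illustration only.
sample = [3, 7, 8, 8, 9, 10, 12]
lq, med, uq = quartiles(sample)
print("LQ:", lq, "median:", med, "UQ:", uq)
```

Other conventions for splitting the halves exist, so statistical software may report slightly different quartiles for the same data.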
The upper and lower quartiles can be used to find another measure of variation called the interquartile range.
Interquartile range = upper quartile - lower quartile (UQ - LQ)
In the above example, the interquartile range is .
Data points that lie more than 1.5 times the interquartile range below the lower quartile or above the upper quartile are called outliers.
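A rough sketch of the interquartile range and the outlier check is shown below. It assumes the commonly used multiplier of 1.5 and the same "median of each half" convention for the quartiles, with the middle value excluded when the count is odd. The sample data and the function names are hypothetical.

```python
from statistics import median


def interquartile_range(values):
    """Return (LQ, UQ, IQR), where IQR = UQ - LQ."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lq = median(ordered[:half])                  # median of the lower half
    uq = median(ordered[len(ordered) - half:])   # median of the upper half
    return lq, uq, uq - lq


def find_outliers(values, multiplier=1.5):
    """Return values more than multiplier * IQR beyond the quartiles."""
    lq, uq, iqr = interquartile_range(values)
    low_fence = lq - multiplier * iqr    # anything below this is an outlier
    high_fence = uq + multiplier * iqr   # anything above this is an outlier
    return [v for v in sorted(values) if v < low_fence or v > high_fence]


# Hypothetical data set with one unusually low and one unusually high value.
sample = [2, 7, 8, 8, 9, 10, 30]
print("LQ, UQ, IQR:", interquartile_range(sample))
print("outliers:", find_outliers(sample))
```

For this sample the quartiles are 7 and 10, so the interquartile range is 3 and the cutoffs are 2.5 and 14.5. The values 2 and 30 fall outside those cutoffs.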