Interpreting Data Distributions and Outliers
Help Questions
Based on the dot plot of reaction times (in seconds) for 15 students, which statement is best supported?
Dot plot (each dot is one student):
0.20: •
0.21: ••
0.22: •••
0.23: •••
0.24: ••
0.25: •
0.26: •
0.60: •
The distribution is roughly symmetric with a typical value near 0.40 seconds.
There is no outlier; the spread is small and the mean and median should be about the same as if 0.60 were not present.
The distribution is right-skewed with a typical value near 0.23 seconds, and the 0.60-second value pulls the mean to the right more than it changes the median.
The distribution is left-skewed with a typical value near 0.60 seconds.
Explanation
This question focuses on interpreting the shape, center, spread, and outliers from a dot plot of reaction times. The distribution is right-skewed, with most dots clustered between 0.20 and 0.26 seconds and a long tail to the right from the outlier at 0.60 seconds. The typical value is around 0.23 seconds, best captured by the median due to the skew. The outlier at 0.60 pulls the mean to the right more than it affects the median, as the mean averages all values while the median focuses on the middle. Choice C is supported because the dot plot visually shows a dense cluster near 0.23 with one isolated dot at 0.60, highlighting the skew and the outlier's effect on the mean. A misconception is reversing skew direction, thinking a high outlier makes it left-skewed, but the tail points right. Start by identifying tails or outliers in the display, then choose the median for typical values in skewed distributions.
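The comparison between mean and median can be checked numerically. The following is only an illustrative sketch: it expands the dots exactly as they are listed in the plot into raw values and uses Python's standard `statistics` module. The names `dot_counts`, `times`, and `cluster` are made up for this example.

```python
from statistics import mean, median

# Dots as listed in the plot: reaction time (seconds) -> number of dots.
dot_counts = {0.20: 1, 0.21: 2, 0.22: 3, 0.23: 3,
              0.24: 2, 0.25: 1, 0.26: 1, 0.60: 1}

# Expand the counts into one value per dot.
times = [t for t, k in dot_counts.items() for _ in range(k)]
cluster = [t for t in times if t != 0.60]  # same data without the isolated dot

print("with 0.60:    mean =", round(mean(times), 3), " median =", median(times))
print("without 0.60: mean =", round(mean(cluster), 3), " median =", median(cluster))
```

A mean that sits above the median is one quick numerical sign of a right tail.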
A runner recorded the pace (minutes per mile) for 14 training miles: 8.2, 8.3, 8.3, 8.4, 8.4, 8.5, 8.5, 8.6, 8.6, 8.7, 8.7, 8.8, 8.9, 11.5. The 11.5-minute pace occurred on a day with strong headwinds. How does this outlier most likely affect the mean and the median pace?
It increases the mean more than the median because the mean is pulled toward the outlier.
It increases the median more than the mean because the median uses the extreme value directly.
It does not affect either measure because only the spread changes with an outlier.
It decreases the mean and increases the median because the outlier is larger than the rest.
Explanation
This question asks how an outlier affects measures of center in a skewed distribution of running paces. The paces are right-skewed: thirteen values cluster tightly between 8.2 and 8.9 minutes per mile, and the headwind day adds a single value at 11.5. The median is the better description of a typical pace because it is resistant to the outlier. The 11.5 raises the mean more than the median because every value, including the extreme one, enters the average. Choice A is supported: a dot plot of these data would show the tight cluster and one isolated high point, which is the situation in which the mean shifts toward the outlier. A common error is to swap the two effects and assume the median uses the extreme value directly. Scan for outliers or tails in the data first, then prefer the median as the typical value in skewed sets.
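To see the size of each shift, one can compute both measures with and without the headwind mile. This is a minimal sketch using Python's standard `statistics` module; the variable names (`paces`, `usual`, `mean_shift`, `median_shift`) are illustrative.

```python
from statistics import mean, median

paces = [8.2, 8.3, 8.3, 8.4, 8.4, 8.5, 8.5, 8.6, 8.6, 8.7, 8.7, 8.8, 8.9, 11.5]
usual = paces[:-1]  # drop the 11.5 headwind mile (last value in the sorted list)

# How far each measure of center moves when the outlier is included.
mean_shift = mean(paces) - mean(usual)
median_shift = median(paces) - median(usual)

print(f"mean shift:   {mean_shift:.3f} min/mile")
print(f"median shift: {median_shift:.3f} min/mile")
```

Both shifts are upward, but the mean moves roughly four times as far as the median.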
A manager recorded the time (in minutes) it took for 16 food deliveries to arrive during a lunch shift: 9, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 15, 32. How does the value 32 affect the mean and the median delivery time?
It decreases both the mean and the median because it is far from the rest of the data.
It has about the same effect on the mean and the median because both use all 16 values equally.
It increases the median much more than the mean because the median uses the largest value.
It increases the mean more than the median because the mean is sensitive to extreme values.
Explanation
The concept here is interpreting the shape, center, spread, and outliers in a data distribution to understand how extreme values affect measures like the mean and median. The distribution of delivery times is right-skewed, with most values clustered between 9 and 15 minutes and a long tail to the right due to the outlier at 32 minutes. A typical value in this skewed distribution is best represented by the median, which is resistant to outliers. The outlier at 32 raises the mean from 12 to 13.25 minutes by pulling it toward the higher value, while the median stays at 12 because it depends only on the middle positions. This supports choice D (the mean increases more than the median): a dot plot would show a cluster of points from 9 to 15 with one far to the right, demonstrating the mean's sensitivity to extremes. A common misconception is that the median is more affected because it seems central, but in skewed distributions it is the mean that is pulled. To apply this, always scan for tails or outliers in the plot first, then select the median for center in skewed data.
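The effect on each measure can be verified directly from the 16 delivery times. The sketch below is illustrative; `deliveries`, `without_32`, and `summary` are names chosen for this example.

```python
from statistics import mean, median

deliveries = [9, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 15, 32]
without_32 = [t for t in deliveries if t != 32]

# (mean, median) for the full data and for the data with the outlier removed.
summary = {
    "with 32": (mean(deliveries), median(deliveries)),
    "without 32": (mean(without_32), median(without_32)),
}
for label, (m, med) in summary.items():
    print(f"{label:>10}: mean = {m}, median = {med}")
```

With these data the mean moves from 12 to 13.25 minutes, while the median is 12 in both cases.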
A bookstore tracked the number of minutes customers spent in the store on a slow afternoon. The data are: 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 16, 45. Which conclusion about the distribution is best supported?
The distribution is right-skewed; most visits cluster around 10–14 minutes, and the 45-minute value increases the mean more than the median.
The distribution is left-skewed because most values are near 45 minutes with a tail toward smaller times.
The distribution has no skew because one extreme value does not affect the shape.
The distribution is symmetric with a typical value near 45 minutes because the largest value represents the center.
Explanation
The key concept is analyzing the shape, center, spread, and outliers in customer visit times to draw conclusions. The distribution is right-skewed, with most times between 8 and 16 minutes forming a cluster and a tail extending to 45 minutes. A typical value is around 10–14 minutes, better described by the median because of the skew. The outlier at 45 increases the mean more than the median, since the mean includes the extreme value in its average. Choice A (right-skewed, with the 45-minute value raising the mean more than the median) is supported: a dot plot would reveal the dense grouping in the lower range with one point far to the right, showing both the skew and the outlier's pull on the mean. A misconception is calling the distribution left-skewed because most values are low, but skew is named for the direction of the tail. Look for tails or isolated points in the plot first, then use the median for center in skewed data.
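The dot plot mentioned in the explanation can be sketched as text. This is an illustrative example only; the names `visits`, `counts`, and `rows` are made up here, and every minute from the minimum to the maximum gets its own row so the gap before 45 stays visible.

```python
from collections import Counter

visits = [8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 16, 45]
counts = Counter(visits)  # minutes -> number of customers

# One row per minute from the minimum to the maximum, so gaps stay visible.
rows = [f"{minute:>2}: {'•' * counts[minute]}"
        for minute in range(min(visits), max(visits) + 1)]
print("\n".join(rows))
```

The long run of empty rows between 16 and 45 is the right tail that gives the distribution its skew.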
A wildlife volunteer recorded the number of minutes it took to spot a bird at a feeder on 13 mornings: 8, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 27. Which conclusion about center is best supported?
The mean is a better description of a typical morning because it is not influenced by the 27-minute wait.
The typical wait time is 27 minutes because it is the most extreme and therefore most representative.
Mean and median will be affected equally by the 27-minute wait because both depend on all 13 values.
The median is a better description of a typical morning because the 27-minute wait makes the mean larger than what most mornings look like.
Explanation
This question compares measures of center for bird-spotting times that include an outlier. The distribution is right-skewed: twelve mornings cluster between 8 and 13 minutes, with a tail out to 27. The median is the better description of a typical morning because it is resistant to the outlier. The 27-minute wait makes the mean larger than what most mornings look like. Choice D (the median is the better description) holds: a plot would show the low cluster and one high outlier, which favors the median. A misconception is that the mean is better because it is not influenced by the extreme value; the opposite is true, since the mean uses every value. Identify outliers first, then choose the median as the typical value for skewed data.
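One way to make "larger than most mornings" concrete is to count how many mornings fall below each measure of center. The sketch below is illustrative, and the names `waits`, `below_mean`, and `below_median` are assumptions of this example.

```python
from statistics import mean, median

waits = [8, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 27]

m, med = mean(waits), median(waits)
below_mean = sum(w < m for w in waits)      # mornings shorter than the mean
below_median = sum(w < med for w in waits)  # mornings shorter than the median

print(f"mean = {m:.2f}, mornings below it: {below_mean} of {len(waits)}")
print(f"median = {med}, mornings below it: {below_median} of {len(waits)}")
```

Nine of the 13 mornings fall below the mean, so the mean overstates a typical wait, while the median of 10 sits in the middle of the cluster.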
A teacher posted the quiz scores (out of 20) for one class: 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 16, 19, 20. Which measure of spread is most affected if the score 20 is replaced with a 2 because one student was absent and received a zero that was entered incorrectly as 2?
Median, because the median always uses the smallest and largest values.
Range, because changing an extreme value changes the minimum or maximum a lot.
IQR, because one extreme value dramatically changes the middle 50% of the data.
None; all spread measures stay about the same when only one value changes.
Explanation
This question examines how changing one extreme value affects measures of spread in a distribution of quiz scores. The original distribution is roughly symmetric or slightly right-skewed, but replacing 20 with 2 creates a low outlier and a left tail. The IQR is the more robust measure of spread, but the question asks which measure is most affected. The change moves the minimum from 8 to 2 and the maximum from 20 to 19, so the range grows from 12 to 17, while the IQR stays essentially the same because it depends only on the middle 50% of the data. Choice B (range) is correct: a stem-and-leaf or dot plot would show the shift in the minimum and maximum that widens the range. People often mix up range and IQR and expect the IQR to change with extremes, but it is resistant to them. Always check for outliers or tails in the data list first, then evaluate how they affect the range versus resistant measures like the IQR.
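The range-versus-IQR comparison can be checked with a short calculation. This is a sketch under one stated assumption: quartiles are taken as the medians of the lower and upper halves of the sorted data (other quartile conventions can give slightly different IQR values). The helper `spread` is hypothetical, written for this example.

```python
from statistics import median

def spread(values):
    """Return (range, IQR), taking quartiles as medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    q1 = median(s[:half])   # median of the lower half
    q3 = median(s[-half:])  # median of the upper half
    return s[-1] - s[0], q3 - q1

original = [8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 16, 19, 20]
changed = [2 if x == 20 else x for x in original]  # replace the 20 with a 2

print("original (range, IQR):", spread(original))
print("changed  (range, IQR):", spread(changed))
```

Under this convention the range goes from 12 to 17 while the IQR is 4 both before and after the change.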
A histogram shows the distribution of package delivery times (in days) for 30 packages. Most packages are between 2 and 5 days, but there is a single package that took 14 days.
Histogram counts by bin:
2–3 days: 9 packages
3–4 days: 11 packages
4–5 days: 8 packages
5–6 days: 1 package
13–14 days: 1 package
Which statement is best supported by the histogram?
The distribution is left-skewed, and the 14-day value will pull the mean below the median.
Most packages took 14 days because the 13–14 day bar shows the largest delivery time.
The distribution is symmetric because the tallest bar is in the middle, so there is no outlier effect.
The distribution is right-skewed; the typical delivery time is around 3–4 days, and the 14-day value increases the mean more than the median.
Explanation
This question involves interpreting a histogram for shape, center, and outlier effects in delivery times. The histogram is right-skewed, with tall bars from 2 to 5 days and a tail reaching the 13–14 day bin. The typical time is around 3–4 days, using the median because of the skew. The 14-day outlier increases the mean more than the median. Choice D (right-skewed, typical time around 3–4 days) is supported: the bars cluster on the left with one isolated bar on the right, showing the skew and the pull on the mean. A common error is calling the distribution symmetric because the tallest bar is in the middle of the main cluster, which ignores the tail. Examine the tails of a histogram first, then use the median for center when the data are skewed.
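A histogram does not give the raw values, so center can only be estimated from it. The sketch below assumes every package sits at the midpoint of its bin, which is an approximation introduced for this example and not part of the question; `midpoint_mean` and `median_bin` are hypothetical helper names.

```python
# (bin start, bin end, count) as read from the histogram
bins = [(2, 3, 9), (3, 4, 11), (4, 5, 8), (5, 6, 1), (13, 14, 1)]

def midpoint_mean(bins):
    """Estimate the mean by treating every package as sitting at its bin midpoint."""
    total = sum(count for _, _, count in bins)
    return sum((lo + hi) / 2 * count for lo, hi, count in bins) / total

def median_bin(bins):
    """Return the bin holding the middle value (the upper middle value for an even count)."""
    total = sum(count for _, _, count in bins)
    running = 0
    for lo, hi, count in bins:
        running += count
        if running >= (total + 1) / 2:
            return lo, hi

print("estimated mean, all bins:    ", round(midpoint_mean(bins), 2))
print("estimated mean, no 13-14 bin:", round(midpoint_mean(bins[:-1]), 2))
print("bin containing the median:   ", median_bin(bins))
```

Dropping the 13–14 day bin lowers the estimated mean, while the median stays in the 3–4 day bin either way.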
A small company recorded the number of support tickets received each day for 12 days: 9, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 28. Which statement best describes the shape and typical value of the distribution?
Right-skewed with a typical value around 28 tickets per day because the outlier sets the center.
Left-skewed with a typical value around 28 tickets per day.
Right-skewed with a typical value around 11–12 tickets per day.
Symmetric with a typical value around 20 tickets per day.
Explanation
This question asks for the shape and center of support ticket data that include an outlier. The distribution is right-skewed, with values mostly from 9 to 14 and a tail out to 28. The typical value is around 11–12 tickets, using the median because of the skew. The outlier affects the mean more than the median, but the focus here is on shape and typical value. Choice C (right-skewed with a typical value around 11–12) fits: a dot plot would show clustering at 11–12 with one point far to the right, and that single point does not make 28 typical. A misconception is that a high outlier sets the center, but it does not. Identify tails first, then use the median as the typical value in skewed data.
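The typical value can be checked with three common measures of center. This is an illustrative sketch using Python's standard `statistics` module; the variable name `tickets` is chosen for this example.

```python
from statistics import mean, median, mode

tickets = [9, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 28]

print("mean   =", round(mean(tickets), 2))
print("median =", median(tickets))
print("mode   =", mode(tickets))
```

The median (11.5) and the mode (11) both land in the 11–12 range, while the mean is pulled above 12 by the 28-ticket day.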
A box plot summarizes the number of minutes students spent on a homework assignment. The box plot shows: minimum = 8, $Q_1$ = 10, median = 12, $Q_3$ = 14, maximum = 15, with one additional point plotted at 35 minutes beyond the whisker. Which statement is best supported?
The distribution is left-skewed, and the outlier at 35 minutes will pull the mean below the median.
The distribution is symmetric, and the outlier will change the IQR dramatically because it is far from the rest.
There is no evidence of an outlier; the point at 35 minutes must be removed as an error.
The distribution is right-skewed because of the high outlier, and the outlier increases the range much more than it changes the IQR.
Explanation
The concept is using a box plot to interpret shape, spread, and outliers in homework times. The box plot shows a right-skewed distribution, with a compact box from 10 to 14 and whiskers to 8 and 15, plus a long tail to the outlier at 35. The typical time is the median, 12, because of the skew. The outlier greatly increases the range (from 7 to 27 minutes) but leaves the IQR of 4 essentially unchanged. Choice D (right-skewed, with the outlier increasing the range much more than the IQR) is supported: the plot extends far to the right, showing both the skew and the effect on the range. A misconception is calling the distribution left-skewed because of the low minimum, but skew is named for the direction of the tail. Check for extended whiskers or outliers first, then compare the range and the IQR.
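The question does not say which rule was used to plot 35 as a separate point. A common convention, assumed here, flags values more than 1.5 × IQR beyond the quartiles. The sketch below applies that assumed rule to the summary values from the box plot; all variable names are illustrative.

```python
# Five-number summary read from the box plot, plus the separately plotted point.
minimum, q1, med, q3, whisker_max = 8, 10, 12, 14, 15
extra_point = 35

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # assumed 1.5 x IQR convention
upper_fence = q3 + 1.5 * iqr

is_outlier = extra_point < lower_fence or extra_point > upper_fence
range_without = whisker_max - minimum  # range if the extra point is ignored
range_with = extra_point - minimum     # range including the extra point

print("IQR:", iqr, "| fences:", lower_fence, "to", upper_fence)
print("35 flagged as outlier:", is_outlier)
print("range without / with the point:", range_without, "/", range_with)
```

Under this convention the upper fence is 20, so 35 is flagged, and including it stretches the range from 7 to 27 while the IQR stays at 4.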
A cafeteria measured the wait time (in minutes) for 18 students to get lunch. The data are: 7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 25. Which measure is most affected by the value 25?
IQR, because a single extreme value greatly changes the middle 50% of the data.
Mean, because it uses all values and is pulled toward extreme values.
Median, because it is determined mainly by the largest value.
None; the value 25 affects shape but not numerical measures.
Explanation
This question explores how an outlier affects different measures for lunch wait times, covering both center and spread. The distribution is right-skewed, with most times from 7 to 13 minutes and a tail out to 25. The median describes a typical wait and is barely affected by the extreme value. The 25 pulls the mean up noticeably because every value enters the average. Choice B (mean) is correct: a dot plot would show a low cluster with one high point, and that point shifts the mean the most. A common error is to think the median is determined by extreme values, but it depends only on the middle of the ordered data. Look for tails first, then consider the mean's sensitivity in skewed data.
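To compare how far each measure moves, one can compute it with and without the 25-minute wait. This sketch assumes quartiles are the medians of the lower and upper halves of the sorted data (other conventions may give slightly different IQR values); the helper `iqr` and the dictionary `changes` are hypothetical names for this example.

```python
from statistics import mean, median

def iqr(values):
    """IQR with quartiles taken as medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[-half:]) - median(s[:half])

waits = [7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 25]
without_25 = [w for w in waits if w != 25]

# How far each measure moves when the 25-minute wait is included.
changes = {
    name: stat(waits) - stat(without_25)
    for name, stat in [("mean", mean), ("median", median), ("IQR", iqr)]
}
for name, change in changes.items():
    print(f"{name:>6}: {change:+.2f}")
```

Under this convention the mean moves the most, the IQR moves by half a minute, and the median does not move at all.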