Statistics Practice Test: Practice Test 31

Practice Test 31 for Statistics: real questions and explanations from the Varsity Tutors practice-test pool.

Question 1

A grocery store recorded the number of items in 10 customers’ baskets: 6, 7, 7, 8, 8, 9, 9, 10, 10, 27. Which statement best describes how the outlier affects the mean and median number of items?

The median increases more than the mean because the median is based on the farthest value.
The mean increases more than the median because the outlier is much larger than the rest of the data. (correct answer)
Neither the mean nor the median changes because one value cannot affect measures of center.
The mean decreases while the median increases because the data are right-skewed.

Explanation: This question examines how an outlier affects the mean versus the median in shopping basket data. The item counts are 6, 7, 7, 8, 8, 9, 9, 10, 10, 27: most customers have 6 to 10 items, but one has 27. The distribution is right-skewed because the outlier at 27 creates a long right tail, making the median (8.5 items) a better summary of typical basket size than the mean. The outlier affects the two measures differently. Without it, the nine remaining values sum to 74, so the mean is 74/9 ≈ 8.2 items; with it, the mean is 101/10 = 10.1 items, an increase of about 1.9 items. The median changes from 8 to 8.5 items, only a 0.5-item shift, which demonstrates its resistance to outliers. This illustrates the principle that a high outlier raises the mean more than the median: the mean uses the size of every value, while the median depends only on position. A common misconception is that the median changes more or that both change equally, but the mean is the more sensitive of the two to extreme values.
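The figures in this explanation can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

baskets = [6, 7, 7, 8, 8, 9, 9, 10, 10, 27]
without_outlier = [x for x in baskets if x != 27]

# Compare each measure of center with and without the outlier.
mean_with, mean_without = mean(baskets), mean(without_outlier)
median_with, median_without = median(baskets), median(without_outlier)

print(f"mean:   {mean_without:.2f} -> {mean_with:.2f}")
print(f"median: {median_without} -> {median_with}")
```

The mean moves by roughly two items while the median moves by half an item.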

Question 2

A teacher claims a multiple-choice quiz with 4 options per question is being guessed randomly, so $P(\text{correct})=0.25$ on each question. A student answers 40 questions and gets 19 correct. A simulation of 150 random-guessing students (40 questions each) shows that 0 out of 150 got 19 or more correct. Would this result cause you to question the model? Why? (Treat “as extreme or more extreme” as $\ge 19$ correct out of 40.)

Yes; it proves the student knew exactly $19/40$ of the material, not that guessing is unlikely.
Yes; $\ge 19$ correct happened 0 out of 150 times in simulation, so it’s extremely unlikely under random guessing and raises doubt. (correct answer)
No; because 19 is less than half of 40, it must be a common outcome under random guessing.
No; it proves the student was guessing because the simulation did not produce exactly 19 correct.

Explanation: This question evaluates whether a student getting 19 correct out of 40 questions is consistent with random guessing where P(correct) = 0.25. Under pure guessing, we'd expect about 10 correct (25% of 40), so 19 correct is much higher—but could it happen by chance? The simulation of 150 random-guessing students shows that 0 got 19 or more correct, meaning this outcome never occurred in the simulations (0/150 = 0%). When an observed result would essentially never happen under the assumed model, it provides very strong evidence against that model—here, it strongly suggests the student wasn't just guessing. A misconception is thinking that because 19 is less than half of 40, it must be consistent with guessing, but the key is comparing to the 25% baseline, not 50%. The evaluation method is clear: simulate under the guessing model, and when the observed performance never occurs in simulations, conclude the model (random guessing) is very likely wrong.
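As a cross-check on the simulation, the tail probability can also be computed exactly from the binomial model, assuming 40 independent guesses with success probability 0.25. The sketch below uses only `math.comb`; the variable names are illustrative.

```python
from math import comb

n, p, observed = 40, 0.25, 19

# Expected number correct under pure guessing.
expected_correct = n * p

# Exact P(X >= 19) for X ~ Binomial(40, 0.25).
tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed, n + 1))

print(f"expected correct: {expected_correct}")
print(f"P(X >= {observed}) = {tail:.5f}")
```

The exact tail probability is well under 1%, which is consistent with seeing no such result among 150 simulated students.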

Question 3

In a school survey, two events are recorded for a randomly selected student: Event $A$ = “the student participates in a club,” and event $B$ = “the student uses the bus to get to school.” Given that $P(A)=0.30$ , $P(B)=0.40$ , and $P(A\cap B)=0.15$ , are $A$ and $B$ independent? Justify using probabilities by comparing $P(A\cap B)$ to $P(A)P(B)$ .

Yes; $P(A)P(B)=0.30\cdot 0.40=0.12$ and $P(A\cap B)=0.15$ , so they match.
Yes; $P(A\cap B)=P(A)+P(B)=0.70$ .
No; $P(A)P(B)=0.30\cdot 0.40=0.12$ and $P(A\cap B)=0.15$ , so they do not match. (correct answer)
Yes; clubs and transportation seem unrelated, so the events are independent.

Explanation: To determine independence, we must check whether P(A∩B) = P(A)P(B). Computing the product gives P(A)P(B) = 0.30 × 0.40 = 0.12. The problem gives P(A∩B) = 0.15. Since 0.15 ≠ 0.12, events A and B are not independent. A tempting mistake (choice D) is assuming that because clubs and transportation seem unrelated in real life, the events must be independent, but independence is determined by the probability calculation, not intuition. Another error (choice B) would be adding the probabilities instead of checking the product. The key lesson: always compute the product and compare, regardless of how related or unrelated the events seem in context.
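The product check can be written as a short comparison. This is a minimal sketch; the variable names and the use of a floating-point tolerance are choices made here, not part of the question.

```python
from math import isclose

p_a, p_b, p_a_and_b = 0.30, 0.40, 0.15

# Independence requires P(A and B) to equal P(A) * P(B).
product = p_a * p_b
independent = isclose(p_a_and_b, product, abs_tol=1e-9)

print(f"P(A)P(B) = {product:.2f}, P(A and B) = {p_a_and_b:.2f}")
print("independent" if independent else "not independent")
```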

Question 4

A teacher recorded whether students turned in an assignment on time and whether they used a planner. The results are shown in the two-way table.

Which statement best describes the association between using a planner and turning in the assignment on time?

	On Time	Late	Total
Planner: Yes	72	18	90
Planner: No	36	54	90
Total	108	72	180

There is no association because the totals for planner users and non-users are the same.
There is an association: a higher proportion of non-users turn in the assignment on time than planner users.
There is an association: a higher proportion of planner users turn in the assignment on time than non-users. (correct answer)
There is little to no association because more students turned in the assignment on time than late overall.

Explanation: Association is evaluated by comparing conditional relative frequencies across groups, here the on-time rate for planner users versus non-users. Locate the on-time cell in each row (72 and 36) and divide by that row's total (90 each): for planner users, 72/90 = 0.80; for non-users, 36/90 = 0.40. Planner users have a higher on-time proportion (80%) than non-users (40%), so there is an association between planner use and timeliness. A common misconception is to use overall counts, such as the fact that more students were on time than late in total, instead of within-group proportions. To apply this elsewhere, compute the proportions within each row and compare them across rows.
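The row-wise proportions can be computed directly from the table counts. The dictionary layout below is just one illustrative way to hold the two-way table.

```python
table = {
    "Planner: Yes": {"On Time": 72, "Late": 18},
    "Planner: No": {"On Time": 36, "Late": 54},
}

# Conditional relative frequency of "On Time" within each row.
on_time_rate = {
    group: counts["On Time"] / sum(counts.values())
    for group, counts in table.items()
}

for group, rate in on_time_rate.items():
    print(f"{group}: {rate:.0%} on time")
```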

Question 5

A spinner has 8 equally likely outcomes labeled 1 through 8. The outcomes are grouped below:

Odd outcomes: 1, 3, 5, 7
Even outcomes: 2, 4, 6, 8
Prime outcomes: 2, 3, 5, 7

Let event $A$ be “the outcome is prime,” and let event $B$ be “the outcome is odd.” Based on the outcomes listed, what fraction of outcomes in $B$ also belong to $A$ ? Give your answer as a fraction.

$\frac{4}{4}$
$\frac{3}{4}$ (correct answer)
$\frac{3}{8}$
$\frac{4}{8}$

Explanation: This question tests conditional probability using a list of spinner outcomes. 'Given B' means we restrict our attention only to the odd outcomes. There are 4 odd outcomes (1, 3, 5, 7). Among these 4, 3 are prime (3, 5, 7). Therefore, the fraction 3/4 represents the conditional probability of a prime outcome given that it is odd. A common mistake is using the total 8 outcomes as the denominator, like 3/8, but that overlooks the restriction to odd outcomes. A good strategy is to first circle or list the outcomes in B, then count how many of those are also in A.
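The "list B, then count A within it" strategy maps naturally onto set intersection. A small sketch, with illustrative names:

```python
from fractions import Fraction

odd = {1, 3, 5, 7}     # event B
prime = {2, 3, 5, 7}   # event A

# Restrict to B, then count the outcomes that are also in A.
p_prime_given_odd = Fraction(len(prime & odd), len(odd))

print(p_prime_given_odd)
```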

Question 6

Adult heights in a certain population are roughly symmetric and bell-shaped, so using a normal model is reasonable. Heights are modeled as $N(\mu=170\text{ cm},\,\sigma=6\text{ cm})$ . Approximately what percent of adults are between $164$ cm and $176$ cm? (Use the empirical rule.)

68% (correct answer)
32%
95%
16%

Explanation: This problem asks for the percentage of adults with heights between 164 cm and 176 cm. The lower bound of 164 cm is exactly 1 SD below the mean since (164 - 170)/6 = -1, and the upper bound of 176 cm is exactly 1 SD above the mean since (176 - 170)/6 = 1. According to the empirical rule, approximately 68% of values in a normal distribution fall within 1 standard deviation of the mean. Therefore, about 68% of adults have heights between 164 cm and 176 cm. A common error is confusing this with 95%, which represents the percentage within 2 SDs of the mean. When solving, always calculate how many SDs each boundary is from the mean before applying the empirical rule.
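For comparison with the empirical rule's rounded figure, the same area can be computed from the normal model itself. This sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=6)

# Area under the normal curve between 164 cm and 176 cm (within 1 SD).
share = heights.cdf(176) - heights.cdf(164)

print(f"{share:.1%} of adults between 164 cm and 176 cm")
```

The computed value is about 68.3%, which the empirical rule rounds to 68%.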

Question 7

From a standard 52-card deck, event $A$ is “the card is red,” and event $B$ is “the card is a king.” You know $P(A)=\frac{26}{52}$ , $P(B)=\frac{4}{52}$ , and $P(A\cap B)=\frac{2}{52}$ . What is the probability that $A$ or $B$ occurs, where “or” is inclusive (red or king or both)?

$\frac{26}{52}$
$\frac{2}{52}$
$\frac{30}{52}$
$\frac{28}{52}$ (correct answer)

Explanation: This card problem applies the Addition Rule to find the probability of drawing a red card or a king or both. Adding P(A) = 26/52 for red cards and P(B) = 4/52 for kings gives 30/52, but this double-counts the two red kings (king of hearts and king of diamonds). We subtract P(A∩B) = 2/52 to remove this overlap. The calculation is P(A∪B) = 26/52 + 4/52 - 2/52 = 28/52. This is the probability that the card has at least one of the two characteristics. Visualizing the deck helps: half the cards are red and four are kings, but two of those kings are already counted among the red cards, so they must be subtracted once.
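One way to confirm the Addition Rule result is to build the deck and count directly. The rank and suit labels below are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

red = {card for card in deck if card[1] in ("hearts", "diamonds")}
kings = {card for card in deck if card[0] == "K"}

# Direct count of "red or king" versus the Addition Rule.
p_union = Fraction(len(red | kings), len(deck))
p_rule = (
    Fraction(len(red), len(deck))
    + Fraction(len(kings), len(deck))
    - Fraction(len(red & kings), len(deck))
)

print(p_union, p_rule)  # both reduce 28/52 to 7/13
```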

Question 8

A club has four project ideas—Garden, Mural, Recycling, and Tutoring—and must pick one to start first. They want a fair random choice, meaning each idea has probability $1/4$ of being selected. Which method gives each idea an equal chance?

Roll two fair dice and add: 2–4=Garden, 5–7=Mural, 8–10=Recycling, 11–12=Tutoring.
Flip a coin: heads=Garden, tails=Recycling; if someone complains, switch to Mural.
Pick the idea that appears first in the club’s written list.
Use a random number generator to pick 1, 2, 3, or 4 with equal probability; 1=Garden, 2=Mural, 3=Recycling, 4=Tutoring. (correct answer)

Explanation: This question tests using probability to make fair decisions among four project ideas, where fairness means each idea has probability exactly 1/4. To determine fairness, we examine the probability distribution each method produces. Option D uses a random number generator to pick 1, 2, 3, or 4 with equal probability and assigns one number to each project, so each idea has exactly a 1/4 chance; this is fair. Option C uses list order, which is not random at all and always favors the first idea. Option A uses dice sums, but the sums of two dice are not equally likely: of the 36 equally likely rolls, 6 give a sum of 2–4 (Garden), 15 give 5–7 (Mural), 12 give 8–10 (Recycling), and 3 give 11–12 (Tutoring), so the chances are 1/6, 5/12, 1/3, and 1/12. Option B considers only two projects initially and adds subjective switching, so Mural and Tutoring do not get a 1/4 chance. Fairness requires equal chances for all four ideas through a truly random process.
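To see why the two-dice method is unfair, one can enumerate all 36 equally likely rolls and total the probability assigned to each idea. This is a sketch; the dictionary of sum ranges simply restates the dice option from the question.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sum_ranges = {
    "Garden": range(2, 5),      # sums 2-4
    "Mural": range(5, 8),       # sums 5-7
    "Recycling": range(8, 11),  # sums 8-10
    "Tutoring": range(11, 13),  # sums 11-12
}

# Count how many of the 36 rolls produce each sum.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

dice_probs = {
    idea: Fraction(sum(sum_counts[s] for s in sums), 36)
    for idea, sums in sum_ranges.items()
}

for idea, prob in dice_probs.items():
    print(f"{idea}: {prob}")
```

None of the four probabilities equals 1/4, so the dice method does not give each idea an equal chance.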

Question 9

A teacher randomly assigned 60 students to try two study plans for a vocabulary quiz: Treatment A (spaced practice) and Treatment B (single long review). Thirty students were randomly assigned to each treatment. The outcome was the quiz score (out of 100). Treatment A had a mean score of 84.2 and Treatment B had a mean score of 78.5, for an observed difference in means of $84.2-78.5=5.7$ points (A − B). To test whether this difference could be due to chance under “no treatment effect,” the teacher performed a randomization test by shuffling the treatment labels 2000 times and recalculating the difference in means each time. In the simulations, 18 of the 2000 shuffled differences were at least as large as $5.7$ (A − B).

Which conclusion is most reasonable about the treatment effect?

Since only 18 out of 2000 shuffled differences were at least as large as 5.7, the observed difference would be rare under no effect, so there is evidence that Treatment A tends to produce higher mean scores than Treatment B. (correct answer)
Because the teacher did not randomly sample students from all students everywhere, no conclusion can be made about whether Treatment A caused higher scores in this class.
Because 18 simulated differences were at least as large as 5.7, the observed difference is common under no effect, so there is not enough evidence that Treatment A is better.
Because the students were randomly assigned, Treatment A is guaranteed to raise scores by exactly 5.7 points for every student.

Explanation: This question tests understanding of randomized experiments and simulation-based inference. Random assignment allows us to attribute differences to the treatment rather than confounding factors. The observed difference of 5.7 points (Treatment A minus Treatment B) represents how much higher the mean score was for the spaced practice group. The randomization test simulates what differences we'd see if there were no treatment effect by shuffling labels 2000 times. Finding only 18 out of 2000 shuffled differences at least as large as 5.7 means the observed difference would be rare (less than 1% chance) under no effect. This rarity provides evidence that Treatment A tends to produce higher scores, though it doesn't guarantee every student benefits by exactly 5.7 points. Random assignment differs from random sampling—we can make causal conclusions about this class even without sampling from all students everywhere.

Question 10

A bag contains 5 red marbles and 3 blue marbles. You draw one marble first and do not replace it. Let event $A$ be “the first marble is red,” so $P(A)=\frac{5}{8}$ . Let event $B$ be “the second marble is blue.” Given that the first marble was red, $P(B\mid A)=\frac{3}{7}$ . What is the probability that both events occur, $P(A\cap B)$ ?

$\frac{15}{56}$ (correct answer)
$\frac{5}{8}+\frac{3}{7}$
$\frac{3}{7}$
$\frac{5}{8}\cdot\frac{3}{8}$

Explanation: This question tests the Multiplication Rule for probability. The "and" indicates we need both events to occur in sequence: first drawing a red marble, then drawing a blue marble. The first event is drawing a red marble, with probability P(A) = 5/8. The second event is conditional: given that we drew a red marble first (and didn't replace it), the probability of drawing blue is P(B|A) = 3/7. To find P(A∩B), we multiply: (5/8) × (3/7) = 15/56. A common mistake is to use 3/8 for the second probability, forgetting that removing a red marble changes the total. The strategy is to read this as "first draw red, then draw blue from the remaining marbles."
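The Multiplication Rule result can be confirmed by listing every ordered pair of draws without replacement. A brief sketch with illustrative names:

```python
from fractions import Fraction
from itertools import permutations

marbles = ["R"] * 5 + ["B"] * 3

# All ordered (first, second) draws of two distinct marbles: 8 * 7 = 56 pairs.
draws = list(permutations(range(len(marbles)), 2))
favorable = sum(1 for i, j in draws if marbles[i] == "R" and marbles[j] == "B")

p_enumerated = Fraction(favorable, len(draws))
p_rule = Fraction(5, 8) * Fraction(3, 7)  # P(A) * P(B | A)

print(p_enumerated, p_rule)
```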

Question 11

A school club sells a lottery-style ticket for $2. Exactly one of the following outcomes happens when you buy 1 ticket: with probability 0.10 you win $10; with probability 0.20 you win $3; with probability 0.70 you win $0. From the player's perspective, what is the expected value (long-run average net gain/loss) of buying 1 ticket?

$0.00
-$0.40 (correct answer)
-$1.60
$1.60

Explanation: The expected value in a game like this lottery represents the long-run average net gain or loss per ticket if you were to buy many tickets over time. To compute it, calculate the net gain for each outcome by subtracting the $2 cost from the prize, then multiply by the respective probabilities and sum them up. Here, the net gains are $8 (probability 0.10), $1 (probability 0.20), and -$2 (probability 0.70), yielding an expected value of 0.10×8 + 0.20×1 + 0.70×(-2) = -0.40. This negative value indicates that, on average, you lose 40 cents per ticket in the long run, making the game unfavorable for the player. A common misconception is forgetting to subtract the ticket cost from each prize, which would incorrectly give an expected prize of $1.60 instead of the net expected value. Recognizing this distinction helps players understand the true financial implications of participating in such games.
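The difference between the expected prize and the expected net gain can be made concrete in code. This sketch uses exact fractions to avoid rounding; the names are illustrative.

```python
from fractions import Fraction

cost = 2
# prize in dollars -> probability
prizes = {10: Fraction(1, 10), 3: Fraction(2, 10), 0: Fraction(7, 10)}

# Expected prize ignores the ticket cost; expected net gain subtracts it.
expected_prize = sum(prize * p for prize, p in prizes.items())
expected_net = sum((prize - cost) * p for prize, p in prizes.items())

print(f"expected prize: ${float(expected_prize):.2f}")
print(f"expected net:   ${float(expected_net):.2f}")
```

Because the probabilities sum to 1, the expected net gain is simply the expected prize minus the $2 cost.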

Question 12

A card is drawn at random from a set of 3 cards labeled A, B, and C. The sample space is $\{A,B,C\}$ with each outcome equally likely. Define the random variable $X$ as follows: $X(A)=0$ , $X(B)=0$ , and $X(C)=1$ . Which table correctly represents the probability distribution of $X$ ?

| $X$ | $P(X)$ |
|---|---|
| $0$ | $\tfrac{1}{3}$ |
| $1$ | $\tfrac{2}{3}$ |

| $X$ | $P(X)$ |
|---|---|
| $0$ | $\tfrac{2}{3}$ |
| $1$ | $\tfrac{1}{3}$ |

(correct answer)

| $X$ | $P(X)$ |
|---|---|
| $A$ | $\tfrac{1}{3}$ |
| $B$ | $\tfrac{1}{3}$ |
| $C$ | $\tfrac{1}{3}$ |

| $X$ | $P(X)$ |
|---|---|
| $0$ | $\tfrac{2}{3}$ |

Explanation: This problem involves a random variable where multiple outcomes map to the same value. A random variable X assigns numbers to outcomes in {A, B, C}, with X(A) = 0, X(B) = 0, and X(C) = 1. Since A and B both map to X = 0, we combine their probabilities: P(X = 0) = P(A) + P(B) = 1/3 + 1/3 = 2/3. Only C maps to X = 1, so P(X = 1) = 1/3. The correct distribution lists X-values 0 and 1 with probabilities 2/3 and 1/3, summing to 1. A common mistake is thinking each X-value must have equal probability, but this depends on the mapping. The approach is: identify all outcomes mapping to each X-value → sum their individual probabilities.
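The "group outcomes by X-value, then add their probabilities" approach looks like this as a short sketch (names are illustrative):

```python
from collections import defaultdict
from fractions import Fraction

# The random variable X as a mapping from outcomes to values.
X = {"A": 0, "B": 0, "C": 1}
p_outcome = Fraction(1, len(X))  # equally likely outcomes

# Accumulate probability for each value of X.
distribution = defaultdict(Fraction)
for outcome, value in X.items():
    distribution[value] += p_outcome

for value, prob in sorted(distribution.items()):
    print(f"P(X = {value}) = {prob}")
```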

Question 13

A city library wants to estimate the proportion of all adult library cardholders in the city who prefer e-books over printed books. The library has a complete list of adult cardholders and uses a random number generator to select 120 cardholders to survey. In the sample, 48 say they prefer e-books. What population parameter is being estimated using this random sample?

The proportion of the 120 surveyed cardholders who prefer e-books
The mean number of e-books read per month by all adult city residents
The proportion of all adult library cardholders in the city who prefer e-books (correct answer)
The proportion of all library visitors nationwide who prefer e-books

Explanation: Statistics uses samples to make inferences about larger populations, allowing us to estimate characteristics without surveying everyone. In this scenario, the population is all adult library cardholders in the city, and the sample is the 120 randomly selected cardholders surveyed about their preference for e-books. The parameter of interest is the proportion of the entire population who prefer e-books, while the sample statistic is the proportion (48/120) from those surveyed. Random sampling is crucial because it reduces bias and helps ensure the sample represents the population, supporting reliable generalizations. Based on this, it's reasonable to infer that around 40% of all city cardholders prefer e-books, but we cannot conclude the exact proportion with certainty due to sampling variability. A common misconception is that the sample proportion exactly equals the population proportion, but it's only an estimate. To apply this, ask: 'Who do we want to know about?' (all adult cardholders) and 'Who did we measure?' (the 120 surveyed).

Question 14

A school newspaper reports a link between students’ participation in after-school clubs and their GPA. A reporter collects data by looking up GPA in school records and recording each student’s number of clubs from the activities database. No one is told to join or not join clubs. Which type of study is described?

Experiment, because the reporter compares two groups (club vs no club)
Experiment, because GPA is the response variable and that means a treatment was applied
Sample survey, because using a database is the same as asking a random sample of students questions
Observational study, because variables are measured from records and no treatment is assigned (correct answer)

Explanation: In statistics, we distinguish between types of studies like surveys, observational studies, and experiments, and understand how randomization helps in making inferences about populations or causation. The key feature of an experiment is that researchers actively assign subjects to different treatments to observe their effects. Random sampling involves selecting subjects randomly from a population, which supports generalizing results from the sample to the broader population. Random assignment means randomly allocating subjects to treatment groups, which supports causal conclusions by balancing out confounding factors. In this scenario, the reporter observes GPA and club participation from records without assigning any treatment like joining clubs, making it an observational study that can show associations but not causation. A common misconception is that comparing groups means it's an experiment, but without random assignment, factors like motivation could confound the GPA-club link. To classify future studies, ask: 'Did researchers assign a treatment?' (no, so observational) and 'Was the sample randomly selected?' (not specified, but focus is on lack of treatment).

Question 15

A local bakery tracks the outside temperature (°F) and the number of iced coffees sold that day. A scatter plot shows a roughly linear upward trend, and technology reports $r=0.91$ . Which statement best interprets the value of $r$ for the relationship between temperature and iced coffee sales?

There is a strong positive linear association: warmer days tend to be associated with more iced coffees sold. (correct answer)
There is a strong negative linear association: warmer days tend to be associated with fewer iced coffees sold.
Because $r=0.91$ , each 1°F increase will increase iced coffee sales by exactly 0.91 coffees.
Warmer temperatures cause people to buy iced coffee, as shown by $r=0.91$ .

Explanation: r, the correlation coefficient, assesses how closely two variables follow a linear pattern, with positive values indicating that as one increases, so does the other, and strength gauged by proximity to ±1. With r = 0.91, there is a strong positive linear association, suggesting warmer temperatures are associated with higher iced coffee sales. The value |r| = 0.91 denotes strong correlation, as it approaches 1. However, people often misinterpret r as proof of causation, like assuming temperature causes sales increases. Another common mistake is confusing r with the exact rate of change, such as expecting 0.91 more coffees per degree. r does not measure causation or predict individual outcomes precisely; it summarizes overall linear trends. The best interpretation highlights the association and avoids causal claims.

Question 16

A snack company wants to estimate the population parameter $\mu$ , the mean rating (out of 10) that all customers would give a new flavor. A random sample of $n=60$ customers produced a sample statistic $\bar{x}=7.4$ . Students simulated 5,000 repeated random samples of size 60 and recorded the sample mean each time. In the simulation, the middle 95% of sample means ran from 7.0 to 7.8.

What margin of error is supported by the simulation for estimating $\mu$ ?

7.4 rating points
0.8 rating points
0.04 rating points
0.4 rating points (correct answer)

Explanation: We're estimating the margin of error for the population mean $μ$ of snack ratings via simulation. Repeated samples of size 60 generate a distribution of sample means around the observed 7.4. The MOE is half the width of the interval containing about 95% of these means, reflecting expected variability. The simulation's middle 95% is from 7.0 to 7.8, a width of 0.8, so MOE is 0.4 rating points. This suggests $μ$ is plausibly $7.4 \pm 0.4$ . People sometimes mistake the full width for the MOE; remember to halve it. For transfer, always find the 95% simulation range and divide its width by two for MOE.

Question 17

A game designer randomly assigned 90 players to try either Treatment A (a new tutorial) or Treatment B (the old tutorial), 45 players per group. The outcome was whether the player completed the first level without hints.

Results: Treatment A had 33/45 successes ( $73.3\%$ ), Treatment B had 21/45 successes ( $46.7\%$ ). Observed difference in proportions (A − B) = $0.733-0.467=0.266$ .

A randomization test shuffled the success/failure outcomes across groups 5,000 times under “no treatment effect.” In the simulation, 18 of the 5,000 shuffled differences were at least as large as $0.266$ .

Based on the randomization test, is the observed difference surprising under no effect?

No. Since 18 of 5,000 is not zero, the observed difference is expected under no effect and does not suggest a treatment effect.
Yes. Only 18 of 5,000 shuffled differences were at least $0.266$ , so the observed difference would be rare under no effect, suggesting the new tutorial increases the completion rate. (correct answer)
Yes. Random assignment means confounders must explain the difference, so the simulation is unnecessary.
No. The randomization test should count shuffled differences with absolute value at least $0.266$ even though the question is about A being higher, so the given count cannot be used.

Explanation: In a randomized experiment, a simulation can test whether an observed difference between treatments is larger than chance alone would plausibly produce. Random assignment balances the groups on average, which is what allows cause-and-effect conclusions. The observed difference here is that the proportion of successes is 0.266 higher for Treatment A than for Treatment B. Under 'no effect,' the simulation shuffles outcomes many times, creating a distribution of chance differences. Only 18 of the 5,000 simulated differences (0.36%) were as large as or larger than 0.266, so the observed difference would be rare without an effect. A misconception is equating random assignment with random sampling; random assignment is what supports causal conclusions, and a rare result is evidence for an effect but not proof of it. To apply this elsewhere, count the simulated differences that are at least as extreme as the observed one.
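A randomization test of this kind can be sketched in a few lines. The code below pools the 54 successes and 36 failures, reshuffles them into two groups of 45, and counts how often group A's lead is at least the observed 12 successes (equivalent to a difference in proportions of at least 12/45). The seed is an arbitrary choice made here, and the resulting count will vary with the seed, so it need not match the 18 reported in the question.

```python
import random

group_size = 45
successes_a, successes_b = 33, 21
total_successes = successes_a + successes_b

# Pooled outcomes: 1 = completed without hints, 0 = did not.
outcomes = [1] * total_successes + [0] * (2 * group_size - total_successes)
observed_gap = successes_a - successes_b  # 12 more successes in A

rng = random.Random(0)  # arbitrary seed for reproducibility
reps = 5000
at_least_as_large = 0
for _ in range(reps):
    rng.shuffle(outcomes)
    shuffled_a = sum(outcomes[:group_size])
    shuffled_b = sum(outcomes[group_size:])
    if shuffled_a - shuffled_b >= observed_gap:
        at_least_as_large += 1

p_value = at_least_as_large / reps
print(f"{at_least_as_large} of {reps} shuffles were at least as large")
```

Comparing success counts rather than rounded proportions avoids floating-point ties at the boundary.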

Question 18

A fitness app uses the model $y = 180 - 6x,$ where $x$ is the time since a workout ended (minutes) and $y$ is a person’s heart rate (beats per minute). Which interpretation of the slope is correct? (Include units.)

For each additional 1 minute after the workout ends, the model predicts heart rate decreases by 6 beats per minute. (correct answer)
When $x=0$ minutes, the model predicts the heart rate is 6 beats per minute.
For each additional 1 beat per minute of heart rate, the model predicts time increases by 6 minutes.
For each additional 1 minute after the workout ends, the model predicts heart rate increases by 6 beats per minute.

Explanation: The concept of slope in linear models captures the rate of change between variables, crucial for predictions in contexts like health metrics. Slope is defined as the change in y per 1-unit increase in x, with units such as beats per minute per minute after workout. Applied here, the slope of -6 indicates heart rate decreases by 6 beats per minute for each additional minute post-workout. The intercept is the y-value when x=0, representing the initial heart rate right after the workout ends. In this model, the intercept of 180 means a predicted heart rate of 180 bpm at x=0, which is meaningful for an elevated post-exercise rate. People often misconceive negative slopes as increases or flip the units, like minutes per beat instead of beats per minute. A useful strategy is to label axes descriptively with units, such as 'heart rate (bpm)' on y and 'time after workout (minutes)' on x, ensuring units clarify the rate.

Question 19

A quality-control measurement is approximately normal. The measurement has mean $\mu=200$ units and standard deviation $\sigma=5$ units. Approximately what percent of measurements are below 190 units? (Use the empirical rule.)

About 5%
About 97.5%
About 16%
About 2.5% (correct answer)

Explanation: We use a normal model to estimate the percentage of measurements below 190 units. The cutoff of 190 is 10 units below the mean of 200, and the standard deviation is 5, so z = (190 - 200)/5 = -2: the cutoff is 2 SDs below the mean. By the empirical rule, about 95% of values lie within 2 SDs of the mean, leaving about 5% outside, split evenly between the two tails, so about 2.5% lie below 190. A common error is to answer 5%, which is both tails combined rather than the lower tail alone. To check, sketch the curve with the mean at 200, mark 195 (1 SD below) and 190 (2 SDs below), and shade the left tail.
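The z-score and the lower-tail area can also be computed from the normal model directly. This sketch uses `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

measurement = NormalDist(mu=200, sigma=5)
cutoff = 190

# Number of standard deviations between the cutoff and the mean.
z = (cutoff - measurement.mean) / measurement.stdev

# Lower-tail area below the cutoff.
below = measurement.cdf(cutoff)

print(f"z = {z:.1f}, P(X < {cutoff}) = {below:.4f}")
```

The computed tail is about 2.3%, slightly below the empirical rule's rounded 2.5%, because the rule's 95% corresponds to roughly 1.96 SDs rather than exactly 2.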

Question 20

A gym wants to estimate the mean number of visits per month for all current gym members. The gym randomly selects 80 members from its membership database and finds the sample mean is 6.2 visits per month. What is a reasonable inference based on the sample?

Going to this gym causes members to visit 6.2 times per month
All gym members visit exactly 6.2 times per month
Exactly 6.2% of gym members visit the gym each month
The mean number of visits per month for all gym members is likely close to 6.2 (correct answer)

Explanation: In statistics, we use samples to infer about populations, estimating things like average behaviors without tracking everyone. The population is all current gym members, and the sample is the 80 randomly selected members whose visits were averaged at 6.2 per month. The parameter is the mean visits for the population, with the statistic being the sample mean of 6.2. Random sampling helps by reducing bias, allowing the sample to represent the population for better generalizations. A reasonable inference is that the population mean is likely around 6.2, but we can't say it's exact or that the gym causes this behavior. Misconception: assuming the sample mean applies exactly to every individual, but it describes the group average. Ask: 'Who do we want to know about?' (all members) and 'Who did we measure?' (the 80 selected).

Question 21

A café tracked the number of customers per hour for 10 hours on a weekday:

18, 19, 20, 20, 21, 21, 22, 22, 23, 45

Which conclusion about the data is best supported?

The outlier mainly changes the IQR because it is far from the rest of the data.
Most hours have about 20–23 customers, and the hour with 45 customers is an unusual high value that pulls the mean up more than the median. (correct answer)
A typical hour has around 45 customers because that value is the largest and stands out.
The data are left-skewed because one hour had far fewer customers than the rest.

Explanation: This question involves interpreting customer data with an outlier. The values 18, 19, 20, 20, 21, 21, 22, 22, 23, 45 show that most hours had 18 to 23 customers, clustering around 20 to 23, while one hour had 45 customers, clearly an unusually high value. This creates a right-skewed distribution, with the outlier stretching the tail to the right. For typical customer counts, the median (21) better represents the center because it is resistant to the outlier, while the mean (23.1) is pulled up by the 45. The outlier affects the mean more than the median because the mean uses the size of every value, including the extreme 45. Option B correctly identifies that most hours have about 20–23 customers and explains how the outlier pulls the mean up more than the median. Students might incorrectly think the outlier mainly affects the IQR, but the IQR measures the spread of the middle 50% of the data and is largely unaffected by a single extreme value.
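To illustrate the point about the IQR, the sketch below compares the range and the IQR with and without the 45. It assumes the "median of each half" convention for quartiles; other quartile conventions give slightly different values.

```python
from statistics import median

def iqr(values):
    """IQR using the medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    half = len(data) // 2  # the middle value is excluded when the count is odd
    return median(data[-half:]) - median(data[:half])

customers = [18, 19, 20, 20, 21, 21, 22, 22, 23, 45]
typical_hours = [c for c in customers if c != 45]

range_with = max(customers) - min(customers)
range_without = max(typical_hours) - min(typical_hours)

print(f"range: {range_without} -> {range_with}")
print(f"IQR:   {iqr(typical_hours)} -> {iqr(customers)}")
```

Under this convention the range jumps from 5 to 27 when the outlier is included, while the IQR changes by only half a customer.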

Question 22

A class recorded how many minutes 12 students spent reading last night: 10, 12, 15, 15, 18, 20, 20, 22, 25, 25, 28, 30. Which statement correctly describes the distribution shown by a box plot of these data?

The median is 22 minutes, and the range is 20 minutes.
The median is 18 minutes, and the range is 20 minutes.
The median is 20 minutes, and the range is 20 minutes. (correct answer)
The median is 20 minutes, and the range is 15 minutes.

Explanation: This question involves a box plot of reading times for 12 students. A box plot shows five key values: minimum, Q1, median, Q3, and maximum. With 12 ordered values, the median is between the 6th and 7th values: (20+20)/2 = 20 minutes. The range is maximum minus minimum: 30-10 = 20 minutes. Box plots don't show individual data points but summarize the distribution's spread and center. A common error is confusing the interquartile range (Q3-Q1) with the full range. To create a box plot: first order the data, find the five-number summary, then draw the box from Q1 to Q3 with a line at the median.

Question 23

A fitness app recorded the time (in minutes) it took two groups to complete the same 2-mile walk.

Data Set A (Group A times): 22, 23, 24, 24, 25, 25, 26, 26, 27, 28, 60
Data Set B (Group B times): 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 30

Data Set A has a clear high outlier (60 minutes), while Data Set B does not. Which statement best compares the typical value and variability using appropriate measures (median and IQR)?

Using median and IQR, Group A has a lower typical time and similar variability to Group B. (correct answer)
Using median and IQR, Group A has a higher typical time and a larger variability than Group B.
Using median and IQR, Group B has a lower typical time and a larger variability than Group A.
Using mean and standard deviation, Group A has a higher typical time and much greater variability than Group B.

Explanation: When comparing distributions, we must first identify their shape to select appropriate measures. Data Set A has a clear high outlier (60 minutes), which makes it right-skewed, so we should use median and IQR for both sets to ensure a fair comparison. For center: Group A's median is 25 minutes (the 6th of 11 ordered values), while Group B's median is 27 minutes, so Group A has a lower typical time. For spread: taking Q1 and Q3 as the medians of the lower and upper halves, Group A's IQR is 27-24 = 3 minutes and Group B's IQR is 28-25 = 3 minutes, showing similar variability. The outlier in Group A has little effect on these resistant measures. A common error is using the mean when outliers are present: the 60 pulls Group A's mean up to about 28.2 minutes, above Group B's mean of about 26.8 minutes, which reverses the comparison. The strategy is: identify outliers/skewness → use median/IQR → compare resistant measures in context.
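
The comparison can be reproduced with a minimal Python sketch. The helper `median_and_iqr` is a hypothetical name, and it assumes the median-of-halves convention for quartiles; software that uses a different quartile rule may report slightly different IQRs.

```python
from statistics import mean, median

def median_and_iqr(values):
    """Return (median, IQR), with Q1 and Q3 as medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(data), median(upper) - median(lower)

group_a = [22, 23, 24, 24, 25, 25, 26, 26, 27, 28, 60]
group_b = [24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 30]

median_a, iqr_a = median_and_iqr(group_a)
median_b, iqr_b = median_and_iqr(group_b)

print("Group A: median", median_a, "IQR", iqr_a)
print("Group B: median", median_b, "IQR", iqr_b)

# The non-resistant mean tells a different story because of the 60.
print("means:", round(mean(group_a), 2), round(mean(group_b), 2))
```

The medians and IQRs match the explanation, while the means put Group A above Group B, which is why the resistant measures are preferred here.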

Question 24

A company wants to estimate the proportion of all light bulbs produced this week that are defective. From the week's production, a quality-control technician uses a random-number generator to select 200 bulbs to test; 6 are defective. Why is random sampling important in this situation?

It proves that defects are caused by a specific machine in the factory
It guarantees the sample proportion of defective bulbs equals the true population proportion
It reduces bias so the tested bulbs are more likely to represent all bulbs produced this week (correct answer)
It assigns each bulb to be defective or not defective at random

Explanation: Statistics relies on samples to draw inferences about populations, such as estimating a defect rate in production without testing every item. The population here is all light bulbs produced this week, and the sample is the 200 randomly selected bulbs that were tested, 6 of which were defective. The parameter is the proportion of defective bulbs in the population, and the statistic is the sample proportion, 6/200 = 0.03. Random sampling is crucial because it reduces selection bias, making the sample more likely to reflect the population and supporting valid inferences. We can infer that the population proportion is likely near 3%, but not exactly 3%, because samples vary. A common misconception is that random sampling guarantees results identical to the population; it only makes representation more likely. Strategy: ask 'What do we want to know about?' (all bulbs this week) and 'What did we test?' (the 200 selected).
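
A small simulation can illustrate why a random sample's proportion lands near the population value without matching it exactly. This is only a sketch under stated assumptions: the true defect rate of 3% is invented for illustration (the real rate is unknown), each bulb is modeled as independently defective, and the names `TRUE_DEFECT_RATE` and `sample_proportion` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers

TRUE_DEFECT_RATE = 0.03  # assumed for illustration; not given in the question
SAMPLE_SIZE = 200        # bulbs tested per sample, as in the question
NUM_SAMPLES = 1000       # number of simulated random samples

def sample_proportion(rate, n):
    """Simulate testing n randomly chosen bulbs; return the fraction defective."""
    defects = sum(random.random() < rate for _ in range(n))
    return defects / n

proportions = [sample_proportion(TRUE_DEFECT_RATE, SAMPLE_SIZE)
               for _ in range(NUM_SAMPLES)]
average = sum(proportions) / len(proportions)

print("smallest and largest sample proportion:", min(proportions), max(proportions))
print("average of all sample proportions:", round(average, 4))
```

Individual samples differ from one another, but their proportions center on the assumed population rate, which is the sense in which random sampling supports inference.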

Question 25

At a university, event $A$ is “a student is majoring in Biology” and event $B$ is “a student is taking Calculus this term.” The probabilities are $P(A)=0.28$ , $P(B)=0.50$ , and $P(A\cap B)=0.12$ . What is the probability that $A$ or $B$ occurs (inclusive or)?

0.78
0.66 (correct answer)
0.54
0.16

Explanation: This question tests the Addition Rule for probability in an academic context. To find P(A or B), where A is majoring in Biology and B is taking Calculus, we start with P(A) = 0.28 and P(B) = 0.50, which sum to 0.78. However, this sum counts Biology majors who are also taking Calculus twice. We subtract P(A∩B) = 0.12 to correct for this overlap. The calculation is P(A∪B) = 0.28 + 0.50 - 0.12 = 0.66. This is the probability that a student is a Biology major, is taking Calculus, or both. A common mistake is forgetting to subtract the intersection; sketching a Venn diagram helps show why the overlapping region must be counted only once.
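
The Addition Rule can be checked with a few lines of Python. The function name `prob_union` is a hypothetical helper, and the count of 100 students in the second part is an assumption used only to make the double-counting visible.

```python
import math

def prob_union(p_a, p_b, p_a_and_b):
    """Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_a_or_b = prob_union(0.28, 0.50, 0.12)
print(round(p_a_or_b, 2))

# Same idea with counts, assuming 100 students for illustration:
biology, calculus, both = 28, 50, 12
either = biology + calculus - both   # the 12 students in both groups are counted once
print(either, "of 100 students")
```

The result is rounded for display because floating-point arithmetic may not produce exactly 0.66; `math.isclose(p_a_or_b, 0.66)` is one way to compare such values safely.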
