AP Statistics

AP Statistics Practice Test: Practice Test 37

Practice Test 37 for AP Statistics: real questions and explanations from the Varsity Tutors practice-test pool.

Question 1

Scores on a certain statewide math assessment for the population of 10th graders are approximately Normal with mean $\mu=500$ and standard deviation $\sigma=100$. The marked value is $x=350$. Which statement about the marked value is correct?

A score of 350 is about 1.5 standard errors below the mean, so it is relatively unusual for an individual student.
A score of 350 is about 1.5 standard deviations below the mean, so it is somewhat unusual for an individual student. (correct answer)
A score of 350 is about 1.5 standard deviations above the mean, so it is somewhat unusual for an individual student.
A score of 350 is about 0.35 standard deviations below the mean, so it is very typical for an individual student.
A score of 350 is about 3.5 standard deviations below the mean, so it is extremely unusual for an individual student.

Explanation: This question requires calculating how many standard deviations a test score is from the mean. With μ = 500 and σ = 100, the z-score for x = 350 is z = (350 - 500)/100 = -150/100 = -1.5, so 350 is 1.5 standard deviations below the mean. The distractors confuse the direction (above vs. below), miscalculate the z-score, or mix up standard deviations with standard errors. In a Normal distribution, values between 1 and 2 standard deviations from the mean are somewhat unusual but not extremely rare: about 13.5% of the population falls in that band on each side of the mean.
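The z-score arithmetic above can be checked with a minimal sketch using Python's standard library. The variable names are illustrative, and the tail proportion is an extra step not asked for in the question.

```python
from statistics import NormalDist

mu, sigma, x = 500, 100, 350

# Number of standard deviations the score lies from the mean.
z = (x - mu) / sigma

# Proportion of a Normal population scoring at or below x.
proportion_below = NormalDist().cdf(z)

print(f"z = {z}, proportion below = {proportion_below:.4f}")
```

A negative z confirms the score is below the mean, and the tail proportion of roughly 7% matches the "somewhat unusual" wording of the correct choice.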

Question 2

A machine is designed to randomly sort identical balls into 2 bins, Left (L) or Right (R), with equal probability each time. A supervisor records the results for 20 balls and sees: L, R, L, R, L, R, L, R, L, R, L, R, L, R, L, R, L, R, L, R. Under random behavior, you would expect some short runs and occasional repeats, not perfect alternation for a long stretch. Is the pattern consistent with random behavior?

Yes, because alternation is the most random-looking pattern and therefore most likely.
No, because perfect alternation for 20 trials is suspiciously regular for randomness. (correct answer)
Yes, because there are 10 L and 10 R, which proves the process is random.
No, because random behavior would require all L’s first and then all R’s.
Yes, because random sequences never have repeated outcomes, and this one has none.

Explanation: This question tests recognizing overly regular patterns in binary outcomes, a core randomness skill in statistics. Random sorting into L or R should produce some repeats and short runs, not unbroken alternation. With independent, equally likely outcomes, the chance that 20 trials alternate perfectly is 2 × (1/2)^20, about 2 in a million, which points to programming or bias rather than chance. Distractor A incorrectly deems alternation 'most random-looking': people often mistake regularity for randomness, while true randomness includes clumps. Mini-lesson: in random sequences, independence leads to runs of varying length, and excessive order like perfect alternation has low probability, so randomness is 'lumpy', not smooth. Thus, the pattern is inconsistent with random behavior.
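To put a number on "suspiciously regular", the sketch below counts adjacent repeats in the recorded sequence and compares that with what independent 50/50 outcomes would produce on average. The helper name `count_repeats` is illustrative.

```python
def count_repeats(sequence):
    """Count positions where an outcome equals the one just before it."""
    return sum(a == b for a, b in zip(sequence, sequence[1:]))

observed = "LR" * 10          # L, R, L, R, ... for 20 balls
n = len(observed)

observed_repeats = count_repeats(observed)

# Each of the n - 1 adjacent pairs matches with probability 1/2.
expected_repeats = (n - 1) * 0.5

# Perfect alternation can start with L or with R.
p_perfect_alternation = 2 * 0.5 ** n

print(observed_repeats, expected_repeats, p_perfect_alternation)
```

Zero repeats against an expected 9.5 is the signal; the tiny probability of perfect alternation is what makes the pattern hard to attribute to chance.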

Question 3

A game app records how many hints a randomly selected player uses to finish a level. Let $X$ be the number of hints, a discrete random variable with distribution: $P(X=0)=0.25$, $P(X=1)=0.40$, $P(X=2)=0.20$, $P(X=3)=0.10$, $P(X=4)=0.05$. The mean is $\mu_X=1.30$ hints and the standard deviation is $\sigma_X=1.10$ hints. Which interpretation of the mean is correct?

Most players use exactly 1.30 hints.
In repeated play across many players, the average number of hints used will be about 1.30. (correct answer)
About 1.30 hints is the typical distance from the mean.
A mean of 1.30 hints means players use between 0.13 and 1.30 hints.
A mean of 1.30 hints means the range of hints is 1.30.

Explanation: This question tests understanding of the mean for a discrete probability distribution. The mean μ = 1.30 hints represents the expected value, or long-run average number of hints used per player. Choice B correctly states that across many players, the average number of hints will be about 1.30. Choice A incorrectly claims most players use exactly 1.30 hints (impossible, because X takes only whole-number values). Choice C describes the standard deviation, not the mean. Choice D incorrectly turns the mean into a range of values. Choice E incorrectly equates the mean with the range. The mean is the weighted average of all possible values, with the probabilities as weights, and represents what we expect on average over many observations.
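The "weighted average" description can be made concrete with a short sketch that recomputes the mean and standard deviation directly from the probability table in the question. The dictionary layout is just one convenient representation.

```python
from math import sqrt

# Probability distribution of X = number of hints used.
pmf = {0: 0.25, 1: 0.40, 2: 0.20, 3: 0.10, 4: 0.05}

# Mean: each value weighted by its probability.
mean = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = sqrt(variance)

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

With these probabilities the mean works out to 1.30 hints and the standard deviation to about 1.10 hints.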

Question 4

An economist models the relationship between a country's GDP and its average life expectancy using a linear regression model on data from a sample of countries. After fitting the line, she observes that the residuals are all positive for low and high GDPs and negative for mid-range GDPs. What does this pattern suggest about the linear model?

A linear model may not be appropriate because the variation of points around the line is non-random. (correct answer)
The relationship is linear, but the sample size was too small to accurately estimate the true slope.
The association between GDP and life expectancy is negative, which contradicts the model's assumptions.
The variation of points around the regression line is purely random, which is expected in a good model.

Explanation: A distinct pattern in the residual plot, such as a curve, indicates that the relationship between the variables is likely non-linear. The variation of points around the line is therefore not random, and a simple linear model is probably not the best fit for the data.

Question 5

A city’s transportation office claims that commuters use the following primary modes of transportation: 50% drive alone, 20% carpool, 15% public transit, 10% bike, and 5% walk. A random sample of 400 commuters is taken and the observed counts are shown below.

Which hypotheses are appropriate for a chi-square goodness-of-fit test of the office’s claim?

$H_0$ : Mode of transportation is independent of commuter; $H_a$ : Mode of transportation is not independent of commuter.
$H_0$ : The sample proportions are 0.50, 0.20, 0.15, 0.10, 0.05 for the five modes; $H_a$ : The sample proportions differ from these values.
$H_0$ : The population distribution of primary commute mode is 0.50 drive alone, 0.20 carpool, 0.15 public transit, 0.10 bike, 0.05 walk; $H_a$ : The population distribution is not as claimed. (correct answer)
$H_0$ : $p_{drive}=p_{carpool}=p_{transit}=p_{bike}=p_{walk}$ ; $H_a$ : At least one proportion differs.
$H_0$ : The observed counts match the claim exactly; $H_a$ : The observed counts do not match the claim exactly.

Explanation: This question tests understanding of chi-square goodness-of-fit hypothesis setup for transportation mode claims. The null hypothesis should state the claimed population distribution of commute modes (50% drive alone, 20% carpool, 15% public transit, 10% bike, 5% walk). Option B incorrectly refers to sample proportions - we test population parameters, not sample statistics. Option C correctly states the null hypothesis about the population distribution matching the claim, with the alternative that it differs. Options about independence or equal proportions are inappropriate for goodness-of-fit tests, which specifically test whether data fits a claimed distribution.
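Under the null hypothesis, the expected counts follow directly from the claimed proportions and the sample size of 400. The sketch below computes them and defines the usual chi-square statistic as a function; no observed counts are filled in, since the table of observed counts is not reproduced here, and the function name is illustrative.

```python
n = 400
claimed = {
    "drive alone": 0.50,
    "carpool": 0.20,
    "public transit": 0.15,
    "bike": 0.10,
    "walk": 0.05,
}

# Expected count for each mode if the office's claim is true.
expected_counts = {mode: n * p for mode, p in claimed.items()}

# Goodness-of-fit uses (number of categories - 1) degrees of freedom.
degrees_of_freedom = len(claimed) - 1

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)

print(expected_counts, degrees_of_freedom)
```

Every expected count here is at least 5 (the smallest is 20 for walking), which is the usual large-counts check for this test.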

Question 6

A city council wants to know whether a new text-message reminder system increased the proportion of residents who pay parking tickets within 14 days. Before the system, 180 of 400 randomly selected ticketed residents paid within 14 days; after the system, 210 of 400 randomly selected ticketed residents paid within 14 days. A two-proportion $z$ test for $H_0: p_{\text{before}}-p_{\text{after}}=0$ versus $H_a: p_{\text{before}}-p_{\text{after}}<0$ produced a p-value of $0.017$. Using $\alpha=0.05$, what conclusion is appropriate?

Reject $H_0$ ; there is convincing evidence that the population proportion paying within 14 days increased after the reminder system. (correct answer)
Fail to reject $H_0$ ; there is not convincing evidence that the population proportion paying within 14 days increased after the reminder system.
Reject $H_0$ ; there is convincing evidence that the population proportion paying within 14 days decreased after the reminder system.
Because the p-value is small, the reminder system caused exactly a 7.5% increase in paying within 14 days.
Reject $H_0$ ; we can conclude only that 210/400 is greater than 180/400 for these samples, not for the populations.

Explanation: This problem tests whether the reminder system increased the proportion paying within 14 days. The alternative hypothesis Ha: p_before - p_after < 0 is equivalent to p_before < p_after, meaning we're testing if the proportion increased after the system. The sample proportions are 180/400 = 0.45 and 210/400 = 0.525, which give z ≈ -2.12 and a one-sided p-value of about 0.017. The p-value (0.017) is less than α = 0.05, so we reject H₀. This provides convincing evidence that the population proportion paying within 14 days increased after the reminder system was implemented. Choice C reverses the direction of the change. Choice D incorrectly claims an exact percentage increase and confuses sample statistics with population parameters. Choice E incorrectly limits conclusions to the samples only. When testing for an increase with Ha: p₁ - p₂ < 0, rejecting H₀ supports that the second proportion is larger than the first.
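The test statistic and p-value can be recomputed from the counts given in the question. This is a minimal sketch of a pooled two-proportion z test using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_before, n_before = 180, 400
x_after, n_after = 210, 400

p_before = x_before / n_before
p_after = x_after / n_after

# Pooled proportion, used because H0 says the two proportions are equal.
p_pooled = (x_before + x_after) / (n_before + n_after)
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n_before + 1 / n_after))

z = (p_before - p_after) / standard_error

# Ha: p_before - p_after < 0, so the p-value is the lower-tail area.
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these counts the sketch gives z of about -2.12 and a one-sided p-value of about 0.017, which is below α = 0.05, so the decision to reject H₀ stands.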

Question 7

The following is a set of exam scores for a small class of students: 35, 72, 75, 78, 81, 83, 85, 88.

Which value in this dataset is best described as a potential outlier?

35, because it is unusually low compared to the other scores which are clustered together. (correct answer)
88, because it is the maximum value and is therefore an extreme point in the dataset.
79.5, the median of the dataset, because it represents the central tendency.
72, because it is the second lowest score and establishes the beginning of the main cluster.

Explanation: The correct answer is A. The majority of the scores (72, 75, 78, 81, 83, 85, 88) are clustered in the 70s and 80s. The score of 35 is significantly lower than this cluster, making it a potential outlier. Being the maximum or minimum value does not automatically qualify a point as an outlier; the separation from the rest of the data is the key factor.
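The explanation relies on visual separation. A common numerical check, brought in here as an assumption since the question does not name it, is the 1.5 × IQR rule, with quartiles taken as the medians of the lower and upper halves of the sorted data.

```python
from statistics import median

scores = sorted([35, 72, 75, 78, 81, 83, 85, 88])

# Quartiles as medians of the lower and upper halves of the data.
half = len(scores) // 2
q1 = median(scores[:half])
q3 = median(scores[-half:])
iqr = q3 - q1

# Values beyond these fences are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

Under this convention only 35 falls outside the fences; the maximum, 88, sits well inside them, which agrees with the point that an extreme value is not automatically an outlier.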

Question 8

A student’s total time spent on homework in a week is modeled by two independent random variables. Let $X$ be time (in hours) spent on math homework with $\mu_X=5$ and $\sigma_X=1.5$ . Let $Y$ be time (in hours) spent on English homework with $\mu_Y=3$ and $\sigma_Y=1.0$ . Define $S=X+Y$ , the total time on math and English homework. Which statement about the combined variable is correct?

$\mu_S=8$ and $\sigma_S=2.5$
$\mu_S=8$ and $\sigma_S=\sqrt{3.25}$ (correct answer)
$\mu_S=2$ and $\sigma_S=\sqrt{3.25}$
$\mu_S=8$ and $\sigma_S=\sqrt{0.5}$
$\mu_S=15$ and $\sigma_S=\sqrt{3.25}$

Explanation: This question evaluates combining independent random variables for the total homework time S = X + Y. The mean is μ_S = 5 + 3 = 8 hours. The variance is σ_S² = (1.5)² + (1.0)² = 2.25 + 1 = 3.25, so σ_S = √3.25 ≈ 1.80 hours. Choice A distracts by adding the standard deviations (1.5 + 1.0 = 2.5) instead of the variances. Mini-lesson: means always add, and for independent variables the variances (not the standard deviations) add; take the square root at the end to get the standard deviation. The same rule extends to any linear combination aX + bY of independent variables, whose variance is a²σ_X² + b²σ_Y².
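The rule for linear combinations can be written as a small helper. This is a sketch; the function name `combine_independent` and its argument order are hypothetical, chosen only for illustration.

```python
from math import sqrt

def combine_independent(a, mu_x, sd_x, b, mu_y, sd_y):
    """Mean and SD of a*X + b*Y when X and Y are independent."""
    mean = a * mu_x + b * mu_y
    # Variances add, each scaled by the square of its coefficient.
    variance = (a * sd_x) ** 2 + (b * sd_y) ** 2
    return mean, sqrt(variance)

# S = X + Y: math homework plus English homework.
mean_s, sd_s = combine_independent(1, 5, 1.5, 1, 3, 1.0)

print(f"mean = {mean_s}, sd = {sd_s:.3f}")
```

Passing b = -1 instead of 1 changes the mean but leaves the standard deviation unchanged, because the coefficient is squared.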

Question 9

A jar contains 200 beads, 40 of which are red. A student randomly draws 20 beads with replacement and records whether each bead is red. A success is “red,” and exactly 20 draws are made. Does this meet binomial conditions?

Yes, because $n$ is fixed, outcomes are red/not red, $p=40/200$ is constant, and draws are independent due to replacement. (correct answer)
No, because the population size is finite.
No, because there are more than two colors.
No, because replacement makes trials dependent.
Yes, but only if $n$ is larger than 30.

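Explanation: This question tests the binomial conditions when sampling with replacement. The number of trials is fixed (n = 20), each draw is binary (red or not red), the success probability is constant (p = 40/200 = 0.20), and replacement keeps the draws independent, so choice A is correct. Choice B objects to the finite population, but replacement restores the jar to the same composition before every draw. Choice C notes that there may be more than two colors, but each outcome is recorded only as red or not red. Choice D wrongly claims replacement causes dependence; it is the opposite. Choice E invents a sample-size requirement that is not a binomial condition. Mini-lesson: sampling with replacement behaves like sampling from an infinite population, so the binomial model applies exactly; without replacement, the binomial model is only an approximation, reasonable when n is small relative to the population (a common guideline is n at most 10% of the population).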

Question 10

A city transit agency reports that the mean wait time for a bus on a certain route is 12 minutes. A commuter advocacy group randomly samples $n=35$ bus arrivals and records wait times, obtaining a sample mean of $\bar{x}=13.1$ minutes. The group wants to test whether the true mean wait time is different from 12 minutes. Which hypotheses are appropriate?

$H_0: \mu=12$ vs. $H_a: \mu\ne 12$ (correct answer)
$H_0: \mu\ne 12$ vs. $H_a: \mu=12$
$H_0: \bar{x}=12$ vs. $H_a: \bar{x}\ne 12$
$H_0: p=12$ vs. $H_a: p\ne 12$
$H_0: \mu=13.1$ vs. $H_a: \mu\ne 13.1$

Explanation: This question tests setting up a two-tailed hypothesis test for a population mean. The transit agency reports a mean of 12 minutes, giving us H₀: μ = 12. Since the group wants to test if the mean is 'different from' 12 minutes (not specifically higher or lower), we need a two-tailed test: Hₐ: μ ≠ 12. Option B incorrectly reverses the null and alternative hypotheses. Option C uses x̄ (sample mean) instead of μ (population mean) in the hypotheses. Option D uses p, which is for proportions, not means. Option E incorrectly uses the sample mean value (13.1) in the hypotheses—we test the claimed value, not the observed value. For 'different from' questions, always use ≠ in the alternative hypothesis.

Question 11

Let $F$ be the amount of time (in hours) a phone battery lasts after a full charge with Feature A turned on, and let $G$ be the amount of time (in hours) it lasts with Feature A turned off, for randomly selected days. Over many days, $\mu_F=9.5$ with $\sigma_F=1.1$ , and $\mu_G=11.0$ with $\sigma_G=1.4$ . Assume $F$ and $G$ are independent across days. Define $H=G-F$ , the increase in battery life when the feature is off. Which statement about $H$ is correct?

$\mu_H=1.5$ and $\sigma_H=1.4-1.1=0.3$ .
$\mu_H=-1.5$ and $\sigma_H=\sqrt{1.1^2+1.4^2}$ .
$\mu_H=20.5$ and $\sigma_H=\sqrt{1.1^2+1.4^2}$ .
$\mu_H=1.5$ and $\sigma_H=1.1+1.4=2.5$ .
$\mu_H=1.5$ and $\sigma_H=\sqrt{1.1^2+1.4^2}$ . (correct answer)

Explanation: This problem examines the increase in battery life when a feature is turned off. For H = G - F, the mean is μ_H = μ_G - μ_F = 11.0 - 9.5 = 1.5 hours (positive because battery lasts longer without the feature). Since F and G are independent across days, their variances add: σ_H² = σ_F² + σ_G² = 1.1² + 1.4² = 1.21 + 1.96 = 3.17, giving σ_H = √3.17 = √(1.1² + 1.4²). Choice E correctly shows μ_H = 1.5 and σ_H = √(1.1² + 1.4²). Common errors include subtracting standard deviations (Choice A: 1.4 - 1.1 = 0.3) or adding them directly (Choice D: 1.1 + 1.4 = 2.5). Remember: for independent variables, variances always add when combining, whether by addition or subtraction.
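A quick simulation can make the "variances add even for a difference" rule more believable. The sketch below assumes, purely for illustration, that F and G are Normal; the question specifies only their means and standard deviations, and the seed and sample size are arbitrary.

```python
import random
from statistics import fmean, stdev

random.seed(0)                 # fixed seed so the run is repeatable
num_days = 50_000

# ASSUMPTION: Normal shapes for F and G (not stated in the question).
differences = [
    random.gauss(11.0, 1.4) - random.gauss(9.5, 1.1)   # H = G - F
    for _ in range(num_days)
]

simulated_mean = fmean(differences)
simulated_sd = stdev(differences)
theoretical_sd = (1.1 ** 2 + 1.4 ** 2) ** 0.5

print(f"mean = {simulated_mean:.2f}, sd = {simulated_sd:.2f}, theory = {theoretical_sd:.2f}")
```

The simulated standard deviation lands near √3.17 ≈ 1.78, far from both 0.3 (subtracting the standard deviations) and 2.5 (adding them).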

Question 12

A marketing team is studying which social media platform customers most associate with a brand. From the population of customers who follow the brand online, a random sample of 140 customers is surveyed, and each customer selects the platform they most associate with the brand (a categorical variable). The distribution is shown below.

Platform most associated with the brand	Frequency	Relative frequency
Instagram	56	0.400
TikTok	42	0.300
YouTube	28	0.200
Facebook	14	0.100

Which statement is supported by the data?

TikTok is the least common platform in the sample.
Instagram is the most common platform in the sample, with 56 respondents (40%). (correct answer)
YouTube is associated with the brand by about 28% of respondents.
Facebook and YouTube together account for about 10% of respondents.
More respondents selected Facebook than YouTube.

Explanation: This question tests reading a frequency table for a categorical variable: the platform each of 140 sampled customers most associates with the brand. Choice B is supported: Instagram has the highest frequency, 56 respondents, and 56/140 = 0.400, or 40%. Choice A is wrong because Facebook (14), not TikTok (42), is least common. Choice C claims YouTube at about 28%, but 28 is the frequency; the relative frequency is 0.200, or 20%. Choice D puts Facebook and YouTube together at about 10%, when together they account for 0.100 + 0.200 = 0.300, or 30%. Choice E is wrong because 14 is less than 28. Mini-lesson: use frequencies for counts and relative frequencies for percentages, and check claims about combined categories by summing the relative frequencies.
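The checks described above amount to a few lines of arithmetic on the frequency column. A minimal sketch, with the table entered as a dictionary:

```python
counts = {"Instagram": 56, "TikTok": 42, "YouTube": 28, "Facebook": 14}

total = sum(counts.values())

# Relative frequency = category count divided by the sample size.
relative = {platform: count / total for platform, count in counts.items()}

most_common = max(counts, key=counts.get)
least_common = min(counts, key=counts.get)
facebook_plus_youtube = relative["Facebook"] + relative["YouTube"]

print(total, relative, most_common, least_common, round(facebook_plus_youtube, 3))
```

Recomputing the relative frequencies from the counts is also a quick way to confirm that the table's third column is consistent with its second.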

Question 13

A manufacturer compares the mean lifetime of batteries from two production lines. In repeated sampling, an independent random sample of $n_1=16$ batteries from Line 1 and $n_2=16$ batteries from Line 2 is tested, and $\bar{x}_1-\bar{x}_2$ is recorded. Which statement is correct about how the standard deviation of the sampling distribution changes if both sample sizes are quadrupled (to $n_1=n_2=64$ )?

It becomes 4 times as large because the sample sizes are 4 times as large.
It becomes about half as large because each standard error term involves $\sqrt{n}$ . (correct answer)
It stays the same because the population standard deviations do not change.
It becomes 4 times smaller because the sample sizes are 4 times as large.
It becomes 2 times larger because there are two groups instead of one.

Explanation: This question examines how sample size affects the standard deviation of the sampling distribution of $\bar{x}_1-\bar{x}_2$. When both sample sizes are quadrupled from 16 to 64, each term under the square root in $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ is divided by 4, making the entire expression half as large (since $\sqrt{1/4} = 1/2$). Option A incorrectly suggests it increases. Option C is wrong because sample size does affect spread. Option D uses the factor 4 instead of $\sqrt{4}=2$. Option E is wrong because the number of groups does not change when the sample sizes change.
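The halving does not depend on the particular population standard deviations. The sketch below uses hypothetical values σ₁ = 10 and σ₂ = 12 (the question does not give any) to show the ratio.

```python
from math import sqrt

def sd_of_difference(sigma1, n1, sigma2, n2):
    """SD of the sampling distribution of x̄1 - x̄2 for independent samples."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# HYPOTHETICAL population SDs, chosen only to have numbers to plug in.
sigma1, sigma2 = 10.0, 12.0

before = sd_of_difference(sigma1, 16, sigma2, 16)
after = sd_of_difference(sigma1, 64, sigma2, 64)
ratio = after / before

print(f"before = {before:.3f}, after = {after:.3f}, ratio = {ratio:.2f}")
```

Changing `sigma1` and `sigma2` changes both standard deviations but leaves the ratio at one half.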

Question 14

A farmer records the weights (in pounds) of 35 pumpkins harvested from one field. The five-number summary is: min $=4.8$ , $Q_1=7.2$ , median $=9.0$ , $Q_3=11.5$ , max $=18.6$ . A standard boxplot is drawn. Which feature is consistent with the summary statistics?

The box extends from 7.2 lb to 11.5 lb, and the median line is at 9.0 lb. (correct answer)
The range is $11.5-7.2=4.3$ lb.
The IQR is $18.6-4.8=13.8$ lb.
The median must be 9.35 lb because it is halfway between $Q_1$ and $Q_3$ .
The whiskers end at 7.2 lb and 11.5 lb because whiskers show the middle 50% of the data.

Explanation: This question tests understanding of standard boxplot construction. In any boxplot, the box extends from Q₁ to Q₃ with the median line positioned inside according to the data. Given Q₁ = 7.2, Q₃ = 11.5, and median = 9.0, the box extends from 7.2 to 11.5 pounds with the median at 9.0 pounds, making choice A correct. Choice B incorrectly identifies the IQR as the range; IQR = 11.5 - 7.2 = 4.3 lb, while range = 18.6 - 4.8 = 13.8 lb. Choice C incorrectly calculates IQR using max - min. Choice D incorrectly assumes the median must be at the arithmetic mean of Q₁ and Q₃; (7.2 + 11.5)/2 = 9.35, but the actual median is 9.0. Choice E confuses the box boundaries with whisker endpoints; the box spans the middle 50% of the data, while the whiskers extend outward from the box toward the minimum and maximum.
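The quantities that the distractors mix up can be laid side by side from the five-number summary. A minimal sketch:

```python
minimum, q1, med, q3, maximum = 4.8, 7.2, 9.0, 11.5, 18.6

iqr = q3 - q1                    # width of the box
data_range = maximum - minimum   # full spread of the data
box_midpoint = (q1 + q3) / 2     # halfway point of the box, not the median

print(f"IQR = {iqr:.1f}, range = {data_range:.1f}, box midpoint = {box_midpoint:.2f}, median = {med}")
```

The median (9.0) sits below the box midpoint (9.35), so the upper part of the box is longer than the lower part, which is consistent with the long stretch from Q₃ up to the maximum.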

Question 15

A school district compares mean number of absences in a semester for students who start school at 8:00 a.m. versus students who start at 9:00 a.m. A random sample of 40 students with an 8:00 start had a mean of 6.1 absences, and a random sample of 42 students with a 9:00 start had a mean of 5.4 absences. The district’s claim is that the mean number of absences differs between the two start times. Which hypotheses are appropriate?

$H_0: \mu_{8}-\mu_{9}=0$ ; $H_a: \mu_{8}-\mu_{9}>0$
$H_0: \mu_{8}-\mu_{9}=0$ ; $H_a: \mu_{8}-\mu_{9}\ne 0$ (correct answer)
$H_0: \bar{x}_{8}-\bar{x}_{9}=0$ ; $H_a: \bar{x}_{8}-\bar{x}_{9}\ne 0$
$H_0: \mu_{9}-\mu_{8}=0$ ; $H_a: \mu_{9}-\mu_{8}>0$
$H_0: \mu_{9}=5.4$ ; $H_a: \mu_{9}\ne 5.4$

Explanation: This question requires setting up a two-sided hypothesis test since the district claims the mean number of absences differs between start times without specifying which is higher. The null hypothesis states no difference: μ_8 - μ_9 = 0. The alternative hypothesis uses ≠ to indicate any difference: μ_8 - μ_9 ≠ 0. Choice B correctly uses population parameters with a two-sided alternative. Choice C incorrectly uses sample means instead of population means, and choice E incorrectly tests against a specific value. When a claim mentions a difference without specifying direction, always use a two-tailed test with ≠ in the alternative hypothesis.

Question 16

A teacher studies whether the number of practice problems completed ( $x$ ) predicts score on a quiz ( $y$ ). The teacher claims that completing more practice problems increases quiz scores in the population. Which hypotheses are appropriate for testing this claim about the slope in the regression of $y$ on $x$ ?

$H_0: \beta=0$ vs. $H_a: \beta>0$ (correct answer)
$H_0: b=0$ vs. $H_a: b>0$
$H_0: r=0$ vs. $H_a: r>0$
$H_0: \rho=0$ vs. $H_a: \rho>0$
$H_0: \beta=0$ vs. $H_a: \beta<0$

Explanation: This question assesses setting up hypotheses for a regression slope. The teacher claims that more practice increases scores, which corresponds to a positive population slope β, so H0: β = 0 vs. Ha: β > 0 (choice A) is appropriate. Choice B uses the sample slope b, but hypotheses are statements about the population parameter β. Choices C and D state hypotheses about the correlation (r or ρ) rather than the slope the question asks about, and choice C also uses a sample statistic. Choice E proposes a negative slope, which conflicts with the claimed increase. Mini-lesson: a slope test compares H0: β = 0 (no linear relationship) against an alternative whose direction matches the claim, such as β > 0 for a claimed increase. Unlike correlation, the slope quantifies the predicted change in y per one-unit increase in x.

Question 17

A fitness coach compares mean change in resting heart rate (in beats per minute) after 8 weeks for two workout plans: Plan X and Plan Y. A random sample of 20 clients on Plan X had a mean change of −4.1 bpm, and a random sample of 22 clients on Plan Y had a mean change of −2.7 bpm. The research claim is that the mean change differs between the two plans. Which hypotheses are appropriate?

$H_0: \mu_X-\mu_Y=0$ ; $H_a: \mu_X-\mu_Y\ne 0$ (correct answer)
$H_0: \mu_X-\mu_Y=0$ ; $H_a: \mu_X-\mu_Y<0$
$H_0: \bar{x}_X-\bar{x}_Y=0$ ; $H_a: \bar{x}_X-\bar{x}_Y\ne 0$
$H_0: \mu_Y-\mu_X=0$ ; $H_a: \mu_Y-\mu_X>0$
$H_0: \mu_X=-4.1$ ; $H_a: \mu_X\ne -4.1$

Explanation: This question asks for hypotheses about a difference in mean change between two groups. Because the claim is only that the means differ, the test is two-sided: H0: μ_X - μ_Y = 0 vs. Ha: μ_X - μ_Y ≠ 0, as in choice A, which correctly uses population parameters. Choices B and D are one-sided, but the claim does not specify a direction. Choice C substitutes sample means for μ. Choice E tests one mean against a constant rather than comparing the two plans. Mini-lesson for two-mean setups: define μ_X and μ_Y, set H0: μ_X - μ_Y = 0, and use Ha: μ_X - μ_Y ≠ 0 when the claim is a difference without a specified direction.

Question 18

An ecologist claims that a certain species of beetle has four color morphs occurring in proportions 0.10, 0.20, 0.30, and 0.40. A random sample of 200 beetles is captured and each beetle is classified into one of the four morph categories (categorical). Which inference procedure is most appropriate to test whether the observed counts are consistent with the claimed distribution?

Chi-square goodness-of-fit test (correct answer)
Chi-square test of independence (color morph by habitat type)
One-proportion $z$ test (test $p=0.10$ for one morph only)
One-sample $t$ test (test mean number of spots per beetle)
Two-proportion $z$ interval (estimate difference in two morph proportions)

Explanation: This question tests your ability to recognize when to use a chi-square goodness-of-fit test. The scenario involves one categorical variable (color morph) with four categories and a claimed theoretical distribution (0.10, 0.20, 0.30, 0.40). We want to test whether observed counts from our sample are consistent with these claimed proportions. The chi-square goodness-of-fit test is specifically designed for this purpose - comparing observed frequencies to expected frequencies based on a hypothesized distribution. This differs from a test of independence (which requires two variables) or proportion tests (which typically involve one or two categories, not four). The test will determine if the observed morph frequencies deviate significantly from the ecologist's claimed distribution.

Question 19

An online retailer studies the amount spent per order. The population distribution is strongly right-skewed because most orders are small but a few are very large. The analyst repeatedly takes random samples of $n=100$ orders and computes the sample mean amount spent $\bar{x}$ . She examines the sampling distribution of $\bar{x}$ . Why is the sampling distribution approximately normal?

Because the Central Limit Theorem says the sampling distribution of $\bar{x}$ is approximately normal for large $n$ , regardless of the population shape (correct answer)
Because the population distribution must be normal for the sampling distribution of $\bar{x}$ to be approximately normal
Because $\bar{x}$ is based on 100 observations, it must be exactly normal
Because skewness in the population disappears only when $n=2$
Because the sampling distribution of $\bar{x}$ has the same skewness as the population, which is close to normal here

Explanation: This question probes the Central Limit Theorem's effect on the sampling distribution of the sample mean. Although the population of order amounts is strongly right-skewed, the sample size n = 100 is large enough for the CLT to apply, so the distribution of $\bar{x}$ is approximately normal. The CLT states that, for a population with finite variance, the sampling distribution of the mean approaches a normal shape as n grows, regardless of the population's shape. Choice B is a distractor that insists on a normal population. Choice C overstates the result: the distribution is approximately, not exactly, normal. Mini-lesson: as n grows, the standard error σ/√n shrinks and the sampling distribution becomes more symmetric around μ, which is what justifies using normal curves for inference about means.
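A small simulation shows the skewness fading as individual orders are averaged. The sketch assumes an exponential population with a mean of 40 (a hypothetical stand-in for "strongly right-skewed"; the question gives no distribution or dollar figures), and the seed and repetition counts are arbitrary.

```python
import random
from statistics import fmean

def skewness(values):
    """Moment-based skewness: about 0 for symmetric data, positive for a long right tail."""
    m = fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(1)
mean_order = 40.0   # HYPOTHETICAL population mean

# Individual orders drawn from the skewed population.
individual_orders = [random.expovariate(1 / mean_order) for _ in range(20_000)]

# Sample means, each from a random sample of n = 100 orders.
sample_means = [
    fmean([random.expovariate(1 / mean_order) for _ in range(100)])
    for _ in range(2_000)
]

print(f"skewness of orders: {skewness(individual_orders):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

The individual orders show skewness near 2, while the sample means show skewness close to 0, which is the "approximately normal" behavior the CLT describes.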

Question 20

A school district wants to estimate the proportion of all parents in the district who are satisfied with remote learning options. The district posted a survey link on its website and social media accounts and received 1200 responses; 68% reported being satisfied. The district concluded that “about 68% of all parents in the district are satisfied with remote learning options.” Which statement explains whether the conclusion is valid?

The conclusion is valid because 1200 responses is a very large sample size.
The conclusion is not valid because the survey is a voluntary response sample and may overrepresent parents with strong opinions or higher engagement. (correct answer)
The conclusion is valid because posting online ensures a random selection of parents who see the link.
The conclusion is not valid because satisfaction with remote learning proves remote learning improves student grades.
The conclusion is not valid because 68% is less than 70%, so it is not statistically meaningful.

Explanation: The key skill here is recognizing voluntary response sampling and its bias: how data are collected determines whether a conclusion can be trusted. The district's online survey lets parents select themselves, so it likely attracts parents with strong views or higher engagement and may not reflect all parents in the district. Choice A wrongly prioritizes sample size over method; a large voluntary response sample can still be badly biased. Choice C is wrong because posting a link online does not randomly select anyone. Trustworthy generalization requires random sampling, for example a simple random sample in which every parent has an equal chance of being selected. Mini-lesson: self-selected responses can skew results in either direction, and collecting more of them does not remove the bias.

Question 21

A study recorded commute method (Car or Public transit) and whether commuters arrive late at least once per week (Yes or No). The results are shown in the table. Which comparison is appropriate for interpreting the relationship between commute method and arriving late using conditional distributions?

Compare the overall percent of commuters who arrive late at least once per week to the overall percent who do not.
Compare the percent who arrive late at least once per week among car commuters to the percent who arrive late at least once per week among public-transit commuters. (correct answer)
Compare the number of late commuters who use public transit to the number of late commuters who use a car.
Compare the percent of public-transit commuters among those who arrive late at least once per week to the percent of public-transit commuters among those who do not.
Compare the overall percent who commute by car to the overall percent who commute by public transit.

Explanation: This question asks about comparing conditional distributions to interpret the relationship between commute method and arriving late. To determine if these variables are associated, we should compare the proportion of commuters who arrive late within each commute method - that is, the percent who arrive late at least once per week among car commuters versus the percent who arrive late among public-transit commuters. Choice B correctly identifies this comparison of conditional distributions. Options A and E examine marginal distributions, C compares raw counts rather than proportions, and D conditions on lateness instead of commute method. The key insight is to condition on the explanatory variable (commute method) and examine how the response variable (arriving late) differs across those conditions.

Question 22

A school compares two study programs by repeatedly taking random samples of students from each program. Each time, a random sample of $n_1=40$ students from Program A and an independent random sample of $n_2=40$ students from Program B are selected, and the mean exam score is computed for each group. The statistic of interest is $\bar{x}_A-\bar{x}_B$ . Which statement is correct about the sampling distribution of $\bar{x}_A-\bar{x}_B$ ?

It is centered at $\mu_A-\mu_B$ , and its standard deviation decreases when either $n_1$ or $n_2$ increases. (correct answer)
It is centered at $0$ whenever the two sample sizes are equal.
It has no variability because the same two programs are being compared each time.
Its standard deviation is $\sigma_A-\sigma_B$ because the statistic is a difference.
It describes the distribution of individual score differences between one student from A and one student from B.

Explanation: This question tests understanding of the sampling distribution of differences in sample means. The sampling distribution of $\bar{x}_A-\bar{x}_B$ is centered at the difference in population means, $\mu_A-\mu_B$ , because each sample mean is an unbiased estimator of its population mean. The standard deviation of this distribution is $\sqrt{\frac{\sigma_A^2}{n_1} + \frac{\sigma_B^2}{n_2}}$ , which decreases as either sample size increases. Option B is incorrect because the center depends on population means, not sample sizes. Option C is wrong because sampling variability exists even when comparing the same programs repeatedly. Option D incorrectly states the standard deviation formula. Option E confuses the sampling distribution with individual differences.

Question 23

A basketball player makes 70% of free throws in the long run. During one practice, she shoots 20 free throws and makes 10. She expected about 14 makes but observed 10. Is the result unexpected, given natural variability in 20 shots?

Yes; making only 10 out of 20 would be surprisingly low if her true rate is 70%. (correct answer)
No; because she can miss, any number of makes from 0 to 20 is equally likely.
No; 10 makes is below 14, but with 20 shots that amount of variation is not unusual.
Yes; it is impossible to make only 10 if she is truly a 70% shooter.
No; the result cannot be judged because the sample size is not at least 100.

Explanation: This problem tests the ability to judge whether a result is unexpected given binomial variability. For $n=20$ free throws with $p=0.70$, the expected number of makes is $np=14$ and the standard deviation is $\sqrt{np(1-p)}=\sqrt{4.2}\approx 2.05$, so outcomes within 2 standard deviations of the mean run from roughly 10 to 18. Making only 10 is about 1.95 standard deviations below the mean, right at the edge of that range, which is unusually low. Choice C, a distractor, claims that this much variation is not unusual in 20 shots, but the exact binomial probability of 10 or fewer makes is about 0.048, so the result is surprising. Mini-lesson: To judge surprise, compare the observed count to the mean $np$; if it is about 2 or more standard deviations ($\sqrt{np(1-p)}$) away, it is unexpected, because results that far from the mean occur in only about 5% of similar trials by chance.
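The arithmetic behind this explanation can be reproduced directly. The sketch below computes the mean, the standard deviation, the z-score for 10 makes, and the exact binomial probability of 10 or fewer makes; the helper name `binom_pmf` is an assumption made for illustration.

```python
from math import comb, sqrt

n, p, observed = 20, 0.70, 10

mean = n * p                    # expected makes, np
sd = sqrt(n * p * (1 - p))      # binomial standard deviation
z = (observed - mean) / sd      # standard deviations from the mean


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exact lower-tail probability P(X <= 10).
tail = sum(binom_pmf(k, n, p) for k in range(observed + 1))

print(f"mean = {mean:.1f}, sd = {sd:.2f}, z = {z:.2f}")
print(f"P(X <= {observed}) = {tail:.4f}")
```

The tail probability comes out just under 0.05, which matches the "about 2 standard deviations" rule of thumb.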

Question 24

A state health department wants to estimate the mean number of hours of exercise per week for all adults in the state. The department divides the state into urban and rural counties, then randomly selects 50 adults from urban counties and 50 adults from rural counties using voter registration lists. Which statement best describes the sample representativeness?

The sample is likely more representative than a convenience sample because it uses random selection within urban and rural strata, though it may miss adults not on voter lists. (correct answer)
The sample is not representative because stratified sampling is never representative.
The sample is representative of all adults because it includes both urban and rural counties in equal numbers, regardless of their population sizes.
The sample is representative because adults were randomly assigned to be in the urban or rural group.
The sample is representative because the total sample size is 100, which guarantees representativeness.

Explanation: This question tests understanding of stratified random sampling and its limitations. The health department correctly uses stratified sampling by dividing the population into urban and rural strata and randomly selecting within each. This ensures representation from both groups and can increase precision if exercise habits differ between urban and rural areas. However, using voter registration lists as the sampling frame excludes adults who aren't registered to vote, which could bias results if non-registered adults have different exercise patterns. While stratified random sampling is generally more representative than convenience sampling, the coverage error from the incomplete frame prevents perfect representativeness. The key principle is that random sampling methods improve representativeness but can still have limitations based on the sampling frame used.

Question 25

A university estimates that 50% of applicants submit test scores. A counselor repeatedly takes many random samples of $n=200$ applicants and computes $\hat p$ . Which statement about the sampling distribution is correct?

The sampling distribution of $\hat p$ is approximately Normal because $np$ and $n(1-p)$ are both at least 10. (correct answer)
The sampling distribution of $\hat p$ is exactly Normal for any sample size.
The sampling distribution of $\hat p$ cannot be approximately Normal because $\hat p$ is a proportion.
The sampling distribution of $\hat p$ is approximately Normal only when $p$ is close to 0 or 1.
The sampling distribution of $\hat p$ is approximately Normal only if $\hat p$ equals $p$ in each sample.

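Explanation: This question tests the conditions for Normality of the sampling distribution of a sample proportion. With $n=200$ and $p=0.50$, $np=100$ and $n(1-p)=100$ are both at least 10, so the sampling distribution of $\hat p$ is approximately Normal, making A correct. Distractor B claims the distribution is exactly Normal for any sample size, but the Normal model is only an approximation, and it requires a large enough sample. Mini-lesson: The sampling distribution of $\hat p$ is approximately Normal when $np \ge 10$ and $n(1-p) \ge 10$ and the sample is small relative to the population; its mean is $p$ and its standard deviation is $\sqrt{p(1-p)/n}$.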
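The large-counts check and the resulting center and spread can be sketched in a few lines. The function name `phat_normal_check` and its `threshold` parameter are assumptions made for illustration, and the check does not cover the condition that the sample be small relative to the population.

```python
from math import sqrt


def phat_normal_check(n, p, threshold=10):
    """Return (large counts met?, mean of p-hat, standard deviation of p-hat)."""
    successes = n * p          # expected successes, np
    failures = n * (1 - p)     # expected failures, n(1 - p)
    ok = successes >= threshold and failures >= threshold
    return ok, p, sqrt(p * (1 - p) / n)


ok, center, spread = phat_normal_check(200, 0.50)
print(f"large counts met: {ok}, mean = {center:.2f}, sd = {spread:.4f}")

# A contrasting case where the condition fails: np = 20 * 0.05 = 1.
small_ok, _, _ = phat_normal_check(20, 0.05)
print(f"n = 20, p = 0.05 -> large counts met: {small_ok}")
```

For the question's values, the standard deviation of $\hat p$ is about 0.035, so most sample proportions would land within a few percentage points of 0.50.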