Introducing Statistics: Learning from Data
Help Questions
A city transportation department selects a simple random sample of 120 intersections from all intersections in the city and records whether each has a functioning pedestrian “walk” signal. In the sample, 102 intersections have a functioning signal. Which conclusion is supported by these data?
Exactly 85% of the city’s intersections have a functioning pedestrian walk signal.
Installing walk signals causes intersections to be safer, so the city should install more.
Because the sample is random, every intersection in the city has a functioning walk signal with probability 0.85.
About 85% of intersections in all cities have functioning pedestrian walk signals.
About 85% of the city’s intersections have a functioning pedestrian walk signal.
Explanation
Learning from data means making inferences from random samples to populations. In the simple random sample of 120 intersections, 102 had a functioning signal, a sample proportion of 102/120 = 85%. Because the sample was drawn at random from all intersections in the city, it supports an approximate estimate for that population: about 85% of the city's intersections. A conclusion must match the scope of the data, so it cannot claim an exact value, causation, or anything about other cities. Random sampling also does not give each individual intersection a 0.85 probability of having a working signal; each signal either works or does not. Choice E (the last one listed) correctly gives an approximate estimate for the city's intersections. Choice A claims exactness, B claims causation and makes a recommendation the data do not address, C misreads the proportion as a per-intersection probability, and D extends the result to other cities.
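The word "about" in the supported conclusion can be made concrete with a short calculation. The Python sketch below computes the sample proportion from the counts in the question and, purely as an illustration, a rough standard error from the usual large-sample formula sqrt(p(1 - p)/n). The function name `sample_proportion_summary` is hypothetical, and the standard error is an added assumption that goes beyond what the question asks.

```python
import math


def sample_proportion_summary(successes: int, n: int) -> tuple[float, float]:
    """Return the sample proportion and its approximate standard error.

    Assumes a simple random sample and ignores any finite-population
    correction; this is a sketch, not part of the original question.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, standard_error


# Counts from the question: 102 of 120 sampled intersections.
p_hat, standard_error = sample_proportion_summary(102, 120)
print(f"sample proportion: {p_hat:.2f}")
print(f"approximate standard error: {standard_error:.3f}")
```

A nonzero standard error is one way to see why "about 85%" is defensible while "exactly 85%" is not.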
A researcher wants to study the relationship between weekly exercise time and resting heart rate among adults who belong to a particular gym. She records data from 60 gym members who volunteer to participate. The data show a negative association: members who report more weekly exercise tend to have lower resting heart rates. Which conclusion is supported by the data?
The negative association proves that exercise time is the only factor affecting resting heart rate.
Increasing weekly exercise time will cause an adult’s resting heart rate to decrease.
Adults in general have lower resting heart rates when they exercise more, based on this study.
Among the 60 volunteers from this gym, higher reported exercise time tended to be associated with lower resting heart rate.
Because 60 people were measured, the relationship must be strong and linear for all gym members.
Explanation
This observational study examines the relationship between exercise time and resting heart rate among gym volunteers. Option D (the fourth one listed) correctly describes what the data show: among these 60 volunteers, more reported exercise was associated with lower resting heart rates. We cannot claim causation (B) from observational data, generalize to all adults (C), or make absolute statements about the relationship (A, E). The key limitation is that the participants were volunteers from one gym, not a random sample. When describing associations from observational studies, use tentative language and limit conclusions to the studied group.
A streaming service wants to estimate the average number of hours per week its subscribers watch content. It takes a simple random sample of 500 subscribers from its current subscriber list and computes a sample mean of 9.4 hours per week. Which conclusion is supported by the data?
Because the sample is random, every subscriber watches exactly 9.4 hours per week on average.
For the 500 sampled subscribers, the average viewing time was 9.4 hours per week.
The sample mean proves that most subscribers watch between 9 and 10 hours per week.
Watching 9.4 hours per week causes people to subscribe to the streaming service.
The average viewing time for all people (subscribers and non-subscribers) is about 9.4 hours per week.
Explanation
This question tests what a sample statistic represents. The streaming service sampled 500 subscribers and found a mean of 9.4 hours per week. Option B (the second one listed) correctly states what this means: the average for the 500 sampled subscribers was 9.4 hours. We cannot claim that every subscriber watches exactly this amount (A), make claims about how individual viewing times are distributed (C), assert causation (D), or extend the result to non-subscribers (E). A sample statistic describes the sample itself; inference to the population requires additional statistical reasoning. Always distinguish what you observed in the sample from claims about the population.
A company wants to estimate the average time its customer service phone calls last. It takes a random sample of 80 calls made last Tuesday and finds a mean duration of 6.4 minutes. Which conclusion is supported by the data?
The random sample proves that call durations are normally distributed.
Shortening the call script will cause the mean call duration to be 6.4 minutes.
The mean call duration for all calls made in the entire year is about 6.4 minutes.
The mean call duration for all calls made last Tuesday was about 6.4 minutes.
The mean call duration for all calls to the company is exactly 6.4 minutes.
Explanation
This question is about estimating a population parameter from sample data and about scope of inference. The random sample of 80 calls from last Tuesday has a mean of 6.4 minutes, which supports an estimate for all calls made that day, not for longer periods and not for the shape of the distribution. The conclusion must stay tied to the sampled population without extending to all calls ever made or claiming causation. Random samples allow approximate inferences to the population from which they were drawn, here one day's calls, but not exact values or unrelated claims such as normality. Avoid causal statements from descriptive data. Choice D (the fourth one listed) properly gives an approximate estimate for last Tuesday's calls. The other choices claim normality (A), assert causation (B), generalize to the whole year (C), or claim an exact value for all calls (E).
A school district surveyed a simple random sample of 200 students from its 4 high schools about whether they usually eat breakfast on school days. In the sample, 118 students said “yes.” Which conclusion is supported by these data?
About 59% of students in the district’s 4 high schools usually eat breakfast on school days.
About 59% of all teenagers in the state usually eat breakfast on school days.
Exactly 59% of all students in the district’s 4 high schools usually eat breakfast on school days.
About 59% of the district’s teachers usually eat breakfast on school days.
Eating breakfast causes higher grades for students in the district.
Explanation
This question assesses the skill of drawing appropriate conclusions from sample data in introductory statistics, focusing on the scope of inference. The data come from a simple random sample of 200 students from the district's 4 high schools, with 118 saying they eat breakfast, which is 59%. The supported conclusion must match the data's limitations, inferring to the population of students in those schools without overgeneralizing or claiming causation. A mini-lesson here is that when data are from a random sample, we can make approximate inferences to the sampled population, but not exact statements or extensions to broader groups like all teenagers or teachers. We also avoid causal claims unless there's experimental evidence. Thus, choice A correctly estimates about 59% for the district's high school students. Other choices overreach by claiming exactness, causation, or unrelated populations.
A researcher wants to estimate the proportion of residents in a city who support a proposed public transit tax. She stands outside a downtown office building at 8:00 AM for one morning and surveys the first 150 people who enter. Of those surveyed, 72% say they support the tax. Which statement is best supported by these data?
Because 72% is above 50%, a majority of all city residents definitely support the tax.
About 72% of all downtown workers in the city support the tax.
About 72% of people entering that office building around 8:00 AM on that morning supported the tax.
The tax proposal caused 72% of city residents to support public transit.
About 72% of all city residents support the tax because the sample size is large.
Explanation
This question is about recognizing sampling bias and limiting conclusions to the group actually sampled. The researcher used a convenience sample: the first 150 people entering one downtown office building at 8:00 AM on one morning. Because those people were not chosen at random from all city residents, the 72% figure cannot be extended to the city, and a large sample size does not correct for a biased selection method. The data support conclusions only about the specific group surveyed, not broader populations or causal effects. Non-random samples may not represent the population, so match the conclusion to who was actually surveyed. Definitive majority claims are also unjustified. Choice C (the third one listed) correctly limits the conclusion to people entering that building that morning. The other choices generalize too far (A, B, E) or imply causation (D).
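The gap between a convenience sample and a simple random sample can be illustrated with a small simulation. The Python sketch below builds a hypothetical city in which downtown commuters support the tax at a higher rate than other residents; every count in it is invented for illustration and none comes from the question. It also simplifies the convenience sample to a random draw from the downtown group only.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical city of 10,000 residents (all counts are made up).
# 1 = supports the tax, 0 = does not.
downtown_commuters = [1] * 720 + [0] * 280   # 72% support
other_residents = [1] * 3600 + [0] * 5400    # 40% support
city = downtown_commuters + other_residents

true_proportion = sum(city) / len(city)

# Convenience sample: 150 people drawn only from the downtown group.
convenience = random.sample(downtown_commuters, 150)
convenience_estimate = sum(convenience) / len(convenience)

# Simple random sample: 150 people drawn from the whole city.
srs = random.sample(city, 150)
srs_estimate = sum(srs) / len(srs)

print(f"true proportion:      {true_proportion:.3f}")
print(f"convenience estimate: {convenience_estimate:.3f}")
print(f"simple random sample: {srs_estimate:.3f}")
```

Under these assumed numbers, the convenience estimate tracks the downtown group rather than the city, even though both samples have the same size.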
A marketing team tests two website layouts. For one week, the site uses Layout 1; the next week, it uses Layout 2. The purchase rate is 3.1% during the Layout 1 week and 3.8% during the Layout 2 week. No other data are collected. Which conclusion is supported by these data?
Because the purchase rate increased, customers preferred Layout 2 and reported higher satisfaction.
The purchase rate was higher during the week when Layout 2 was used than during the week when Layout 1 was used.
The difference in purchase rates must be due to random assignment of visitors to layouts.
Layout 2 will increase purchase rates for all future weeks, regardless of seasonality or promotions.
Layout 2 caused the purchase rate to increase by 0.7 percentage points.
Explanation
This introductory statistics question examines conclusions from non-experimental data over time. The website used Layout 1 one week (3.1% purchases) and Layout 2 the next (3.8%), showing a descriptive difference but not causation due to potential time-based confounds. The supported conclusion describes the observed rates without implying cause or future effects. A mini-lesson is to recognize sequential designs aren't randomized, so stick to descriptive comparisons and avoid causal or preferential claims. Differences don't imply random assignment or ignore external factors. Choice B properly states the higher rate during Layout 2's week. Other choices introduce causation or unfounded predictions.
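The descriptive comparison that the data do support is simple arithmetic on the two reported rates. The Python sketch below computes the difference in percentage points and, as an added illustration, the relative change; the variable names are hypothetical. Nothing in the calculation identifies a cause, since it only restates what was observed in the two weeks.

```python
# Purchase rates reported in the question, in percent.
layout_1_rate = 3.1
layout_2_rate = 3.8

# Absolute difference, in percentage points.
difference_pp = layout_2_rate - layout_1_rate

# Relative change compared with the Layout 1 week.
relative_change = difference_pp / layout_1_rate

print(f"difference: {difference_pp:.1f} percentage points")
print(f"relative change: {relative_change:.1%}")
```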
A city health department randomly selects 250 adults from its list of registered residents and asks whether they have received a flu shot this year. Of those sampled, 148 say “yes.” Which conclusion is supported by the data?
Getting a flu shot causes adults to register as residents in the city.
About $148/250$ of the sampled registered residents reported getting a flu shot this year.
About $148/250$ of all adults in the city got a flu shot this year.
Because the sample was random, exactly $148/250$ of all registered residents got a flu shot.
Adults who did not respond to the survey must be less likely to have gotten a flu shot.
Explanation
This question examines what survey data can tell us. The health department surveyed 250 randomly selected registered residents, and 148 said they got a flu shot. Option B (the second one listed) correctly describes what the data show: the proportion in the sample, 148/250 (59.2%). We cannot extend this to all adults in the city (C) because the sample was drawn only from registered residents. Options A, D, and E make unsupported claims about causation, exact population values, and non-respondents, respectively. When interpreting survey data, stick to describing what was actually observed in your sample.
A wildlife biologist wants to estimate the proportion of trout in a lake that are infected with a certain parasite. She catches 80 trout using a net in a shallow cove near the shore and finds 12 infected. Which conclusion is supported by the data?
Because 80 trout were sampled, the sample must be representative of the whole lake regardless of how the trout were caught.
Among the 80 trout caught in the shallow cove, $12/80$ were infected with the parasite.
The parasite infection rate in the shallow cove is exactly $12/80$ for all trout in that cove.
Catching trout in shallow water causes parasite infection.
About $12/80$ of the trout in the entire lake are infected with the parasite.
Explanation
This question highlights the importance of representative sampling. The biologist caught 80 trout in a shallow cove near shore, which is not a random sample from the entire lake. Option B (the second one listed) correctly limits the conclusion to what was observed: 12/80 (15%) of the caught trout were infected. We cannot generalize to the whole lake (E) because trout in shallow coves might differ from those in deeper water. Options C, D, and A make unsupported claims about exact rates, causation, and representativeness, respectively. When samples aren't random or representative, conclusions must be restricted to the sampled group.
A teacher wants to estimate the average number of hours of sleep per night for students at her school. She surveys the 32 students in her first-period class and finds a sample mean of 6.7 hours. Which conclusion is supported by the data?
The average sleep for all students at the school is exactly 6.7 hours per night.
The average sleep for students in the teacher’s first-period class is 6.7 hours per night (based on the survey).
Students at the school would sleep more if they were moved into a first-period class.
Because 32 students were surveyed, the result can be generalized to all students at the school without concern.
The survey proves that most students at the school sleep less than 7 hours per night.
Explanation
This question tests recognition of sampling limitations. The teacher surveyed only students in her first-period class - a convenience sample, not a random sample of all students. Option B correctly limits the conclusion to what the data actually represents: the average for students in that specific class. We cannot generalize to all students (A, D, E) because first-period students might differ systematically from others (perhaps early-risers get less sleep). Option C makes an unsupported causal claim. When data comes from a non-random sample, conclusions must be limited to that specific group.