Introducing Statistics: Do Those Points Align?



Questions 1 - 10
1

After collecting data on the number of years a person has been running and their time to complete a 10k race, a coach finds a negative linear association in her sample. Which question is most directly related to the statistical concept of inference for regression slope?

How many runners were included in the sample to ensure the results are valid?

What is the fastest 10k time recorded in the sample data?

How strong is the evidence from this sample that a negative linear relationship exists for all runners of this type?

Is the relationship between years running and 10k time stronger than the relationship between age and 10k time?

Explanation

Inference for the regression slope uses the evidence in a sample to draw a conclusion about the slope in the population. This question asks directly how strong the sample evidence is for a negative linear relationship among all runners of this type.

2

Imagine a population where for any given value of an explanatory variable $$x$$, there is a distribution of corresponding values for a response variable $$y$$. If a linear relationship exists, the means of these distributions fall on a straight line. Why would individual points from this population not fall perfectly on that line?

Because any sample drawn from the population will have sampling error.

Because of inherent random variation in the response variable $$y$$ that is unrelated to the explanatory variable $$x$$.

Because the slope of the population regression line is not equal to zero.

Because the relationship between $$x$$ and $$y$$ is actually curved, not linear.

Explanation

This scenario is the population model for simple linear regression. The line gives the mean response at each value of $$x$$, but individual responses vary randomly around that mean. This variation is the error term ($$\epsilon$$) in the model $$y = \alpha + \beta x + \epsilon$$.
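The model $$y = \alpha + \beta x + \epsilon$$ can be illustrated with a small simulation. This is a minimal sketch: the values of `ALPHA`, `BETA`, and `SIGMA`, and the assumption that the errors are normally distributed, are hypothetical choices made only for illustration.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
ALPHA = 2.0   # intercept of the population line
BETA = 0.5    # slope of the population line
SIGMA = 1.0   # standard deviation of the random error term


def simulate_responses(x, n, seed=0):
    """Draw n responses y = ALPHA + BETA * x + error at one fixed x."""
    rng = random.Random(seed)
    return [ALPHA + BETA * x + rng.gauss(0, SIGMA) for _ in range(n)]


ys = simulate_responses(x=10, n=10_000)
mean_y = sum(ys) / len(ys)
line_value = ALPHA + BETA * 10   # mean response given by the line at x = 10

print(f"value on the line at x = 10: {line_value}")
print(f"mean of the simulated y values: {mean_y:.3f}")
print(f"first five individual y values: {[round(y, 2) for y in ys[:5]]}")
```

The mean of the simulated responses lands close to the value on the line, while the individual responses scatter around it.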

3

A consumer advocacy group studied the relationship between the sugar content (in grams) and the consumer rating (on a scale of 1 to 100) for a random sample of breakfast cereals. They found a sample slope of -1.5. A primary question for statistical inference would be to determine if this negative slope is:

a result of a non-linear relationship that should be corrected by transforming the data before analysis.

due to sampling variability, or if it provides evidence of a true negative linear relationship in the population of all cereals.

exactly equal to the population slope, which can be confirmed by increasing the sample size.

the correct slope for this sample, since the least-squares method provides a unique line for the data.

Explanation

Statistical inference aims to distinguish between a result that could have happened by chance (sampling variability) and a result that is statistically significant, suggesting a real effect or relationship in the larger population.

4

An economist models the relationship between a country's GDP and its average life expectancy using a linear regression model on data from a sample of countries. After fitting the line, she observes that the residuals are all positive for low and high GDPs and negative for mid-range GDPs. What does this pattern suggest about the linear model?

The relationship is linear, but the sample size was too small to accurately estimate the true slope.

The association between GDP and life expectancy is negative, which contradicts the model's assumptions.

A linear model may not be appropriate because the variation of points around the line is non-random.

The variation of points around the regression line is purely random, which is expected in a good model.

Explanation

A distinct pattern in the residual plot, such as a curve, indicates that the relationship between the variables is likely non-linear. The variation of points around the line is therefore not random, and a simple linear model is probably not the best fit for the data.
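The residual pattern described in this question can be reproduced with a short sketch. The data below are hypothetical: the response is deliberately built from a straight-line part plus a U-shaped part, so a straight line cannot capture it.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical curved data: a linear trend plus a U-shaped component.
xs = list(range(1, 10))
ys = [2 * x + (x - 5) ** 2 for x in xs]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Show the sign of each residual from the lowest x to the highest x.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(f"fitted line: y-hat = {a:.3f} + {b:.3f}x")
print(f"residual signs from low x to high x: {signs}")
```

The residuals are positive at both ends of the x range and negative in the middle, the same systematic pattern that signals a curved relationship in the question.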

5

A trainer at a gym wants to investigate if there is a positive linear relationship between the number of hours a person spends at the gym per week and the amount of weight they can lift. Data is collected from a random sample of gym members. From an inferential standpoint, what is the fundamental question the trainer is trying to answer?

Does the observed positive slope from the sample provide convincing evidence that the true slope for all gym members is also positive?

Is the slope calculated from the sample data greater than the intercept calculated from the sample data?

Can the number of hours spent at the gym be used to perfectly predict the amount of weight a person can lift for this sample?

What is the specific increase in lifting ability for each additional hour spent at the gym for the members in the sample?

Explanation

This question properly frames the inferential task: using the evidence from a sample (the observed positive slope) to make a claim about the larger population (whether the true slope is positive).

6

A researcher calculates a sample regression line for the relationship between caffeine intake and hours of sleep for a random sample of students. She knows that if she took another random sample, she would likely get a different sample slope. How does this concept of sampling variability in the slope affect her conclusions about the true relationship?

It proves that no true linear relationship exists between caffeine intake and hours of sleep in the population.

It suggests that the relationship is non-linear and a different model is required to capture the true pattern.

It means her first sample is likely biased, and she should take many more samples to find the correct slope.

It introduces uncertainty, which is why she would use a confidence interval to estimate the true slope.

Explanation

The fact that sample slopes vary from sample to sample (sampling variability) means there is uncertainty in any single sample's slope as an estimate of the population slope. A confidence interval is a statistical tool designed to account for this uncertainty by providing a range of plausible values for the true population slope.
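A confidence interval for the slope has the form $$b \pm t^* \cdot SE_b$$, with $$n - 2$$ degrees of freedom. The sketch below computes one by hand. The caffeine and sleep values are hypothetical data invented for illustration, and the critical value `T_STAR` is read from a standard t-table for 8 degrees of freedom.

```python
import math

# Hypothetical data for 10 students (not taken from any real study).
caffeine_mg = [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]
sleep_hours = [8.1, 7.9, 7.6, 7.8, 7.0, 7.2, 6.5, 6.9, 6.1, 6.0]

# Critical value t* for 95% confidence with n - 2 = 8 degrees of freedom,
# taken from a standard t-table.
T_STAR = 2.306

n = len(caffeine_mg)
x_bar = sum(caffeine_mg) / n
y_bar = sum(sleep_hours) / n
sxx = sum((x - x_bar) ** 2 for x in caffeine_mg)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(caffeine_mg, sleep_hours))

b = sxy / sxx            # sample slope
a = y_bar - b * x_bar    # sample intercept

# Standard deviation of the residuals, using n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(caffeine_mg, sleep_hours))
s = math.sqrt(sse / (n - 2))

se_b = s / math.sqrt(sxx)    # standard error of the slope
lower = b - T_STAR * se_b
upper = b + T_STAR * se_b

print(f"sample slope: {b:.5f} hours of sleep per mg of caffeine")
print(f"95% confidence interval: ({lower:.5f}, {upper:.5f})")
```

The interval is centered on the sample slope, and its width reflects how much the slope would be expected to vary from sample to sample.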

7

A biologist believes there is a true linear relationship between the height of a certain species of plant and the amount of a specific nutrient in the soil. She collects data from a random sample of 50 plants. What does the least-squares regression line calculated from this sample represent?

The only possible linear model that could be used to describe the data for this specific sample.

The exact linear relationship for all plants of this species.

Proof that a linear relationship exists between the two variables for all plants of this species.

An estimate of the true linear relationship for all plants of this species.

Explanation

A least-squares regression line created from sample data is a statistic that serves as an estimate of the unknown population regression line. It represents our best guess for the true linear relationship based on the available sample data.

8

A researcher investigates the relationship between hours of weekly exercise and resting heart rate for adults at a large company. They take a random sample of 30 adults and calculate the slope of the least-squares regression line. If they were to take a second, independent random sample of 30 adults from the same company, which of the following is most likely to be true?

The intercept of the regression line for the second sample would be the same as the first, but the slope would be different.

The slope of the regression line for the second sample would be exactly equal to the true population slope.

The slope of the regression line for the second sample would be exactly the same as the first sample's slope because the underlying population is the same.

The slope of the regression line for the second sample would be different from the first sample's slope due to sampling variability.

Explanation

Due to sampling variability, different random samples from the same population will almost certainly produce different values for sample statistics, including the slope and intercept of the least-squares regression line.

9

In the context of simple linear regression, what does the population regression line, $$\mu_y = \alpha + \beta x$$, describe?

The exact value of the response variable $$y$$ for every individual in the population for a given value of the explanatory variable $$x$$.

The predicted value of the explanatory variable $$x$$ based on a specific value of the response variable $$y$$.

The relationship between the explanatory and response variables for a particular sample drawn from the population.

The mean value of the response variable $$y$$ for all individuals in the population that have a specific value of the explanatory variable $$x$$.

Explanation

The population regression line describes the average response ($$\mu_y$$) for a given explanatory value ($$x$$). Individual responses will vary around this mean value.

10

A sociologist repeatedly takes random samples of size 40 from a large population of workers to study the relationship between years of education and annual income. For each sample, the slope of the least-squares regression line is calculated. Which of the following best describes the resulting collection of all the calculated sample slopes?

A sampling distribution of the slope.

A population distribution of income.

A scatterplot of income versus education.

A distribution of residuals.

Explanation

A sampling distribution is the distribution of a statistic (in this case, the slope) over all possible samples of a given size. The slopes from repeated random samples of size 40 build up this sampling distribution of the slope, which shows how the sample slope varies from sample to sample.
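Repeated sampling of this kind can be simulated. In the sketch below, the population model (`ALPHA`, `BETA`, `SIGMA`, and the range of education values) is a hypothetical assumption, and 1,000 samples only approximate the full sampling distribution, which covers every possible sample of size 40.

```python
import random
import statistics

# Hypothetical population model (an assumption for illustration):
# income (in $1000s) = ALPHA + BETA * years_of_education + random error.
ALPHA, BETA, SIGMA = 10.0, 4.0, 15.0


def sample_slope(rng, n=40):
    """Draw one random sample of size n and return its least-squares slope."""
    xs = [rng.uniform(8, 20) for _ in range(n)]
    ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in xs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


rng = random.Random(42)   # fixed seed so the run is repeatable
slopes = [sample_slope(rng) for _ in range(1000)]

print(f"population slope: {BETA}")
print(f"mean of the sample slopes: {statistics.mean(slopes):.3f}")
print(f"standard deviation of the sample slopes: {statistics.stdev(slopes):.3f}")
print(f"smallest and largest slope: {min(slopes):.3f}, {max(slopes):.3f}")
```

Each sample produces a different slope, yet the slopes cluster around the population slope. Their standard deviation estimates the variability that inference for the slope must account for.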
