Linear Regression Models

1

A streaming service sampled 20 users and recorded age ($x$, years, from 13 to 62) and average hours streamed per week ($y$). A scatterplot with the least-squares regression line is shown, with fitted equation $\hat{y}=18.0-0.15x$. The purpose of the linear model is to describe the linear association and predict typical weekly streaming time from age within the observed range. Which interpretation of the model is correct?

If a user gets 1 year older, that causes their weekly streaming time to drop by exactly 0.15 hours.

The intercept 18.0 means that 18% of users stream each week.

A 120-year-old is predicted to stream $18.0-0.15(120)$ hours per week, and this prediction is as reliable as those for ages 13–62.

At age 0, the model predicts 18 hours per week, so newborns would typically stream about 18 hours weekly.

For each additional year of age, the predicted weekly streaming time decreases by about 0.15 hours, on average, for users like those sampled.

Explanation

This question tests interpretation of a regression model that predicts weekly streaming hours from user age, $\hat{y}=18.0-0.15x$. The correct interpretation is the one stating that for each additional year of age, the predicted weekly streaming time decreases by about 0.15 hours, on average, for users like those sampled. This wording acknowledges that the slope describes an average association. The choice claiming that getting a year older causes an exact 0.15-hour drop incorrectly implies causation and an exact effect for an individual. The choice about newborns applies the model at age 0, far outside the observed range of 13 to 62 years. The choice about a 120-year-old extrapolates well beyond the data and wrongly calls that prediction as reliable as those within the range. The choice reading the intercept as "18% of users" misinterprets it: the intercept is a predicted number of hours at $x=0$, not a percentage. Linear models should be used only within the range of the observed data; extrapolation to extreme ages produces unreliable and often nonsensical predictions.
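The warning about extrapolation can be made concrete with a minimal Python sketch. The function name `predict_streaming_hours` and the idea of returning a range flag are assumptions made here for illustration; only the equation and the 13 to 62 age range come from the problem.

```python
def predict_streaming_hours(age, observed_range=(13, 62)):
    """Evaluate y-hat = 18.0 - 0.15 * age and report whether age is in range.

    Returns (predicted_hours, within_observed_range).
    The helper itself is hypothetical; the coefficients are from the problem.
    """
    low, high = observed_range
    predicted_hours = 18.0 - 0.15 * age
    return predicted_hours, low <= age <= high


# Compare an age inside the observed range with the two ages from the distractors.
for age in (30, 0, 120):
    hours, in_range = predict_streaming_hours(age)
    note = "within observed ages" if in_range else "extrapolation, not reliable"
    print(f"age {age}: predicted {hours:.1f} hours/week ({note})")
```

The arithmetic works for any age, which is exactly why the range check matters: the formula returns a number whether or not the data support it.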

2

A botanist measured the amount of fertilizer applied to a plot ($x$, in grams) and the plant height after 6 weeks ($y$, in centimeters) for 9 plots, with fertilizer amounts ranging from 0 to 40 grams. A least-squares regression line is $$\hat{y} = 12.4 + 0.48x$$. The purpose of this linear model is to summarize the linear association and predict typical plant height for fertilizer amounts within the observed range. Which interpretation of the model is correct?

The intercept 12.4 means fertilizer explains 12.4% of plant height.

Applying one more gram of fertilizer causes every plant to grow 0.48 cm taller than it otherwise would.

At 200 grams of fertilizer, the model can be used to predict plant height accurately because regression lines work for any $x$.

For each additional gram of fertilizer, the predicted plant height increases by about 0.48 cm, on average, for plots like those observed.

If 0 grams of fertilizer are applied, the plant height will be exactly 12.4 cm.

Explanation

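The skill here is interpreting $\hat{y} = 12.4 + 0.48x$, which predicts plant height from fertilizer amount ($x$ from 0 to 40 grams). The slope says that the predicted height is about 0.48 cm greater for each additional gram of fertilizer, on average, within the observed range. The correct choice states exactly this, hedged as an average for plots like those observed. The choice claiming that one more gram causes every plant to grow 0.48 cm taller overstates the slope as an exact effect on each plant. The choice saying the height will be exactly 12.4 cm at 0 grams treats the intercept as an exact value rather than a predicted typical height. The choice about 200 grams extrapolates far beyond the data. The choice reading the intercept as "12.4% of plant height" confuses the intercept with the percent of variation explained.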
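To see why the slope is read as a per-gram change, compare the model's predictions at two fertilizer amounts one gram apart. Writing $\hat{y}(x)$ for the prediction at $x$ is a notation introduced here for illustration, and the check applies to any $x$ and $x+1$ inside the observed 0 to 40 grams:

$$\hat{y}(x+1)-\hat{y}(x) = \bigl[12.4+0.48(x+1)\bigr]-\bigl[12.4+0.48x\bigr] = 0.48$$

The 0.48 cm is a difference between two predicted heights, which is why the interpretation is phrased in terms of predicted height rather than the height of any single plant.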

3

An environmental scientist measured water temperature ($x$, in °C, from 6 to 24) and dissolved oxygen ($y$, mg/L) at 12 sites in a river. The least-squares regression line predicting dissolved oxygen from temperature is $\hat{y}=12.1-0.18x$. The purpose of this linear model is to describe the linear association and predict typical dissolved oxygen for temperatures in the observed range. Which interpretation of the model is correct?

For each 1°C increase in water temperature, the predicted dissolved oxygen decreases by about 0.18 mg/L, on average, for sites like those measured.

The intercept 12.1 means 12.1% of the sites had 0 dissolved oxygen.

Raising the temperature by 1°C causes dissolved oxygen to decrease by exactly 0.18 mg/L at every site.

Because the slope is negative, higher temperatures and dissolved oxygen are independent.

At 0°C, dissolved oxygen will be exactly 12.1 mg/L, so the model is accurate for freezing conditions.

Explanation

This question assesses regression interpretation in an environmental science setting. The equation $\hat{y}=12.1-0.18x$ predicts dissolved oxygen from water temperature. The correct interpretation is the one stating that for each 1°C increase in water temperature, the predicted dissolved oxygen decreases by about 0.18 mg/L, on average, for sites like those measured. This is appropriate language for observational data. The choice claiming that raising the temperature by 1°C causes an exact 0.18 mg/L decrease at every site asserts causation and an exact effect. The choice about 0°C extrapolates below the observed range of 6 to 24°C and treats the intercept as an exact value. The choice claiming independence is wrong because a negative slope indicates a negative association, not the absence of one. The choice reading the intercept as "12.1% of the sites" misinterprets it as a percentage. Although temperature plausibly does affect dissolved oxygen, the regression model itself only describes the observed association within the measured temperature range.

4

A school counselor collected data from 12 students on the number of hours they studied for a final exam ($x$, from 1 to 9 hours) and their exam score ($y$, in points). A least-squares regression line was fit to predict score from study hours: $\hat{y}=58.2+4.1x$. The purpose of this linear model is to summarize the linear association and predict typical exam score from study time within the observed range. Which interpretation of the model is correct?

About 58.2% of the variation in exam scores is explained by study hours.

A student who studies 20 hours is predicted to score $58.2+4.1(20)$ points, so the model is accurate for any study time.

If a student studies 0 hours, the model guarantees the student will score exactly 58.2 points.

Because the slope is positive, studying an extra hour causes every student’s score to increase by 4.1 points.

For each additional hour studied, the predicted exam score increases by about 4.1 points, on average, for students similar to those in the data.

Explanation

This question tests slope interpretation in a linear regression model. The equation $\hat{y}=58.2+4.1x$ models the relationship between study hours and exam scores. The correct interpretation is the one stating that for each additional hour studied, the predicted exam score increases by about 4.1 points, on average, for students similar to those in the data. This acknowledges that the slope represents an average association, not a guarantee for individuals. The choice claiming that an extra hour causes every student's score to increase by 4.1 points incorrectly implies causation and an exact outcome for each student. The choice about 0 hours treats the intercept as a guarantee rather than a prediction, and 0 hours is also below the observed range of 1 to 9 hours. The choice about 20 hours extrapolates far beyond that range. The choice stating that 58.2% of the variation is explained confuses the intercept with R-squared. Remember that regression models describe average relationships within the observed data range, not causal effects or guarantees for individuals.
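The difference between the intercept and R-squared is easier to see when both are computed from data. The sketch below uses made-up study-hour data (the counselor's actual 12 observations are not given in the problem), and the helper name `least_squares` is an assumption introduced for illustration.

```python
# Hypothetical (made-up) data for illustration; NOT the counselor's data.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
scores = [63, 65, 72, 74, 80, 81, 88, 90, 96]


def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation
    return intercept, slope, r_squared


intercept, slope, r_squared = least_squares(hours, scores)
print(f"intercept = {intercept:.1f} points (predicted score at 0 hours)")
print(f"slope     = {slope:.1f} points per additional hour")
print(f"r^2       = {r_squared:.3f} (proportion of variation explained)")
```

The intercept is measured in points and comes from the means and the slope, while R-squared is a unitless proportion between 0 and 1 computed from the scatter around the line, so one cannot be read off the other.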

5

A real estate agent recorded the size of a house ($x$, in hundreds of square feet) and its selling price ($y$, in thousands of dollars) for 11 homes in a neighborhood. Sizes ranged from 12 to 28 (i.e., 1200 to 2800 sq ft). A least-squares regression line is $$\hat{y}=95+8.7x.$$ The purpose of this linear model is to summarize the linear relationship and predict typical selling prices for houses within the observed size range. Which interpretation of the model is correct?

A house that is 0 square feet would be predicted to sell for $\$95{,}000$, so the model is unrealistic and cannot be used at all.

The intercept 95 means most houses in the neighborhood are about $\$95{,}000$.

For each additional 100 square feet of size, the predicted selling price increases by about $\$8{,}700$, on average, for houses like those observed.

Increasing a home’s size by 100 square feet causes the selling price to increase by $\$8{,}700$ for every home.

A 3500-square-foot house (x = 35) is predicted to sell for $95+8.7(35)$ thousand dollars, so the model should be used for any home size.

Explanation

The key skill is interpreting the regression model $\hat{y} = 95 + 8.7x$ for selling price based on house size ($x$ in hundreds of square feet, from 12 to 28). The slope means that each additional 100 square feet is associated with a predicted price about $\$8{,}700$ higher, on average, within the observed sizes. The correct choice states this without causal language or extrapolation. The choice claiming that adding 100 square feet causes the price to rise by $\$8{,}700$ for every home wrongly implies causation and an exact effect. The choice that dismisses the model because a 0-square-foot house is unrealistic goes too far: an intercept with no practical meaning does not make the model unusable within the observed sizes. The choice reading the intercept as the price of most houses is also wrong, since the intercept is only the predicted value at $x=0$. The choice about a 3500-square-foot house uses the model beyond the data. Regression captures the linear trend but does not account for other variables that affect prices.
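Because $x$ is in hundreds of square feet and $\hat{y}$ is in thousands of dollars, unit conversion is the easiest place to slip. The following minimal sketch shows the conversions explicitly; the function name `predict_price_dollars` and the example sizes of 2000 and 2100 square feet are assumptions chosen for illustration, both inside the observed 1200 to 2800 square feet.

```python
def predict_price_dollars(square_feet):
    """Predicted selling price in dollars from y-hat = 95 + 8.7x.

    Hypothetical helper: converts raw square feet to the model's units
    and converts the prediction back to dollars.
    """
    x = square_feet / 100              # model's x is in hundreds of square feet
    y_hat_thousands = 95 + 8.7 * x     # model's prediction is in thousands of dollars
    return y_hat_thousands * 1000


price_2000 = predict_price_dollars(2000)
price_2100 = predict_price_dollars(2100)
print(round(price_2000))               # 269000
print(round(price_2100 - price_2000))  # 8700
```

The second printed value is the slope restated in everyday units: 100 more square feet corresponds to about \$8,700 more in predicted price.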

6

A researcher recorded the distance from a city center ($x$, in miles) and the monthly rent for a one-bedroom apartment ($y$, in dollars) for 13 apartments, with distances ranging from 1 to 18 miles. A least-squares regression line is $$\hat{y}=1850-42x.$$ The purpose of this linear model is to summarize the linear association and predict typical rents for apartments within the observed distance range. Which interpretation of the model is correct?

The intercept 1850 means the average rent of all apartments in the city is $\$1850$.

An apartment 40 miles away is predicted to rent for $1850-42(40)$, so the model is valid far beyond the observed distances.

Each additional mile from the city center is associated with a decrease of about $\$42$ in the predicted monthly rent, on average, for apartments like those observed.

Moving an apartment 1 mile farther from the city center causes the rent to drop by $\$42$.

At 0 miles from the city center, every apartment will rent for exactly $\$1850$.

Explanation

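The skill is interpreting $\hat{y} = 1850 - 42x$ for monthly rent versus distance from the city center ($x$ from 1 to 18 miles). The slope shows that each additional mile is associated with a predicted rent about $\$42$ lower, on average, within the observed range. The correct choice states this, avoiding causal language and staying within the data. The choice claiming that moving an apartment 1 mile farther causes the rent to drop by $\$42$ asserts direct causation. The choice saying every apartment at 0 miles will rent for exactly $\$1850$ treats the intercept as an exact value. The choice about an apartment 40 miles away extrapolates far beyond the observed distances. The choice reading the intercept as the average rent of all apartments in the city is also wrong: the intercept is the predicted rent at $x=0$, which lies outside the observed distances and may not reflect reality there.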

7

A district analyzed 10 schools, recording average class size ($x$, students per class, from 18 to 34) and average standardized test score ($y$, points). A least-squares regression line was fit: $\hat{y}=610-3.5x$. The purpose of this linear model is to summarize the linear association and predict typical test score from class size within the observed range. Which interpretation of the model is correct?

For each additional student in average class size, the predicted average test score decreases by about 3.5 points, on average, for schools like those studied.

Because the intercept is 610, a school with 0 students per class would score 610 points, and this is a reliable prediction.

Increasing a school’s average class size by 1 student will cause the school’s average test score to drop by 3.5 points.

Since the slope is negative, the correlation must be $r=0$.

The equation shows that smaller classes are the only reason some schools have higher scores.

Explanation

This question examines proper interpretation of regression in an educational policy context. The equation $\hat{y}=610-3.5x$ predicts average test score from average class size. The correct interpretation is the one stating that for each additional student in average class size, the predicted average test score decreases by about 3.5 points, on average, for schools like those studied. This uses appropriate statistical language and avoids causal claims. The choice claiming that adding one student will cause the average score to drop by 3.5 points incorrectly implies causation; smaller classes might lead to higher scores, but the regression shows only an association. The choice about a school with 0 students per class treats the intercept as a reliable prediction, although 0 is far outside the observed range of 18 to 34. The choice claiming that smaller classes are the only reason for higher scores ignores other factors. The choice stating that the correlation must be $r=0$ is wrong because a negative slope indicates a negative correlation. Regression models describe associations in observational data but cannot prove causation without a properly designed experiment.
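The link between the sign of the slope and the sign of the correlation follows from the standard formula for the least-squares slope. Here $b$ is the slope, $r$ the correlation, and $s_x$, $s_y$ the sample standard deviations of class size and test score; these symbols are introduced for illustration, and the problem does not report $r$, $s_x$, or $s_y$:

$$b = r\,\frac{s_y}{s_x}$$

Because $s_x$ and $s_y$ are positive, $b$ and $r$ share the same sign. A slope of $-3.5$ therefore implies $r<0$, which rules out $r=0$.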

8

An environmental scientist models ozone level ($y$, in ppb) from traffic volume ($x$, in thousands of cars per day) using data from days with traffic between 10 and 60 (thousand cars). The regression line is $\hat{y}=18+1.1x$. The purpose of the linear model is to describe the association and predict typical ozone levels for traffic volumes in the observed range. Which interpretation of the model is correct?

Because the slope is positive, increasing traffic causes ozone to increase by 1.1 ppb for every additional 1,000 cars.

If traffic volume is 0, then the ozone level will be 18 ppb.

For each additional 1,000 cars per day (within 10–60 thousand), the predicted ozone level increases by about 1.1 ppb, on average.

An increase of 1.1 ppb in ozone causes traffic volume to increase by 1,000 cars per day.

At 100 thousand cars per day, the model predicts 128 ppb, so it is appropriate to use the model at 100 thousand cars per day.

Explanation

This question tests slope interpretation in an environmental science context. The regression equation $\hat{y}=18+1.1x$ models predicted ozone level from traffic volume (in thousands of cars per day), where the slope 1.1 represents the average change in predicted ozone per additional thousand cars. The correct choice states that for each additional 1,000 cars per day (within 10 to 60 thousand), the predicted ozone level increases by about 1.1 ppb, on average. The choice saying the ozone level will be 18 ppb when traffic is 0 treats the intercept as an actual value rather than a prediction outside the data range. The choice claiming that a 1.1 ppb increase in ozone causes traffic to increase reverses the direction of the relationship and asserts causation. The choice claiming that increasing traffic causes ozone to rise by 1.1 ppb asserts direct causation from an observational study. The choice about 100 thousand cars per day extrapolates well beyond the observed range of 10 to 60 thousand. Regression models from observational data describe associations, not causal relationships, and should not be extrapolated beyond their data range.
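As a quick check of the arithmetic, the model can be evaluated at one traffic volume inside the observed range and at the 100 thousand value from the answer choice. The value $x=40$ is picked here only for illustration:

$$\hat{y} = 18 + 1.1(40) = 62 \text{ ppb}, \qquad \hat{y} = 18 + 1.1(100) = 128 \text{ ppb}$$

Both computations are correct arithmetic, but only the first uses an $x$ between 10 and 60, so only the first is a prediction the data support.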

9

A researcher studied 13 cars and recorded vehicle weight ($x$, in thousands of pounds, from 2.4 to 4.8) and highway fuel economy ($y$, miles per gallon). The least-squares regression line predicting mpg from weight is $\hat{y}=46.0-5.2x$. The purpose of this linear model is to describe the linear association and predict typical fuel economy for weights in the observed range. Which interpretation of the model is correct?

The intercept 46.0 means that 46% of cars get 0 mpg when weight is 0.

Reducing a car’s weight by 1,000 pounds will cause its highway mpg to increase by exactly 5.2 for every car.

A car that weighs 0 pounds would be predicted to get 46.0 mpg, and that prediction is meaningful because it comes from the model.

For each additional 1,000 pounds of vehicle weight, the predicted highway fuel economy decreases by about 5.2 mpg, on average, for cars like those in the study.

Because the slope is negative, there is no relationship between weight and mpg.

Explanation

This question tests regression interpretation in an automotive context. The equation $\hat{y}=46.0-5.2x$ predicts highway fuel economy from vehicle weight (in thousands of pounds). The correct interpretation is the one stating that for each additional 1,000 pounds of weight, the predicted highway fuel economy decreases by about 5.2 mpg, on average, for cars like those in the study. This uses associative language and acknowledges the average nature of the relationship. The choice claiming that reducing weight by 1,000 pounds will cause an exact 5.2 mpg increase for every car incorrectly implies causation and exact effects. The choice about a car weighing 0 pounds interprets the intercept at a weight that is impossible and far outside the observed range of 2.4 to 4.8 thousand pounds. The choice claiming no relationship is wrong because a negative slope indicates a negative association. The choice reading the intercept as "46% of cars" misinterprets it as a percentage. Remember that regression models describe patterns within realistic data ranges, not impossible scenarios like weightless cars.

10

A manager tracked the number of customers served in an hour ($x$) and the total tips earned that hour ($y$, in dollars) for 18 hourly shifts, with $x$ ranging from 12 to 55 customers. A least-squares regression line is $$\hat{y}=8.5+0.62x$$. The purpose of this linear model is to summarize the linear association and predict typical tips for shifts within the observed range. Which interpretation of the model is correct?

The intercept 8.5 means most hours have about 8.5 customers.

Serving one more customer causes tips to increase by exactly \$0.62 every hour.

If 0 customers are served, the server will earn exactly \$8.50 in tips.

Each additional customer served is associated with an increase of about \$0.62 in the predicted total tips for that hour, on average, for shifts like those observed.

A shift with 120 customers can be predicted accurately using the model since the relationship is linear.

Explanation

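This question assesses interpreting $\hat{y} = 8.5 + 0.62x$ for hourly tips from the number of customers served ($x$ from 12 to 55). The slope indicates about \$0.62 more in predicted tips per additional customer, on average, within the observed range. The correct choice states this, limiting the claim to an association for shifts like those observed. The choice claiming that one more customer causes tips to increase by exactly \$0.62 every hour wrongly asserts causation and an exact increase. The choice saying the server will earn exactly \$8.50 with 0 customers misuses the intercept, which is a prediction at a value below the observed range. The choice about a shift with 120 customers extrapolates far beyond the data. The choice reading the intercept as a typical number of customers confuses the units: the intercept is in dollars of tips, not customers, and it may not be meaningful on its own.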
