AP STATISTICS • COLLECTING DATA

Inference and Experiments

Why only well-designed experiments let us draw cause-and-effect conclusions from data.

SECTION 1

Historical Context & Motivation

For centuries, scientists and physicians relied on anecdotal evidence and uncontrolled observations to evaluate treatments, policies, and interventions. The results were often misleading: a new tonic might appear effective simply because patients who chose to take it were healthier to begin with. The fundamental problem—confounding variables—made it impossible to separate the effect of the treatment from pre-existing differences between groups. The development of formal experimental design over the twentieth century provided a rigorous framework for making causal inferences rather than merely detecting associations.

1747

Lind's Scurvy Trial

James Lind divided twelve sailors into six pairs and assigned each pair a different remedy, conducting one of the earliest controlled experiments in medicine.

1925

Fisher's Principles of Experimental Design

Ronald A. Fisher published foundational work at Rothamsted Experimental Station introducing randomization, replication, and blocking as pillars of sound experimentation.

1948

First Randomized Clinical Trial

The British Medical Research Council tested streptomycin for tuberculosis using random assignment, establishing the randomized controlled trial (RCT) as the gold standard in medical research.

1974

Causal Inference Formalized

Donald Rubin extended the potential-outcomes framework, providing a formal statistical model for defining causal effects and clarifying when inference about causation is valid.

The central question this lesson addresses is: Under what conditions can we move beyond association and legitimately claim that one variable causes a change in another? The answer lies in understanding how experiments are designed, why random assignment is essential, and what scope of inference a given study design supports.

SECTION 2

Core Principles of Experiments and Inference

A well-designed experiment is the only data-collection method that supports a cause-and-effect conclusion. Observational studies, no matter how large, can only establish associations because uncontrolled lurking variables may explain observed differences. The following principles distinguish experiments from other study designs and dictate the kind of inference each supports.

Random Assignment

Subjects are assigned to treatment groups by a chance mechanism (e.g., coin flip, random number generator). This tends to balance both known and unknown confounders across groups, leaving only chance differences between them, and that is what enables causal inference.
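As a minimal sketch of such a chance mechanism, the Python function below shuffles a list of subjects with the standard `random` module and deals them out to the treatments like cards, so every split into groups of those sizes is equally likely. The helper name `randomly_assign`, the subject labels, and the optional seed (there only so an assignment can be reproduced and documented) are illustrative assumptions, not part of any standard procedure.

```python
import random


def randomly_assign(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments with group sizes as equal as possible.

    Shuffling the whole list and dealing subjects out in turn works like drawing
    names from a hat: every split into groups of these sizes is equally likely.
    """
    rng = random.Random(seed)      # optional seed makes the assignment reproducible
    shuffled = list(subjects)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # deal subjects round-robin, like dealing cards to players
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical example: 10 volunteers split between two conditions
volunteers = [f"S{i}" for i in range(1, 11)]
assignment = randomly_assign(volunteers, ["treatment", "control"], seed=1)
for group, members in assignment.items():
    print(group, members)
```

Shuffling once and dealing is equivalent to flipping a fair coin for each subject but guarantees equal group sizes, which keeps the comparison efficient.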

Control Group

A comparison group that receives no treatment, a placebo, or an established standard treatment provides a baseline. Without a control, any change could reflect natural progression rather than the treatment.

Replication

Each treatment is applied to a sufficiently large number of experimental units so that natural variability can be estimated and effects can be distinguished from noise.

Blinding

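In a double-blind experiment, neither the subjects nor the people who administer the treatments and measure the responses know who received which treatment. Any placebo effect then acts equally on all groups, and evaluators' expectations cannot bias the measurements.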

✦ KEY TAKEAWAY

Think of random assignment as shuffling a deck of cards before dealing: every hand is equally likely, so no player can claim the deal was rigged. In the same way, randomization makes the treatment groups comparable before the experiment begins, apart from chance differences. If the groups differ afterward by more than chance alone would plausibly produce, that difference can be attributed to the treatment itself rather than to pre-existing group differences.

SECTION 3

Visual Explanation — Experiment vs. Observational Study

The left branch shows the experimental pathway: the researcher uses random assignment to create comparable groups, enabling a causal conclusion. The right branch shows an observational study where self-selection introduces potential confounders, limiting conclusions to association.

The diagram above captures the essential logic of inference in statistics. When the researcher actively imposes treatments through random assignment, a systematic difference between group outcomes can be attributed to the treatment, because randomization tends to balance all other variables, both measured and unmeasured, across groups; whatever imbalance remains is due to chance, which a significance test accounts for. In contrast, when subjects choose their own groups (or are sorted by nature, economics, or personal preference), a lurking variable may drive both group membership and the response, producing a spurious association. This distinction is the central idea of the AP Statistics unit on data collection.
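The balancing argument can be checked with a small simulation. This is a hypothetical sketch: each made-up subject has an unmeasured "baseline health" score (mean 50, SD 10, an assumption), the treatment has no effect at all, and in the observational version healthier subjects opt in with probability 0.8 versus 0.2 for everyone else (also assumptions). Self-selection should produce a clearly positive average difference even though the treatment does nothing, while random assignment should average out to nearly zero.

```python
import random
import statistics


def simulate(n_subjects=200, n_studies=500, seed=42):
    """Average treated-minus-untreated difference when the treatment has NO effect.

    Each hypothetical subject has a hidden 'baseline health' score (a lurking
    variable). The response depends only on that score, never on the treatment,
    so any systematic difference comes from how the groups were formed.
    """
    rng = random.Random(seed)
    self_selected, randomized = [], []
    for _ in range(n_studies):
        baseline = [rng.gauss(50, 10) for _ in range(n_subjects)]
        response = [b + rng.gauss(0, 5) for b in baseline]  # treatment plays no role

        # Observational study: healthier subjects are more likely to opt in
        opted_in = [rng.random() < (0.8 if b > 50 else 0.2) for b in baseline]
        yes = [r for r, chose in zip(response, opted_in) if chose]
        no = [r for r, chose in zip(response, opted_in) if not chose]
        self_selected.append(statistics.fmean(yes) - statistics.fmean(no))

        # Experiment: a randomly chosen half of the subjects gets the treatment
        order = list(range(n_subjects))
        rng.shuffle(order)
        treated = set(order[: n_subjects // 2])
        yes = [response[i] for i in range(n_subjects) if i in treated]
        no = [response[i] for i in range(n_subjects) if i not in treated]
        randomized.append(statistics.fmean(yes) - statistics.fmean(no))

    return statistics.fmean(self_selected), statistics.fmean(randomized)


self_selected_avg, randomized_avg = simulate()
print(f"Self-selection, average difference:    {self_selected_avg:.2f}")
print(f"Random assignment, average difference: {randomized_avg:.2f}")
```

Because the baseline score never enters the random-assignment branch, the experiment cannot favor healthier subjects on average, which is exactly the protection the diagram describes.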

SECTION 4

How Random Assignment Supports Causal Inference

The statistical engine behind experimental inference is the chance variation created by the random assignment itself. After random assignment and data collection, statisticians ask: "If the treatment had no effect, how likely is a difference at least as extreme as the one observed?" When that probability (the p-value) is sufficiently small, we reject the null hypothesis and conclude that there is convincing evidence that the treatment caused the observed effect.

Logic of a Randomization Test

Consider an experiment with two groups. After measuring the response variable, compute the observed difference in means (or proportions). Under the null hypothesis that treatment has no effect, every subject would have produced the same response regardless of group assignment. We can simulate thousands of re-randomizations to build a distribution of differences that would occur by chance alone.
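The simulation described above can be sketched in a few lines of Python. This is an illustrative sketch, not a prescribed procedure: the function name `randomization_test`, the 10,000 shuffles, the seed, and the response values are all hypothetical. Each shuffle pools the observed responses and re-deals them into groups of the original sizes, which is one re-randomization under the null hypothesis.

```python
import random


def randomization_test(treatment, control, n_shuffles=10_000, seed=0):
    """Simulated two-sided randomization test for a difference in means.

    Under H0 (no treatment effect) each subject's response would be the same in
    either group, so the responses are pooled and re-dealt into groups of the
    original sizes many times; each re-deal is one simulated re-randomization.
    """
    rng = random.Random(seed)
    n_t, n_c = len(treatment), len(control)
    d_obs = sum(treatment) / n_t - sum(control) / n_c
    pooled = list(treatment) + list(control)

    null_diffs = []  # distribution of differences produced by chance alone
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        null_diffs.append(sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_c)

    # Two-sided: count differences at least as far from 0 as d_obs;
    # the tiny tolerance keeps floating-point rounding from hiding a tie
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(d_obs) - 1e-9)
    return d_obs, extreme / n_shuffles


# Hypothetical responses from a small two-group experiment (made-up numbers)
treatment = [12, 15, 14, 17, 16, 13]
control = [11, 12, 10, 13, 12, 11]
d_obs, p_value = randomization_test(treatment, control)
print(f"d_obs = {d_obs:.2f}, simulated two-sided p-value = {p_value:.4f}")
```

For these made-up numbers d_obs is 3.0, and the simulated p-value should land close to 0.015, since 14 of the 924 possible ways to split the 12 responses into two groups of 6 give a difference at least that extreme.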

OBSERVED STATISTIC

d_obs = x̄_treatment − x̄_control

where x̄_treatment and x̄_control are the sample means of the treatment and control groups, respectively.

P-VALUE (TWO-SIDED)

The proportion of all possible random reassignments that produce a difference at least as extreme as d_obs in either direction, that is, with |d| ≥ |d_obs|. A small p-value (typically below α = 0.05) provides evidence against the null hypothesis.
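When a study is small, this proportion can be computed exactly by listing every possible reassignment instead of simulating. The sketch below is hypothetical (the function name `exact_two_sided_p` and the data are made up for illustration); it uses `itertools.combinations` to enumerate every possible treatment group and `fractions.Fraction` so that reassignments tying d_obs are counted without rounding error.

```python
from fractions import Fraction
from itertools import combinations


def exact_two_sided_p(treatment, control):
    """Exact two-sided randomization p-value from every possible reassignment.

    Only practical for small studies: n subjects with k in the treatment group
    give C(n, k) reassignments (924 when n = 12 and k = 6). Fractions keep the
    arithmetic exact, so reassignments that tie d_obs are always counted.
    """
    pooled = [Fraction(x) for x in list(treatment) + list(control)]
    n, k = len(pooled), len(treatment)
    total = sum(pooled)

    def diff(treatment_sum):
        # difference in means implied by a treatment group with this total
        return treatment_sum / k - (total - treatment_sum) / (n - k)

    d_obs = diff(sum(pooled[:k]))   # the first k values are the actual treatment group
    extreme = n_reassignments = 0
    for idx in combinations(range(n), k):   # every possible treatment group
        n_reassignments += 1
        if abs(diff(sum(pooled[i] for i in idx))) >= abs(d_obs):
            extreme += 1
    return d_obs, Fraction(extreme, n_reassignments)


# Hypothetical responses from a small two-group experiment (made-up numbers)
treatment = [12, 15, 14, 17, 16, 13]
control = [11, 12, 10, 13, 12, 11]
d_obs, p_value = exact_two_sided_p(treatment, control)
print(f"d_obs = {d_obs}, exact two-sided p-value = {p_value} = {float(p_value):.4f}")
```

For these made-up numbers d_obs = 3, and 14 of the 924 reassignments are at least as extreme, so the exact p-value is 14/924 = 1/66, about 0.015. Enumeration grows quickly with sample size, which is why simulation is used for realistic studies.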

⚠ Why Randomization Matters Statistically

Without random assignment, the p-value from a hypothesis test cannot be interpreted as the probability that chance assignment alone would produce a difference at least as extreme as the one observed, because no chance assignment took place: pre-existing group differences (confounders) could produce the same outcome. Random assignment is not merely a "nice to have"; it is the logical prerequisite for interpreting a significance test as evidence of causation.

SECTION 5

Scope of Inference — The Two-by-Two Framework

Every statistical study supports a certain scope of inference determined by two design features: whether the study used random selection (to obtain the sample from a larger population) and whether it used random assignment (to allocate subjects to treatments). These two forms of randomness serve different purposes: random selection supports generalizing results to the population, while random assignment supports causal conclusions.

The four cells of the scope-of-inference table. Random selection controls generalizability; random assignment controls causality. Most AP exam questions place a study into one of these four quadrants.

On the AP exam, a common scenario involves a study that uses random assignment but not random selection. If the result is statistically significant, you can conclude that the treatment caused the observed difference, but only for the subjects in the study, not for any broader population. Conversely, a well-designed survey (random selection, no random assignment) supports generalization but not causation. Studies with both forms of randomness are rare in practice; even large clinical trials usually enroll volunteers rather than a random sample from a population.
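The two-by-two framework amounts to a lookup on two yes/no design features. The helper below is a hypothetical illustration (the name `scope_of_inference` and the wording of its messages are assumptions); it presumes the result is already statistically significant and encodes which claim, and which group of individuals, each design supports.

```python
def scope_of_inference(random_selection: bool, random_assignment: bool) -> str:
    """Conclusion supported by a study design, assuming a significant result.

    random_assignment decides whether a causal claim is allowed;
    random_selection decides to whom the conclusion can be generalized.
    """
    if random_assignment:
        claim = "the treatment caused the difference"
    else:
        claim = "there is an association (no causal claim)"
    if random_selection:
        who = "the population from which the sample was drawn"
    else:
        who = "the individuals in the study only"
    return f"{claim}; applies to {who}"


# Print all four cells of the two-by-two framework
for selected in (True, False):
    for assigned in (True, False):
        print(f"random selection={selected!s:<5}  random assignment={assigned!s:<5}"
              f"  ->  {scope_of_inference(selected, assigned)}")
```

The two inputs are handled independently on purpose: one decides the type of claim, the other decides its reach, mirroring the two separate roles of randomness in the framework.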

SECTION 6

Worked Example — Identifying Scope of Inference

A researcher wants to study whether background music improves test performance. She recruits 60 volunteers from her university and randomly assigns 30 to take a math test with classical music playing and 30 to take the same test in silence. The music group scores an average of 4.2 points higher. A significance test yields p = 0.014.

What conclusion can the researcher draw?

Step 1 — Identify random assignment

The researcher randomly assigned subjects to the two conditions (music vs. silence). This is a true experiment, so a causal conclusion about the effect of music is possible if results are statistically significant.

Step 2 — Identify random selection

The subjects are volunteers from one university—not a random sample from any defined population. This means results cannot be generalized beyond the 60 participants.

Step 3 — Evaluate significance

The p-value of 0.014 is less than the conventional α = 0.05. This means that if the music truly had no effect, a difference at least as extreme as 4.2 points would arise from the random assignment alone only about 1.4% of the time.

p = 0.014 < 0.05 → statistically significant

Step 4 — State the conclusion in context

Because the study used random assignment and the result is significant, we can conclude that listening to classical music caused the higher test scores for these 60 subjects. However, because the sample was not randomly selected, we cannot generalize this conclusion to all university students or the broader population.

Scope: Causal conclusion for the subjects in the study only (random assignment, no random selection).

SECTION 7

Experiments vs. Observational Studies — Strengths & Limitations

Comparison of experiments and observational studies across key features

Feature	Experiment	Observational Study
Treatment assignment	Researcher imposes via random assignment	Subjects self-select or are observed as-is
Confounding control	Randomization balances known and unknown confounders	Can only control for measured confounders (statistically)
Causal inference	Yes — supported	No — only association
Ethical feasibility	May be unethical (e.g., assigning smoking)	Can study exposures that cannot be imposed
Generalizability	Often limited (convenience samples)	Often broader (large random samples available)
Cost & complexity	Typically expensive and time-consuming	Often cheaper; can use existing data

✦ KEY TAKEAWAY

Observational studies are not inferior to experiments—they answer different questions. When ethical or practical constraints prevent random assignment (e.g., studying the effects of poverty or radiation exposure), observational studies become the primary tool. The key is to match your conclusion to the study design: never claim causation from an observational study, and always acknowledge generalizability limits when random selection is absent.

SECTION 8

Connection to Inference Procedures in AP Statistics

The concepts in this lesson lay the groundwork for every inference procedure you will encounter later in the course. When you perform a two-sample t-test or a chi-square test, or construct a confidence interval, the validity of your conclusion depends on how the data were collected. A statistically significant result from a randomized experiment supports a causal interpretation; the same p-value from an observational study does not.

How experiment design concepts recur throughout the AP Statistics curriculum

Concept in This Lesson	Where It Reappears Later
Random assignment → causal claim	Conclusion step of every significance test ("there is evidence that X causes Y" vs. "there is an association")
Random selection → generalizability	Identifying the population to which confidence intervals or test results apply
Confounding variables	Explaining why an observed association may not reflect a causal relationship in regression (lurking variables)
P-value under H₀	All hypothesis tests—z-tests, t-tests, chi-square tests—use the same null-hypothesis logic introduced here

In more advanced coursework, you would encounter the Rubin Causal Model and techniques like propensity-score matching that attempt to approximate the benefits of randomization in observational data. At the AP level, however, the essential takeaway is simpler: design determines conclusion. Master the scope-of-inference framework, and you will navigate the conclusion step of any FRQ with confidence.

SECTION 9

Practice Problems

PROBLEM 1 — CONCEPTUAL

A researcher randomly assigns 40 patients to receive either a new drug or a placebo and finds a statistically significant reduction in blood pressure in the drug group. The patients were volunteers from a single hospital. Which of the following is the most appropriate conclusion?

PROBLEM 2 — BASIC CALCULATION

A school district randomly selects 200 students from all high schools in the district and surveys them about sleep habits and GPA. Students who sleep more than 8 hours have a higher average GPA. Which statement is true?

PROBLEM 3 — INTERMEDIATE

Researchers randomly select 500 adults from a city and then randomly assign half to follow a new exercise regimen while the other half maintains their current habits. After 12 weeks, the exercise group shows significantly lower cholesterol. What scope of inference does this study support?

PROBLEM 4 — APPLIED

A technology company wants to determine whether a new website layout increases the time users spend on their site. They recruit 100 employees from their company and randomly assign 50 to use the new layout and 50 to use the current layout for one week. The mean daily time on site for the new-layout group is 18.3 minutes, compared with 14.7 minutes for the current-layout group, with p = 0.003. (a) Identify the explanatory and response variables. (b) Is this an experiment or an observational study? Justify your answer. (c) What is the appropriate scope of inference? Explain both the causal and generalizability dimensions. (d) Name one potential confounding variable that random assignment helps control, and explain how.

PROBLEM 5 — CRITICAL THINKING

A public health researcher wants to determine whether daily consumption of green tea reduces the risk of heart disease. She argues that conducting a randomized experiment over 20 years is impractical, so she proposes an observational study instead. (a) Explain why a randomized experiment would be the preferred design for establishing a causal link between green tea consumption and heart disease. (b) Identify two specific confounding variables that could bias the results of the observational study, and explain how each could create a misleading association. (c) The researcher collects data and finds that green-tea drinkers have a 15% lower rate of heart disease (p = 0.01). Write an appropriate conclusion that correctly reflects the scope of inference. (d) Suggest one design modification the researcher could implement to strengthen causal inference without a full randomized experiment.

SUMMARY

Lesson Summary

The ability to draw valid inferences from data depends critically on study design. Only experiments with random assignment support cause-and-effect conclusions, because randomization tends to balance both known and unknown confounding variables across groups. Observational studies can detect associations but cannot eliminate confounders, so causal language is never appropriate for them.

The scope of inference is determined by two dimensions: random selection (enables generalization to the population) and random assignment (enables causal claims). A well-designed experiment with control, replication, and blinding maximizes the strength of evidence. Always match your conclusion to the design: state causation only when random assignment justifies it, and limit your generalization to the population from which the sample was drawn.
