AP STATISTICS • COLLECTING DATA

Introducing Statistics: Do the Data We Collected Tell the Truth?

Understanding how data collection methods determine whether our conclusions are trustworthy and generalizable.

SECTION 1

Historical Context & Motivation

The question of whether collected data faithfully represent reality is as old as systematic inquiry itself. Long before formal statistical theory existed, governments conducted censuses and merchants kept ledgers, yet the reliability of such records was perpetually in doubt. The modern discipline of statistics arose precisely because scholars recognized that raw numbers can mislead as easily as they can illuminate—what matters is how data are collected, not merely that they exist.

1662
John Graunt's Bills of Mortality
Graunt pioneered the systematic analysis of London death records, revealing patterns in mortality and exposing how incomplete or biased records could distort conclusions about public health.
1936
The Literary Digest Debacle
The Literary Digest polled 2.4 million respondents yet predicted Landon over Roosevelt because its sampling frame—telephone owners and magazine subscribers—systematically excluded lower-income voters, demonstrating that large samples do not compensate for biased collection.
1948
Dewey Defeats Truman—Except He Didn't
Quota sampling and early termination of polling led major outlets to call the 1948 election incorrectly, spurring the adoption of probability-based sampling methods across survey research.
1965
Fisher, Neyman, and Experimental Design
By the mid-twentieth century, R. A. Fisher's principles of randomization, replication, and blocking became foundational, establishing that carefully designed experiments could yield data capable of supporting causal claims.
2020s
Big Data and Algorithmic Bias
Massive datasets collected through digital platforms raised fresh questions about representativeness: data that are plentiful can still be profoundly unrepresentative if the collection mechanism excludes or overrepresents certain groups.

The central thread running through every episode above is a single, persistent question: Do the data we collected actually tell the truth about the population or process we care about? Answering that question requires understanding how bias infiltrates data collection, how variability arises in samples, and what design principles guard against misleading conclusions. This lesson introduces the conceptual framework that the entire AP Statistics course builds upon.

SECTION 2

Core Principles & Definitions

Before we can evaluate whether data "tell the truth," we need a shared vocabulary. The discipline of statistics distinguishes sharply between the group we want to learn about and the subset we actually observe, between values that summarize samples and values that describe populations, and between studies that can establish causation and those that can only suggest association. These distinctions are not pedantic—they determine the scope of every conclusion we draw.

1

Population vs. Sample

A population is the entire group of individuals or observations about which we want information. A sample is the subset we actually examine. The validity of our conclusions depends on how faithfully the sample mirrors the population.
2

Parameter vs. Statistic

A parameter is a fixed numerical summary of the population (e.g., the true mean μ). A statistic is the corresponding summary computed from a sample (e.g., the sample mean x̄). Statistics vary from sample to sample; parameters do not.
3

Bias vs. Variability

Bias is the systematic tendency of a method to over- or under-estimate the parameter. Variability describes how much statistics differ across repeated samples. Good data collection minimizes both bias and variability.
4

Observational Study vs. Experiment

In an observational study, researchers observe without intervening. In an experiment, researchers deliberately impose treatments on subjects. Only experiments with proper randomization can support causal conclusions.
5

Confounding Variables

A confounding variable is associated with both the explanatory and the response variable, making it impossible to determine which factor truly causes the observed effect. Random assignment is the primary tool for neutralizing confounders.
✦ KEY TAKEAWAY
Think of a population parameter as the "true signal" you are trying to detect, like the actual temperature outside. Your sample statistic is the thermometer reading. Bias is a miscalibrated thermometer that always reads 3°F too high—no number of readings will fix this. Variability is a sensitive thermometer that fluctuates from reading to reading. Proper data collection methods are like calibrating and stabilizing your instrument so the reading is both accurate and precise.
SECTION 3

Visualizing Bias and Variability

The classic "bullseye" diagram is the most intuitive way to understand the interplay between bias and variability. Imagine that each dot on the target represents a sample statistic obtained from a separate random sample drawn by the same method. The center of the target represents the true population parameter. The pattern of dots reveals whether our data collection method systematically misses the mark (bias), scatters widely (high variability), or both.

[Figure: Bias × Variability: The Bullseye Framework. Panels: Low Bias, Low Variability (✓ ideal); Low Bias, High Variability; High Bias, Low Variability. Center of target = true parameter; dots = sample statistics. Cluster position → bias; cluster spread → variability.]
Each panel shows seven sample statistics (dots) relative to the true parameter (bullseye center). The ideal scenario—low bias and low variability—clusters tightly around the center. High variability scatters the dots, while high bias shifts the entire cluster away from the target.

Notice the crucial asymmetry between the two sources of error. Variability can be reduced by increasing the sample size—more data points tighten the cluster. Bias, however, does not diminish with larger samples; a flawed collection method will produce systematically skewed results no matter how many observations you gather. This is why the Literary Digest poll of 2.4 million respondents was less accurate than George Gallup's poll of only 50,000: the Digest's method was biased, and no amount of data could fix it. The lesson for AP Statistics is that the design of data collection is the first and most consequential decision in any study.
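The asymmetry described above can be sketched with a small simulation. Everything here is an illustrative assumption rather than part of the lesson: the population is 20,000 made-up values, and the hypothetical "biased" method can only reach units whose value exceeds 45. The sketch compares how far the sample means sit from the true mean (bias) and how much they scatter (variability) at two sample sizes.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20,000 values (not real data).
population = [rng.gauss(50, 10) for _ in range(20_000)]
true_mean = statistics.fmean(population)

# Hypothetical biased frame: units with low values cannot be reached.
biased_frame = [x for x in population if x > 45]

def simulate(frame, n, reps=300):
    """Return (bias, spread) of the sample mean over many samples of size n."""
    means = [statistics.fmean(rng.sample(frame, n)) for _ in range(reps)]
    bias = statistics.fmean(means) - true_mean
    spread = statistics.stdev(means)
    return bias, spread

results = {}
for n in (25, 400):
    results[("srs", n)] = simulate(population, n)
    results[("biased", n)] = simulate(biased_frame, n)

for key, (bias, spread) in results.items():
    print(key, f"bias={bias:+.2f}", f"spread={spread:.2f}")
```

In runs of this sketch, the spread should shrink as n grows for both methods, while the bias of the restricted frame should stay roughly where it was.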

SECTION 4

How Data Collection Methods Determine Truth

The truthfulness of data is not an inherent property of the numbers themselves—it emerges from the process that generated them. In this section we formalize the key mechanisms through which collection methods shape what the data can and cannot tell us. Although the AP Statistics course does not require calculus-based derivations, understanding these foundational relationships equips you to evaluate any study you encounter.

Sampling Bias: When the Sample Does Not Represent the Population

When some members of the population have no chance, or an unknown chance, of being selected, the resulting sample is prone to selection bias. The formal remedy is probability sampling, in which each unit's probability of inclusion is specified in advance and is greater than zero. The simplest form is a simple random sample (SRS), where every subset of n units from the population of N is equally likely to be the chosen sample.

NUMBER OF POSSIBLE SIMPLE RANDOM SAMPLES
C(N, n) = N! / (n! × (N − n)!)
N = population size, n = sample size. In a true SRS, each of these C(N, n) subsets has probability 1/C(N, n) of being selected.
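As a rough illustration of the counting formula, the sketch below uses Python's standard library to count the possible samples and to draw one SRS. The values N = 500 and n = 20 are hypothetical, chosen only for the example.

```python
import math
import random

N, n = 500, 20  # hypothetical population and sample sizes

# Number of distinct size-n subsets: C(N, n) = N! / (n! * (N - n)!)
num_possible_samples = math.comb(N, n)

# A case small enough to check by hand: C(5, 2) = 120 / (2 * 6) = 10
small_case = math.factorial(5) // (math.factorial(2) * math.factorial(3))

# Draw one SRS without replacement from unit labels 1..N.
rng = random.Random(7)
srs = rng.sample(range(1, N + 1), n)

print("C(5, 2) =", small_case)
print("Each possible sample has probability 1 /", num_possible_samples)
```

Because `sample` draws without replacement and treats every unit identically, each size-n subset is equally likely, which matches the SRS definition above.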

Sampling Variability and the Role of Sample Size

Even an unbiased sampling method produces statistics that vary from one sample to the next. The sampling distribution of a statistic describes the pattern of this variation across all possible samples. For the sample mean, the standard deviation of its sampling distribution—called the standard error—is governed by the population standard deviation σ and the sample size n.

STANDARD ERROR OF THE SAMPLE MEAN
SE(x̄) = σ / √n
σ = population standard deviation, n = sample size. Quadrupling the sample size halves the standard error—a fundamental result you will see repeatedly in inference.
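A quick numerical check of the formula, assuming a hypothetical population standard deviation of 12 (the value is illustrative only):

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12.0  # hypothetical population standard deviation

# Each sample size is four times the previous one.
se_by_n = {n: standard_error(sigma, n) for n in (25, 100, 400)}

for n, se in se_by_n.items():
    print(f"n = {n:>3}: SE = {se}")
# n =  25: SE = 2.4
# n = 100: SE = 1.2
# n = 400: SE = 0.6
```

Each fourfold increase in n cuts the standard error in half, so gains in precision become more expensive as the sample grows.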

Confounding and Causal Claims

When we wish to claim that one variable causes changes in another, observational data alone are insufficient because confounding variables may lurk behind any observed association. A well-designed randomized controlled experiment addresses this by randomly assigning subjects to treatment groups, which tends to balance both known and unknown confounders across groups. Random assignment does not eliminate variability—it ensures that the remaining variation is attributable to chance rather than to systematic differences between groups.
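To see why random assignment tends to balance confounders, consider the sketch below. It assumes a hypothetical confounder, a "motivation" score, for 2,000 made-up subjects, and compares two ways of forming groups: self-selection (the more motivated half opts into treatment) and random assignment.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical confounder: a "motivation" score for each of 2,000 subjects.
motivation = [rng.gauss(0, 1) for _ in range(2_000)]

# Self-selection: the more motivated half ends up in the treatment group.
ranked = sorted(motivation)
self_control, self_treated = ranked[:1_000], ranked[1_000:]

# Random assignment: shuffle a copy, then split it in half.
shuffled = motivation[:]
rng.shuffle(shuffled)
rand_treated, rand_control = shuffled[:1_000], shuffled[1_000:]

# Difference in average motivation between treatment and control.
gap_self = statistics.fmean(self_treated) - statistics.fmean(self_control)
gap_random = statistics.fmean(rand_treated) - statistics.fmean(rand_control)

print(f"self-selection gap:    {gap_self:.2f}")
print(f"random-assignment gap: {gap_random:.2f}")
```

Under self-selection the groups differ sharply in motivation before any treatment is applied. Under random assignment the gap should be close to zero, with only chance variation remaining.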

SCOPE OF INFERENCE RULE
Random selection → Generalize to population
Random assignment → Cause-and-effect conclusion
A study may have both, one, or neither form of randomization. The presence or absence of each determines which conclusions are legitimate.
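The rule can be written as a tiny lookup, shown here as a sketch. The function name and dictionary keys are hypothetical, introduced only for this illustration.

```python
def scope_of_inference(random_selection, random_assignment):
    """Map the two forms of randomization to the conclusions they permit."""
    return {
        "generalize_to_population": bool(random_selection),
        "cause_and_effect": bool(random_assignment),
    }

# The four possible study designs.
for selection in (True, False):
    for assignment in (True, False):
        print(
            f"random selection={selection}, random assignment={assignment} ->",
            scope_of_inference(selection, assignment),
        )
```

The two answers are independent of each other: a study can justify a causal claim without justifying generalization, and vice versa.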
SECTION 5

Classifying Sources of Bias in Data Collection

Understanding bias requires more than knowing it exists—you must be able to identify its specific form, because different biases demand different remedies. On the AP Statistics exam, free-response questions frequently ask you to name the type of bias, explain the direction of its effect, and describe how the study design could be improved. The following taxonomy covers the major categories you will encounter.

Taxonomy of Bias in Data Collection

SOURCES OF BIAS

SELECTION BIAS
Undercoverage: Some groups in the population are left out of the sampling frame.
Convenience Sampling: Selecting whichever individuals are easiest to reach.

MEASUREMENT BIAS
Response Bias: Wording, interviewer, or social desirability effects skew answers.
Instrument Error: Faulty scales, tests, or coding schemes produce distorted data.

NONRESPONSE BIAS
Nonresponse: Selected individuals who do not participate may differ systematically.

REMEDIES
Selection bias → Use probability sampling (SRS, stratified, cluster)
Measurement bias → Pilot-test instruments, use neutral wording, blind observers
Nonresponse bias → Follow-up contacts, incentives, keep surveys short

No amount of data can fix bias; only better design can.
Flowchart classifying the three major families of bias in data collection—selection bias, measurement bias, and nonresponse bias—along with their subtypes and corresponding remedies.

Each branch in the diagram above corresponds to a distinct phase of data collection. Selection bias occurs before data are gathered, when the mechanism for choosing subjects systematically favors certain groups. Measurement bias occurs during data gathering, when the questions asked, the instruments used, or the observers' expectations distort the recorded values. Nonresponse bias occurs after subjects are selected but before data are recorded, when a substantial fraction of those chosen decline or fail to participate—and those nonrespondents differ meaningfully from those who do respond.

📝 AP EXAM TIP
When a free-response question asks you to identify a potential source of bias, you must do three things: (1) name the type of bias, (2) explain the mechanism by which it operates in the given scenario, and (3) describe the direction of the bias—does it tend to overestimate or underestimate the quantity of interest?
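The "direction of bias" step can be made concrete with a sketch of undercoverage. All numbers are hypothetical: an electorate of 10,000 voters in which 60% support candidate A, but the sampling frame reaches only a subgroup where support is 45%.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate; 1 = supports candidate A, 0 = does not.
in_frame = [1] * 1_800 + [0] * 2_200      # 4,000 reachable voters, 45% support
out_of_frame = [1] * 4_200 + [0] * 1_800  # 6,000 unreachable voters, 70% support
population = in_frame + out_of_frame
true_p = sum(population) / len(population)  # 6,000 / 10,000 = 0.60

# Large sample from the undercovered frame vs. small SRS from everyone.
large_biased = sum(rng.sample(in_frame, 3_000)) / 3_000
small_srs = sum(rng.sample(population, 500)) / 500

print(f"true proportion:           {true_p:.3f}")
print(f"biased frame, n = 3,000:   {large_biased:.3f}")
print(f"SRS of population, n = 500: {small_srs:.3f}")
```

In this setup the mechanism is undercoverage, and the direction is an underestimate: the excluded voters support candidate A at a higher rate than the voters the frame can reach.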
SECTION 6

Worked Example: Evaluating a Study Design

Consider the following scenario, which is typical of AP Statistics free-response questions. A university researcher wants to determine whether a new tutoring program improves exam scores. She posts flyers in the library inviting students to sign up for the program. Of the 120 students who volunteer, she randomly assigns 60 to receive tutoring and 60 to a control group. After eight weeks, the tutored group scores an average of 78 on the final exam compared to 72 for the control group. She concludes that the tutoring program is effective and should be adopted university-wide.

Analyzing the Validity of the Researcher's Conclusion

Step 1 — Identify the Type of Study

The researcher randomly assigned volunteers to treatment and control groups, so this is a randomized experiment. Because random assignment was used, confounding variables are likely balanced between the two groups, which supports a causal interpretation: the tutoring program likely caused the difference in scores.
Study type: Randomized experiment → causal claim is potentially justified.

Step 2 — Assess Random Selection

The participants were not randomly selected from the university population; they volunteered in response to library flyers. This constitutes a convenience/voluntary response sample. Students who frequent the library and proactively seek academic help may be more motivated than the average university student, introducing selection bias.
No random selection → cannot generalize to all university students.

Step 3 — Apply the Scope of Inference Framework

With random assignment but without random selection, the study falls into the category where we can claim causation (the tutoring caused higher scores among these participants), but we cannot generalize to the broader university population. The researcher's conclusion that the program "should be adopted university-wide" oversteps the scope of inference.
Valid claim: tutoring caused improvement for these volunteers. Invalid claim: the effect generalizes university-wide.

Step 4 — Suggest an Improved Design

To support a generalizable causal conclusion, the researcher could randomly select 120 students from the university enrollment roster (random selection) and then randomly assign them to tutoring and control conditions (random assignment). This two-stage randomization would allow both a causal interpretation and generalization to the full student body.
Improved design: Random selection + random assignment = causal + generalizable.
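The two-stage randomization in Step 4 might look like the following sketch. The roster of 5,000 student IDs is a hypothetical placeholder; the sample size of 120 and the 60/60 split come from the scenario.

```python
import random

rng = random.Random(3)

# Hypothetical enrollment roster: 5,000 placeholder student IDs.
roster = [f"S{i:05d}" for i in range(1, 5_001)]

# Stage 1 (random selection): draw 120 students from the full roster.
selected = rng.sample(roster, 120)

# Stage 2 (random assignment): shuffle the selected students, then split.
# sample() already returns a random order; the shuffle makes the step explicit.
rng.shuffle(selected)
tutoring, control = selected[:60], selected[60:]

print(len(tutoring), "students assigned to tutoring")
print(len(control), "students assigned to control")
```

Stage 1 is what licenses generalization to the student body, and Stage 2 is what licenses the causal claim. Omitting either stage removes the corresponding conclusion.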
SECTION 7

Strengths and Limitations of Data Collection Methods

No single data collection method is universally superior; each involves trade-offs among cost, practicality, ethical constraints, and the strength of conclusions it can support. The table below compares the major methods you will encounter in the AP Statistics curriculum, highlighting when each is most appropriate and where each can go wrong.

Comparison of major data collection methods in AP Statistics
Method | Strengths | Limitations
Census | No sampling variability; complete information about every individual in the population. | Often impractical or impossible for large populations; expensive, time-consuming, and still subject to measurement bias.
Simple Random Sample | Unbiased estimates; straightforward probability calculations; basis for statistical inference. | Requires a complete list of the population (sampling frame); may miss small subgroups by chance.
Stratified Random Sample | Guarantees representation from each stratum; reduces variability when strata differ from each other. | Requires advance knowledge of relevant strata; more complex logistics and analysis.
Observational Study | Ethical when manipulation would be harmful; can study variables that cannot be assigned (e.g., smoking). | Cannot establish causation due to lurking and confounding variables.
Randomized Experiment | Random assignment balances confounders; can support causal conclusions. | May lack generalizability if subjects are not randomly selected; ethical constraints limit feasible treatments.
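The claim that stratification reduces variability when strata differ can be checked with a sketch. The two strata below are hypothetical (6,000 values centered near 40 and 4,000 centered near 70), and the stratified sample uses proportional allocation, which is one common choice among several.

```python
import random
import statistics

rng = random.Random(5)

# Hypothetical strata that differ sharply in their typical value.
stratum_a = [rng.gauss(40, 5) for _ in range(6_000)]
stratum_b = [rng.gauss(70, 5) for _ in range(4_000)]
population = stratum_a + stratum_b

def srs_mean(n=50):
    """Mean of one simple random sample from the whole population."""
    return statistics.fmean(rng.sample(population, n))

def stratified_mean(n_a=30, n_b=20):
    """Mean of one stratified sample: 60% from stratum A, 40% from B."""
    sample = rng.sample(stratum_a, n_a) + rng.sample(stratum_b, n_b)
    return statistics.fmean(sample)

reps = 500
spread_srs = statistics.stdev([srs_mean() for _ in range(reps)])
spread_stratified = statistics.stdev([stratified_mean() for _ in range(reps)])

print(f"SRS spread:        {spread_srs:.2f}")
print(f"stratified spread: {spread_stratified:.2f}")
```

With the same total sample size of 50, the stratified means should scatter much less, because the sample can no longer over- or under-represent either stratum by chance.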
✦ KEY TAKEAWAY
Think of data collection methods like different types of telescopes. A census is like pointing your telescope at every star individually—accurate but impractical for the whole sky. An SRS is like picking random patches of sky: efficient, but you might miss a rare galaxy. A stratified sample is like making sure you observe a patch in each quadrant of the sky—guaranteeing breadth. An experiment is different altogether: it is like creating conditions in a lab to test how stars would behave under new circumstances. The choice of instrument limits what you can discover.
SECTION 8

Connection to Statistical Inference

The data collection principles introduced here are not merely preamble—they form the logical foundation on which every subsequent inference procedure rests. When you construct confidence intervals or perform hypothesis tests later in the course, the validity of those procedures depends entirely on whether the data were collected using appropriate methods. The table below connects introductory concepts to their advanced counterparts.

How introductory data collection concepts underpin later inference topics
Introductory Concept | Advanced Connection
Sampling variability — statistics differ from sample to sample | Sampling distributions, Central Limit Theorem, and the basis for confidence intervals
Bias — systematic error in estimation | Conditions checks for inference (e.g., "random" condition); violations invalidate confidence intervals and tests
Random assignment in experiments | Two-sample t-tests and chi-square tests for independence; causal interpretation of significant results
Confounding variables | Simpson's paradox, multiple regression, stratification in observational studies
Scope of inference (generalize vs. cause) | Writing full inference conclusions that state both what can be claimed and to whom the claim applies

Every confidence interval you compute and every p-value you report implicitly assumes that the data were collected properly. A 95% confidence interval, for instance, has its advertised coverage rate only if the sample was drawn randomly from the population. If the sample is biased, the interval is centered on the wrong value, and no amount of mathematical sophistication can rescue the conclusion. This is why the AP Statistics course devotes an entire unit to data collection before introducing a single inferential formula.
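The coverage claim can be explored with a simulation sketch. The assumptions are all illustrative: a made-up population of 20,000 values, a hypothetical biased frame that excludes values of 45 or below, samples of size 100, and the large-sample critical value 1.96 in place of an exact t critical value.

```python
import math
import random
import statistics

rng = random.Random(99)

# Hypothetical population and its true mean (the parameter).
population = [rng.gauss(50, 10) for _ in range(20_000)]
mu = statistics.fmean(population)

# Hypothetical biased frame: low values are never sampled.
biased_frame = [x for x in population if x > 45]

def coverage(frame, n=100, reps=1_000):
    """Fraction of approximate 95% intervals that contain the true mean."""
    hits = 0
    for _ in range(reps):
        sample = rng.sample(frame, n)
        xbar = statistics.fmean(sample)
        margin = 1.96 * statistics.stdev(sample) / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

coverage_srs = coverage(population)
coverage_biased = coverage(biased_frame)

print(f"coverage with random sampling: {coverage_srs:.3f}")
print(f"coverage with biased frame:    {coverage_biased:.3f}")
```

The formula is identical in both cases. Only the collection method differs, and that alone should decide whether the intervals capture the parameter about as often as advertised or almost never.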

SECTION 9

Practice Problems

PROBLEM 1 — CONCEPTUAL
A researcher conducts a study by randomly selecting 200 adults from the city's voter registration list and measuring their blood pressure. She finds that the mean systolic blood pressure in her sample is 128 mmHg. Which of the following best describes the number 128 mmHg?
PROBLEM 2 — BASIC CALCULATION
A school has 1,200 students. The principal wants to survey a sample of 60 students about cafeteria satisfaction. She numbers all students 0001 through 1200 and uses a random number generator to select 60 unique numbers. Which statement correctly describes this sampling method and its implication?
PROBLEM 3 — INTERMEDIATE
A health magazine publishes a web survey asking readers, "Has taking daily vitamins improved your energy levels?" Of 15,000 respondents, 72% say yes. The magazine concludes that daily vitamins improve energy for most adults. Which of the following is the most serious concern about this conclusion?
PROBLEM 4 — APPLIED
A pharmaceutical company is testing a new migraine medication. From a database of 8,000 migraine patients at participating clinics, the company randomly selects 300 patients. These 300 patients are then randomly assigned: 150 receive the new medication and 150 receive a placebo. Neither the patients nor the administering physicians know which treatment each patient receives. After 12 weeks, the medication group reports an average of 3.2 migraine episodes per month compared to 5.1 for the placebo group.
(a) Identify the type of study and explain how you know.
(b) Can the company claim the medication caused the reduction in migraines? Justify your answer.
(c) Can the company generalize the results to all migraine patients? Explain.
(d) Describe one potential source of bias that could still affect this study, and explain its likely direction.
PROBLEM 5 — CRITICAL THINKING
A state legislature is debating whether to raise the minimum wage. To gather evidence, two research teams conduct separate studies.
Team A: Examines data from 500 businesses in states that recently raised their minimum wage and compares their employment changes to 500 businesses in states that did not.
Team B: Identifies 200 nearly identical small businesses (matched on industry, size, and region), then randomly assigns half to receive a government subsidy equal to the proposed wage increase (simulating the cost impact) and measures hiring changes over 6 months.
(a) Classify each study and explain why they differ in the types of conclusions they support.
(b) Team A finds that states that raised the minimum wage had 2% higher unemployment. A legislator says, "This proves minimum wage increases cause unemployment." Evaluate this claim.
(c) Describe two specific confounding variables that could explain Team A's finding.
(d) Explain why Team B's design, while stronger for causal inference, may still have limited external validity. Propose one modification that could address this limitation.
SUMMARY

Lesson Summary

Whether data "tell the truth" depends entirely on the method used to collect them. The distinction between a population and a sample is foundational: we almost always work with samples, and the quality of our conclusions depends on how those samples were drawn. A parameter is a fixed numerical truth about the population, while a statistic is its estimate from the sample. Two enemies of trustworthy data are bias (systematic error that cannot be reduced by collecting more data) and variability (random fluctuation that decreases with larger samples).

The three major families of bias (selection bias, measurement bias, and nonresponse bias) are combated through probability sampling, careful instrument design, and follow-up protocols, respectively. The scope of inference framework ties everything together: random selection allows generalization to the population, and random assignment allows cause-and-effect conclusions. A study may have both, one, or neither, and your job on the AP exam is to identify which forms of randomization are present and state the corresponding conclusions accurately.

Varsity Tutors • AP Statistics • Introducing Statistics: Do the Data We Collected Tell the Truth?