Sampling & experimental design
The idea
Statistics can only be as trustworthy as the data collection behind it. A census measures everyone; a sample measures a part to estimate the whole. The gold standard is random selection, which gives every member a known, nonzero chance of inclusion and lets sample results generalize to the population. Designs differ in ambition: observational studies watch what happens on its own, while experiments impose treatments using random assignment. Only experiments support cause-and-effect claims, because randomization balances lurking variables across the groups on average, removing the rival explanations they would otherwise supply.
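To see why random assignment matters, here is a minimal simulation sketch. It is an illustration only: the lurking variable ("motivation"), the opt-in rule, and the function names are all assumptions made up for this example, not part of any real study.

```python
import random
import statistics


def mean_gap(group_a, group_b):
    """Difference in the average lurking variable between two groups."""
    return statistics.mean(group_a) - statistics.mean(group_b)


def compare_designs(n=2000, seed=0):
    """Return (observational_gap, experimental_gap) in average motivation.

    Hypothetical setup: each of n students has a hidden 'motivation' score.
    """
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    # Observational study: students choose for themselves, and (by assumption)
    # more motivated students are more likely to opt in to the treatment.
    opted_in, opted_out = [], []
    for m in motivation:
        if m + rng.gauss(0, 1) > 0:
            opted_in.append(m)
        else:
            opted_out.append(m)

    # Experiment: shuffle everyone, then split in half, so chance alone
    # decides who gets the treatment.
    shuffled = motivation[:]
    rng.shuffle(shuffled)
    half = n // 2
    treated, control = shuffled[:half], shuffled[half:]

    return mean_gap(opted_in, opted_out), mean_gap(treated, control)


observational_gap, experimental_gap = compare_designs()
print(f"self-selected groups differ in motivation by {observational_gap:.2f}")
print(f"randomly assigned groups differ by {experimental_gap:.2f}")
```

Under these assumptions the self-selected groups start out very different in motivation, so any difference in outcomes has a rival explanation. The randomly assigned groups start out nearly alike, which is what lets a difference in outcomes be credited to the treatment.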
Bias is a systematic tilt that no amount of extra data can fix: convenience samples reach whoever is easiest, voluntary response samples attract the strongly opinionated, and leading wording nudges answers. Stratified sampling improves on simple random sampling when the population has known subgroups: sampling each stratum in proportion to its size guarantees that every group is represented in its true share. The stubborn misconception is that a big sample cures bias; a million skewed responses just estimate the wrong number very precisely. Representativeness comes from HOW you select, never from how many you select.
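The claim that size does not cure bias can be checked with a small simulation. The sketch below is only an illustration under invented assumptions: the population of homework times, the response probabilities (90% for above-60-minute students, 10% for the rest), and the function name are all hypothetical.

```python
import random
import statistics


def big_biased_vs_small_random(pop_size=100_000, random_n=200, seed=1):
    """Compare a huge skewed sample with a small random one.

    Returns (population_mean, biased_mean, biased_n, random_mean).
    """
    rng = random.Random(seed)
    # Hypothetical population: nightly homework minutes, never negative.
    population = [max(0.0, rng.gauss(60, 20)) for _ in range(pop_size)]

    # Biased collection: heavy studiers answer 90% of the time,
    # everyone else only 10% of the time (an assumed response pattern).
    biased = [x for x in population
              if rng.random() < (0.9 if x > 60 else 0.1)]

    # Random collection: every student has the same chance of selection.
    simple_random = rng.sample(population, random_n)

    return (statistics.mean(population), statistics.mean(biased),
            len(biased), statistics.mean(simple_random))


pop_mean, biased_mean, biased_n, random_mean = big_biased_vs_small_random()
print(f"true mean:               {pop_mean:.1f} minutes")
print(f"biased sample (n={biased_n}): {biased_mean:.1f} minutes")
print(f"random sample (n=200):   {random_mean:.1f} minutes")
```

In this setup the skewed sample is hundreds of times larger than the random one, yet it lands well above the true mean, while the small random sample lands close to it. Adding more skewed responses would only pin down the wrong value more tightly.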
Worked example
A principal wants to estimate average nightly homework time at a school of 1200 students: 400 ninth graders, 320 tenth, 280 eleventh, and 200 twelfth. Design a stratified random sample of 60 students proportional to grade size, and explain why it beats surveying 60 volunteers from the honor society.
- Compute each grade's share of the school and apply it to the sample of 60. Ninth grade: 400/1200 = 1/3 of the school, so 60 × 1/3 = 20 students.
- Repeat for the rest: tenth gets 320/1200 × 60 = 16, eleventh gets 280/1200 × 60 = 14, and twelfth gets 200/1200 × 60 = 10. Within each grade, draw the students at random from the full roster.
- Verify the design: 20 + 16 + 14 + 10 = 60, and each grade's share of the sample matches its share of the school, so no grade can be over- or under-represented by luck of the draw.
- Compare with the volunteer plan: honor society volunteers are a convenience sample with voluntary response on top. These students likely study more than average, so their mean would overestimate homework time no matter how carefully it was computed. Random selection, not sample size, is what earns generalization.
Answer. Randomly select 20, 16, 14, and 10 students from grades 9 through 12 respectively; the volunteer survey would be biased toward heavy studiers regardless of its size.
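The allocation and the within-grade draw can be sketched in a few lines of Python. This is a minimal sketch, not a definitive procedure: the function names and the student ID format are hypothetical, and the simple rounding assumes each grade's share works out to a whole number, as it does here. Other enrollments might need an extra rounding rule to keep the total at 60.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population.

    Assumes the shares come out whole; plain rounding is not guaranteed
    to sum to sample_size in general.
    """
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


def stratified_sample(rosters, allocation, seed=None):
    """Draw the allocated number of students at random from each roster."""
    rng = random.Random(seed)
    return {name: rng.sample(rosters[name], allocation[name])
            for name in rosters}


# Grade sizes from the worked example.
grade_sizes = {9: 400, 10: 320, 11: 280, 12: 200}
allocation = proportional_allocation(grade_sizes, 60)
print(allocation)  # → {9: 20, 10: 16, 11: 14, 12: 10}

# Hypothetical rosters: made-up student IDs such as "G9-001".
rosters = {grade: [f"G{grade}-{i:03d}" for i in range(1, size + 1)]
           for grade, size in grade_sizes.items()}
chosen = stratified_sample(rosters, allocation, seed=42)
print({grade: len(ids) for grade, ids in chosen.items()})
```

Because `random.sample` draws without replacement from the full roster, every student in a grade has the same chance of being picked, which is the property the volunteer plan lacks.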
Check your understanding
- Why can random assignment in an experiment justify causal claims while even a careful observational study cannot?
- How does stratified sampling reduce the luck-of-the-draw imbalance that a simple random sample still allows?
- Why does enlarging a biased sample sharpen the estimate of the wrong value instead of fixing the bias?
- What sources of bias could still creep into a perfectly selected sample, and how might question wording be one of them?
Build the foundations first
Sampling & experimental design builds on these concepts. If any feel shaky, start there.