Conditional probability
The idea
Conditional probability measures chance under partial information: P(A | B), read as the probability of A given B, is the chance of A once you know B happened. The definition P(A | B) = P(A and B)/P(B), which requires P(B) > 0, is a renormalization: B becomes the new universe, and you ask what fraction of it also contains A. Independence gains a precise meaning here: when P(B) > 0, A and B are independent exactly when P(A | B) = P(A), so learning B changes nothing. Medical testing, spam filtering, and risk assessment all rest on this idea.
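To see the renormalization concretely, here is a minimal sketch that counts outcomes instead of using the formula. The two-dice sample space, the events, and the helper names prob and cond_prob are illustrative choices, not part of the lesson. It assumes every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered rolls of two fair dice (an illustrative choice).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, universe):
    """Fraction of equally likely outcomes in `universe` that satisfy `event`."""
    hits = [o for o in universe if event(o)]
    return Fraction(len(hits), len(universe))

def cond_prob(a, b, universe):
    """P(A | B): shrink the universe to B, then measure A inside it."""
    b_world = [o for o in universe if b(o)]
    if not b_world:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return prob(a, b_world)

def sum_is_8(roll):          # event A
    return roll[0] + roll[1] == 8

def first_is_even(roll):     # event B
    return roll[0] % 2 == 0

p_a = prob(sum_is_8, outcomes)
p_b = prob(first_is_even, outcomes)
p_a_and_b = prob(lambda r: sum_is_8(r) and first_is_even(r), outcomes)
p_a_given_b = cond_prob(sum_is_8, first_is_even, outcomes)

# Counting inside B agrees with the definition P(A and B) / P(B).
assert p_a_given_b == p_a_and_b / p_b
# These two events are not independent: knowing B changes the chance of A.
assert p_a_given_b != p_a
```

Restricting the list of outcomes to B and counting A inside it gives the same number as dividing P(A and B) by P(B), which is what "B becomes the new universe" means in practice.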
Tree diagrams and two-way tables make conditioning mechanical: multiply along a branch for an and-probability, add branches for a total, then divide to condition. The misconception with real-world consequences is confusing P(A | B) with P(B | A) — the probability of a positive test given disease is not the probability of disease given a positive test, and the two can differ wildly when the condition is rare. Always ask which event is known and which is in question; the known one goes behind the bar.
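The gap between P(A | B) and P(B | A) is easiest to appreciate with numbers. The sketch below uses a made-up screening test: the 1% prevalence, 95% sensitivity, and 5% false-positive rate are hypothetical values chosen only for illustration. It follows the same multiply, add, divide routine described above.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
prevalence = Fraction(1, 100)            # P(disease)
sensitivity = Fraction(95, 100)          # P(positive | disease)
false_positive_rate = Fraction(5, 100)   # P(positive | no disease)

# Multiply along each branch that ends in a positive test.
p_disease_and_pos = prevalence * sensitivity
p_healthy_and_pos = (1 - prevalence) * false_positive_rate

# Add the branches to get the total chance of a positive test.
p_pos = p_disease_and_pos + p_healthy_and_pos

# Divide to condition on the known event (the positive test).
p_disease_given_pos = p_disease_and_pos / p_pos

print(f"P(positive | disease) = {float(sensitivity):.3f}")
print(f"P(disease | positive) = {float(p_disease_given_pos):.3f}")
```

Under these assumed rates, the test catches 95% of true cases, yet a positive result corresponds to disease only about 16% of the time, because the few true positives are outnumbered by false positives from the much larger healthy group.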
Worked example
At a school, 60% of students play a sport. Among athletes, 30% also play an instrument; among non-athletes, 50% play an instrument. A randomly chosen student plays an instrument. What is the probability that this student plays a sport?
- Build the two branches that lead to an instrument player: P(sport and instrument) = 0.60 × 0.30 = 0.18, and P(no sport and instrument) = 0.40 × 0.50 = 0.20.
- Total the instrument players across both branches: P(instrument) = 0.18 + 0.20 = 0.38, so 38% of the school plays an instrument.
- Condition on what is known: P(sport | instrument) = P(sport and instrument)/P(instrument) = 0.18/0.38 = 9/19 ≈ 0.474.
- Interpret the shift: although 60% of all students are athletes, an instrument player is slightly more likely to be a non-athlete (10/19), because non-athletes take up instruments at a higher rate — the new information genuinely moved the odds.
Answer. P(plays a sport | plays an instrument) = 9/19, about 0.47.
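The same calculation can be checked with a two-way table of counts. This is a sketch that assumes a hypothetical cohort of 100 students so the percentages turn into whole numbers. The cohort size and variable names are illustrative, and any cohort size gives the same final ratio.

```python
from fractions import Fraction

# Rates given in the problem.
p_sport = Fraction(60, 100)
p_instr_given_sport = Fraction(30, 100)
p_instr_given_no_sport = Fraction(50, 100)

cohort = 100  # hypothetical school size, chosen so the counts are whole numbers

# Two-way table of counts: (row, column) -> number of students.
table = {
    ("sport", "instrument"): cohort * p_sport * p_instr_given_sport,
    ("sport", "no instrument"): cohort * p_sport * (1 - p_instr_given_sport),
    ("no sport", "instrument"): cohort * (1 - p_sport) * p_instr_given_no_sport,
    ("no sport", "no instrument"): cohort * (1 - p_sport) * (1 - p_instr_given_no_sport),
}

# Conditioning on "instrument" means keeping only that column of the table.
instrument_total = sum(n for (_, col), n in table.items() if col == "instrument")
p_sport_given_instr = table[("sport", "instrument")] / instrument_total

print(f"Instrument players: {instrument_total} of {cohort}")
print(f"P(sport | instrument) = {p_sport_given_instr} = {float(p_sport_given_instr):.3f}")
```

Reading the table by column makes the conditioning visible: of the 38 instrument players, 18 are athletes and 20 are not, which is the 9/19 versus 10/19 split found above.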
Check your understanding
- Why does dividing by P(B) correctly rescale probabilities once B becomes the new universe of possibilities?
- How can P(A | B) and P(B | A) differ dramatically, and what real-world mistakes grow out of confusing them?
- What pattern would independence between two events create in a two-way table of counts?
- Why did learning that the student plays an instrument push the probability of being an athlete below the school-wide 60%?
Build the foundations first
Conditional probability builds on these concepts. If any feel shaky, start there.