Probability theory
The idea
Probability theory makes uncertainty computable by building on three axioms: probabilities are nonnegative, the whole sample space has probability 1, and probabilities of mutually exclusive events add. From counting rules and conditional probability — which you have already used — the theory assembles the machinery of serious inference: the multiplication rule P(A and B) = P(A)P(B given A), the law of total probability for averaging over scenarios, and Bayes' theorem for reversing the direction of a conditional.
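For reference, the three rules named above can be written out symbolically. This is a notational sketch rather than a derivation: the symbols C_1, ..., C_n (mutually exclusive causes that together cover every case) and E (the observed effect, with P(E) > 0) are labels assumed here for illustration, and the vertical bar is read as "given".

```latex
\begin{align*}
P(A \cap B) &= P(A)\,P(B \mid A) && \text{(multiplication rule)} \\
P(E) &= \sum_{i=1}^{n} P(C_i)\,P(E \mid C_i) && \text{(law of total probability)} \\
P(C_k \mid E) &= \frac{P(C_k)\,P(E \mid C_k)}{\sum_{i=1}^{n} P(C_i)\,P(E \mid C_i)} && \text{(Bayes' theorem)}
\end{align*}
```

The numerator of the last line is one term of the sum in its denominator, which is why the result always lies between 0 and 1.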
Bayes' theorem deserves a mental picture, not just a formula: when an effect can arise from several causes, the probability of a particular cause given the observed effect is that cause's share of all the ways the effect happens. Weight each scenario by how often it occurs and how often it produces the effect, then divide the share of interest by the total.
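The "share of all the ways" picture translates almost word for word into code. The sketch below is one possible way to write it, not a standard library routine: the function name `posterior_shares` and its two dictionary arguments are hypothetical, and it assumes the causes are mutually exclusive and that their prior probabilities sum to 1.

```python
def posterior_shares(priors, likelihoods):
    """Return P(cause | effect) for every cause.

    priors[c]      -- P(cause c); assumed to sum to 1 across causes
    likelihoods[c] -- P(effect | cause c)
    Both names are hypothetical, chosen for this illustration.
    """
    # Weight each scenario by how often it occurs and how often it
    # produces the effect (the multiplication rule).
    routes = {c: priors[c] * likelihoods[c] for c in priors}

    # Total over all routes: P(effect), by the law of total probability.
    total = sum(routes.values())
    if total == 0:
        raise ValueError("the effect has probability zero under every cause")

    # Each cause's share of the total is its posterior probability.
    return {c: weight / total for c, weight in routes.items()}


# Illustrative numbers only: two equally likely causes, one of which
# produces the effect three times as often as the other.
shares = posterior_shares({"x": 0.5, "y": 0.5}, {"x": 0.1, "y": 0.3})
print(shares)
```

With equal priors the shares depend only on how often each cause produces the effect, so cause "y" takes roughly three quarters of the total.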
A misconception that wrecks many real decisions is confusing P(A given B) with P(B given A): the probability of a positive test given disease is not the probability of disease given a positive test. The two can differ wildly when the base rates of the underlying scenarios are lopsided, which is precisely when intuition is least trustworthy.
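A quick calculation shows how far apart the two conditionals can sit. The numbers below describe a hypothetical screening test and are assumptions made up for illustration, not data from any real test: a 1% base rate of disease, a 95% chance of a positive result given disease, and a 5% chance of a positive result without it.

```python
# Hypothetical screening test; all three numbers are illustrative assumptions.
prevalence = 0.01            # P(disease), a lopsided base rate
sensitivity = 0.95           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

# P(positive), totalled over the two scenarios (disease, no disease).
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate

# Reverse the conditional with Bayes' theorem.
p_disease_given_positive = prevalence * sensitivity / p_positive

print(f"P(positive | disease) = {sensitivity:.3f}")
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

Under these assumed numbers the first probability is 0.950 while the second is about 0.161, because the many healthy people produce far more false positives than the few sick people produce true ones.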
Worked example
A factory runs two production lines. Line A makes 60% of all units and 2% of its output is defective; line B makes 40% and 5% of its output is defective. A randomly chosen unit turns out to be defective. What is the probability it came from line B?
- Set up the scenarios and the target: the causes are A and B with P(A) = 0.6 and P(B) = 0.4, the effect is the event D (defective), and the question asks for P(B given D) — note this reverses the given conditionals, which point from line to defect rate.
- Compute each route to a defect with the multiplication rule: P(A and D) = 0.6 × 0.02 = 0.012, and P(B and D) = 0.4 × 0.05 = 0.020.
- Total the routes with the law of total probability: P(D) = 0.012 + 0.020 = 0.032 — overall, 3.2% of all units are defective.
- Apply Bayes' theorem as a share: P(B given D) = P(B and D)/P(D) = 0.020/0.032 = 0.625.
- Interpret: although line B makes only 40% of the units, it accounts for 62.5% of the defects, because its defect rate (5%) is 2.5 times line A's (2%), which more than offsets its smaller share of production. A concrete check with 1000 units: 12 defects from A and 20 from B give 20 out of 32, the same 0.625.
Answer. The probability the defective unit came from line B is 0.020/0.032 = 0.625, or 62.5%.
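The steps of the worked example can be reproduced in a few lines. This is a minimal sketch using exact fractions so that no rounding creeps in; the variable names are chosen here for illustration, and the only inputs are the four figures given in the problem.

```python
from fractions import Fraction

# Figures from the worked example.
share = {"A": Fraction(60, 100), "B": Fraction(40, 100)}       # P(line)
defect_rate = {"A": Fraction(2, 100), "B": Fraction(5, 100)}   # P(D given line)

# Multiplication rule: P(line and D) for each route to a defect.
joint = {line: share[line] * defect_rate[line] for line in share}

# Law of total probability: P(D).
p_defective = sum(joint.values())

# Bayes' theorem as a share of the total.
p_b_given_d = joint["B"] / p_defective

# The same calculation recast as 1000 imaginary units.
units = 1000
defects = {line: joint[line] * units for line in joint}

print(float(p_defective))          # 0.032
print(p_b_given_d)                 # 5/8
print(defects["A"], defects["B"])  # 12 20
```

The exact result 5/8 matches the 0.625 found by hand, and the unit counts match the 12 and 20 defects used in the check.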
Check your understanding
- Why can P(B given D) be so different from P(D given B), and which real decisions hinge on not confusing them?
- How does the law of total probability act as the denominator inside Bayes' theorem?
- What happens to the answer if line B's share of production shrinks toward zero while its defect rate stays fixed?
- How would you rebuild this calculation as a tree or a table of 1000 imaginary units, and why does that recasting help intuition?
Build the foundations first
Probability theory builds on these concepts. If any feel shaky, start there.