Bayes' Theorem for Disease Testing

Bayes' Theorem, `P(A|B) = (P(B|A)*P(A))/(P(B))`, computes the probability that event A occurs given that event B has occurred. It is especially useful in medicine, particularly in the diagnosis of rare diseases, because many people misinterpret disease statistics. In disease diagnosis, A represents having the disease and B represents testing positive for it. Bayes’ Theorem answers the question “What is the probability that you have a disease given that you have tested positive for it?” This probability is written P(A|B).

Interpreting Statistics Online

In diagnosis, a positive test does not necessarily mean that you have the disease. In fact, with mass testing for relatively rare diseases, it may still be more likely that you don’t have the disease even if you have tested positive for it. The following example illustrates why.

Example: Dinosaur disease

Imagine that 1 out of every 10,000 people has a hypothetical disease that we will call dinosaur disease, and that the diagnostic tests used for dinosaur disease have a 95% accuracy rate. Here that means that people who have the disease will test positive 95% of the time. However, the tests also report false positives: 2% of the people who don’t have the disease will nevertheless test positive for it. It is important to remember the base rate, which in this case is 1 in 10,000.

In the terms of Bayes' Theorem, P(A) is the probability of having dinosaur disease. From the base rate, P(A) = 0.0001. P(B|A) is the probability of testing positive given that you have the disease, so P(B|A) = 0.95. The probability of a false positive is P(B|not A), the probability of testing positive when dinosaur disease is not present, so P(B|not A) = 0.02.

P(B), the overall probability of testing positive, is the sum of the probability of testing positive and having the disease and the probability of testing positive and not having the disease: `P(B) = P(B|A)*P(A) + P(B|not A)*P(not A)`, where P(not A) = 1 - P(A) = 0.9999. In this case, `P(B) = 0.95*0.0001 + 0.02*0.9999` = 0.020093, so there is about a 2% chance of testing positive for dinosaur disease. This makes sense: because the disease is rare (1 out of every 10,000), almost all positive results are false positives from the 2% of healthy people who test positive.

Completing the formula gives P(A|B), the probability of having dinosaur disease given that the test was positive: `P(A|B) = (P(B|A)*P(A))/(P(B))` = `(0.95*0.0001)/0.020093` = 0.0047. The chance of having dinosaur disease if the test is positive is therefore about half of one percent!
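
The arithmetic above can be checked with a short Python sketch. The function name and parameter names are illustrative assumptions, not part of the calculator described on this page; the inputs are the base rate, the probability of a correct positive, and the probability of a false positive from the dinosaur disease example.

```python
def probability_of_disease_given_positive(base_rate, true_positive_rate, false_positive_rate):
    """Return P(A|B): the probability of having the disease given a positive test.

    base_rate           -- P(A), probability of having the disease
    true_positive_rate  -- P(B|A), probability of testing positive when diseased
    false_positive_rate -- P(B|not A), probability of testing positive when healthy
    (Hypothetical helper written for this example.)
    """
    # P(B) = P(B|A)*P(A) + P(B|not A)*P(not A)
    p_positive = (true_positive_rate * base_rate
                  + false_positive_rate * (1.0 - base_rate))
    if p_positive == 0:
        raise ValueError("A positive test is impossible with these inputs.")
    # Bayes' Theorem: P(A|B) = P(B|A)*P(A) / P(B)
    return true_positive_rate * base_rate / p_positive


# Dinosaur disease: 1 in 10,000 base rate, 95% correct positives, 2% false positives
posterior = probability_of_disease_given_positive(0.0001, 0.95, 0.02)
print(f"P(A|B) = {posterior:.4f}")  # prints "P(A|B) = 0.0047"
```

Raising the base rate while keeping the other two inputs fixed shows how strongly the result depends on it: with a base rate of 0.1 instead of 0.0001, the same function returns roughly 0.84.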

Since dinosaur disease is so rare (1 out of 10,000), the number of false positives is much higher than the number of true positive diagnoses. People often overlook the base rate and look only at how accurate the test is. Even with a highly accurate test, if the base rate is very low, there are likely to be more false alarms than true positives.
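
To make the imbalance concrete, the following sketch counts the expected true and false positives in a tested population. The population size of 1,000,000 is an assumption chosen only for illustration, and the function name is hypothetical; the rates are those of the dinosaur disease example.

```python
def expected_positive_counts(population, base_rate, true_positive_rate, false_positive_rate):
    """Return (expected true positives, expected false positives).

    Hypothetical helper: splits the population by the base rate, then
    applies the test's rates to each group.
    """
    diseased = population * base_rate
    healthy = population - diseased
    true_positives = diseased * true_positive_rate
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


# Assumed population of 1,000,000 people tested
true_pos, false_pos = expected_positive_counts(1_000_000, 0.0001, 0.95, 0.02)
print(f"Expected true positives:  {true_pos:.0f}")
print(f"Expected false positives: {false_pos:.0f}")
print(f"False positives per true positive: {false_pos / true_pos:.1f}")
```

Under these assumptions, about 95 people are true positives and about 19,998 are false positives, so roughly 210 false alarms occur for every true diagnosis. The fraction 95 / (95 + 19,998) reproduces the 0.0047 obtained from Bayes' Theorem.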

References

Su, F. (2010). Medical tests and Bayes’ Theorem. Math Fun Facts. Retrieved from https://www.math.hmc.edu/funfacts/ffiles/30002.6.shtml

Stone, J. V. (2012). Vision and brain: How we perceive the world. Cambridge, MA: MIT Press Books.

See Also

Wilcoxon Signed Rank Test: Enter two sets, whether it's a one or two tail test and an alpha value to see the Wilcoxon statistic and the critical value.
Bayes' Theorem for Disease Testing: Enter a base rate probability, the probability of false positives and the probability of correct positives to see the ratio of people with the disease, the approximate number of false and true positives, and the theorem's percent likelihood of having the disease if tested positive.
Chi-square Test: Enter a 3x2 matrix to see the expected values matrix with row and column totals, degrees of freedom and the chi-square value.
Rescorla-Wagner Formula (alpha and beta version): Enter salience for conditional stimuli, rate of unconditional stimuli, maximum conditioning for unconditioned stimuli and the total associative strength of all stimuli present to see the change in strength between conditional and unconditional stimuli.
Rescorla-Wagner Formula (k version): Enter maximum conditioning possible for the unconditioned stimuli, total associative strength of all stimuli present, combined salience of the conditioned and unconditioned stimuli, and number of trials to see the change in strength associated with the trials.
Ricco's Law: Enter the area of visually unresolved target and constant of background luminance when eyes are adapted to see Ricco's Law factor.
Ricco's Law (K variable): Enter the scotopic vision constant, background luminance and photopic vision constant.
Stevens' Power Law: Enter proportionality constant, magnitude of stimulation, type of stimulation exponent to see magnitude of sensation.
Weber Fraction: Enter just-noticeable difference for intensity and stimulus intensity to see the Weber fraction.
Weber-Fechner's Law: Enter just-noticeable difference for intensity, instantaneous stimulus, stimulus intensity and the threshold to see the factor.
Random Integer: This provides a random number (integer) between a lower and upper bound.
Observational Statistics (aka Simple Stats): Observational statistics on a set including: count, min, max, mean, median, mode, mid-point, range, population and sample variance and standard deviation, mean absolute deviation, standard deviation of mean, sum of values, sum of squared values, square of the sum, and the sorted set.
Frequency Distribution: Frequency distribution of a set of observations in uniformly sized bins between a minimum and maximum.
Least-squares Trend Line (aka Linear Regression): Linear regression line on a set of paired numbers and see (r) the correlation coefficient, (n) number of observations, (μX) mean of the X values, (μY) mean of Y values, (ΣX) sum of the X values, (ΣY) sum of the Y values, (Σ(X⋅Y)) sum of the X*Y product values, (ΣX²) sum of X² values, (ΣY²) sum of Y² values, (a) y intercept of regression line, and (b) slope of regression line.
Single-Sample t-test: t-Test parameters including alpha level, population mean and whether it's one or two tailed and see the degrees of freedom, critical t-value, t score and the standard error.
Paired Sample t-test: Test of two sets of values with an alpha level and whether it's one or two tailed and see the number of observations, mean and standard deviation for both sets, the degrees of freedom, critical t-value, t-score and the Standard Error value.
Effect Size (r-squared): Enter a t-test result and the degrees of freedom to see r².
Effect Size (Cohen's d): Enter the mean from two groups and the estimated standard deviation to see the effect size.
Analysis of Variance (one way): ANOVA for numeric observations of three groups. Computes the F Score, Numerator: degrees of freedom Between, Denominator: degrees of freedom Within, mean of each group, grand mean, total sum of squares, sum of square within and between, and variance within and between.
