## You Might Not Be Sick After All:

#### Implications of Base Rates in Interpreting Medical Screening Tests

Dr. Richard Platt

Imagine that you are living in South Florida and you decide to get tested for Zika. The test comes back positive. How likely is it that you actually have Zika? You may initially think that this is a simple question to answer: if you tested positive, then you must have Zika. Right? However, as you think about it more, you realize that the result might be a false alarm. So it isn’t a certainty that you have the disease, and you need to know more about the test: how good it is at detecting true instances of the disease and how good it is at avoiding false alarms when the disease is not present. As you think about it even more, you don’t remember getting bitten by a mosquito recently, and you know you haven’t been to any areas where Zika-carrying mosquitos have been found. Surely that must matter as well in determining the likelihood of having the disease. This reasoning has identified three things that need to be considered in determining the likelihood that you have Zika given a positive test: the sensitivity of the test, its specificity, and the base rate for the disease.

The likelihood of testing positive when you really do have the disease is the *sensitivity* of the test. It can be determined from the information in the *Has Disease* column of Table 1. In this example, the test is given to 100 people who have the disease and 99 of them test positive. That means that the test sensitivity is 99/100 or .99. The other characteristic of a test that is important is its *specificity.* The specificity of a test is the proportion of people who do not have a disease who test negative. This can be determined from the information found in the *Don’t Have Disease* column of Table 1. This column describes the test outcomes for 100 people who do not have the disease. If the specificity of a test was .97, that would mean that of these 100 people who did not have the disease, 97 of them test negative and 3 test positive. The 3 who test positive, even though they don’t have the disease, are false alarms. A good test should minimize the number of false alarms. The false alarm rate is the complement of the specificity of the test and so can be determined by 1 – specificity. In this case we would have a false alarm rate of .03 (1 - .97).

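Table 1: Diagram of outcomes for a test with sensitivity = .99 and specificity = .97 administered to 100 people who have the disease and 100 people who do not have the disease.

| Test Result | Has Disease | Don’t Have Disease |
|---|---|---|
| Positive | 99 | 3 (false alarms) |
| Negative | 1 | 97 |
| Total | 100 | 100 |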
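As a minimal sketch, the three rates described above can be computed directly from the counts in the Table 1 example. The dictionary and variable names are illustrative choices, not part of the original article.

```python
# Counts from the Table 1 example in the text:
# 100 people with the disease, 100 people without it.
has_disease = {"positive": 99, "negative": 1}
no_disease = {"positive": 3, "negative": 97}

# Sensitivity: proportion of people WITH the disease who test positive.
sensitivity = has_disease["positive"] / sum(has_disease.values())

# Specificity: proportion of people WITHOUT the disease who test negative.
specificity = no_disease["negative"] / sum(no_disease.values())

# The false alarm rate is the complement of the specificity.
false_alarm_rate = 1 - specificity

print(f"sensitivity      = {sensitivity:.2f}")
print(f"specificity      = {specificity:.2f}")
print(f"false alarm rate = {false_alarm_rate:.2f}")
```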

It may seem like the sensitivity and the specificity of the test are all we need to know. The natural tendency is to focus on how good the test is and to conclude that because the test is high in both sensitivity and specificity, it must be highly likely that a positive test means you have the disease. What this reasoning overlooks is the *base rate* of the disease within the population. The base rate is sometimes referred to as the prior probability. It is the probability of having the disease before you received the positive test result. The base rate is often ignored in interpreting the outcome of a medical test. In fact, this happens so frequently that it has a name: *base rate neglect*. Base rate information is general statistical information, and people tend to ignore it and focus instead on the new, very specific information provided by the particular test outcome. However, in order to know whether it is likely that you have Zika given a positive test, we need to know what the base rate was. If you just came back from a trip to an area where Zika is widespread and you got bitten by mosquitos while you were there, then the likelihood is different than if you haven’t had much Zika exposure. This is because the base rates differ for these two populations.

`P(A|B) = (P(B|A)*P(A))/(P(B))`

Bayes' Theorem

Bayes' Theorem (above) provides a normative model of how new information (the positive test) can be combined with prior probabilities (the base rate for the population) to arrive at an updated probability of having the disease. The **Bayes' Theorem for Disease Testing Calculator** allows you to determine this updated probability when you provide the base rate in the population, the sensitivity of the test, and the false alarm rate. Let’s look at an example to see how this works.
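For disease testing, the general formula can be written out in terms of the three quantities just listed. In the sketch below, the symbols D (has the disease) and + (tests positive) are notation assumed here for illustration; the denominator expands P(+) over the two ways a positive result can arise, a true detection or a false alarm.

```latex
P(D \mid +) = \frac{P(+ \mid D)\, P(D)}
                   {P(+ \mid D)\, P(D) + P(+ \mid \neg D)\, \bigl(1 - P(D)\bigr)}
```

Here P(D) is the base rate, P(+ | D) is the sensitivity, and P(+ | ¬D) is the false alarm rate (1 – specificity).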

The FDA documents sensitivity and specificity rates for tests it approves, including a variety of HIV tests. There are several rapid tests on which they provide data. One of those tests, called Reveal G2, has a .998 sensitivity and a .991 specificity. So it is a good test in terms of both sensitivity and specificity. What would a positive test using Reveal G2 mean? Although the sensitivity and specificity here are quite high, the implications of a positive test depend heavily on the base rate in the population being tested. Population base rates need to take into account risk factors that might be present for individuals within that population. People with risk factors for HIV may have rates of infection that are many times higher than those with no risk factors. For example, IV drug users who share needles may have infection rates of 1 in 10 or higher. On the other hand, for someone with none of the risk factors for HIV and living in an area where HIV rates are low, the base rate might be closer to 1 in 1,000 or even 1 in 10,000 (based on the CDC atlas of HIV diagnoses). The base rate has a dramatic effect on the updated probability once a positive test result is taken into account.

If we consider a population that has a 1 in 10 base rate and we administered the test to 10,000 people from that population, we would expect 1,000 people to be infected with HIV and of those, 998 would test positive (1,000 x .998). The false alarm rate would be 1 - .991 or .009. So out of the remaining 9,000 uninfected individuals we would expect 81 false alarms (9,000 x .009). So a total of 1,079 (998 + 81) would test positive and 998 of those would be individuals who have the disease. That means that for this population, the probability of having the disease given that you tested positive is 998/1,079 or .92. So in this case a positive test means it is highly likely that you have the disease.

Now let’s consider a population where the base rate is only 1 in 10,000. If we tested 10,000 people from this population, we expect there to be only one person with HIV. Because of the high test sensitivity, we can assume that that one person is going to test positive. 9,999 people remain who don’t have the disease. We would expect about 90 of them to test positive (9,999 x .009) because of the false alarm rate. So in this case we have 91 people test positive but only 1 of them has the disease. That means that for this population the likelihood of having the disease given a positive test is .011 (1/91).
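The frequency reasoning in the two examples above can be sketched in a few lines of Python. The function name `expected_counts` and its parameters are hypothetical, chosen here for illustration; the sensitivity and specificity are the Reveal G2 values quoted in the text.

```python
def expected_counts(population, base_rate, sensitivity, specificity):
    """Expected test outcomes when `population` people are screened.

    Returns (true positives, false alarms, P(disease | positive test)).
    """
    infected = population * base_rate
    uninfected = population - infected
    true_positives = infected * sensitivity
    false_alarms = uninfected * (1 - specificity)
    p_disease_given_positive = true_positives / (true_positives + false_alarms)
    return true_positives, false_alarms, p_disease_given_positive


# The two populations worked through in the text, 10,000 people each.
for label, base_rate in [("1 in 10", 1 / 10), ("1 in 10,000", 1 / 10_000)]:
    hits, alarms, p = expected_counts(10_000, base_rate, 0.998, 0.991)
    print(f"{label}: {hits:.1f} true positives, {alarms:.1f} false alarms, "
          f"P(disease | positive) = {p:.3f}")
```

The results should come out close to the .92 and .011 worked out above. Small differences are expected because the text rounds to whole people, while the code keeps fractional expected counts.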

Using the **Bayes' Theorem for Disease Testing Calculator** you can see the impact of changes to any of these dimensions. It will calculate the probability using Bayes' Theorem, but it will also describe the outcome in terms of frequencies, as we have done above, so that the numbers behind the calculation are more transparent. The probabilities given different base rates are found in Table 2.

Table 2: The Effect of Base Rate on the Probability of Having HIV given a Positive Reveal G2 Test
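As a rough sketch of how probabilities of this kind could be computed, the snippet below applies Bayes' Theorem directly, taking the same three inputs the calculator asks for. It is not the calculator's actual implementation. The function name is hypothetical, and the base rates other than 1 in 10 and 1 in 10,000 are illustrative assumptions rather than values from the article.

```python
def prob_disease_given_positive(base_rate, sensitivity, false_alarm_rate):
    """Bayes' Theorem: probability of disease given a positive test."""
    # Total probability of a positive test: true detections plus false alarms.
    p_positive = sensitivity * base_rate + false_alarm_rate * (1 - base_rate)
    return sensitivity * base_rate / p_positive


SENSITIVITY = 0.998           # Reveal G2 sensitivity quoted in the text
FALSE_ALARM_RATE = 1 - 0.991  # 1 - specificity

# Illustrative base rates; only 1 in 10 and 1 in 10,000 are worked in the text.
for one_in in (10, 100, 1_000, 10_000):
    p = prob_disease_given_positive(1 / one_in, SENSITIVITY, FALSE_ALARM_RATE)
    print(f"base rate 1 in {one_in:>6,}: P(disease | positive) = {p:.3f}")
```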

Going back to our original question regarding the Zika virus, it should be clear that we need to be able to estimate the base rate of the disease, as well as the sensitivity and specificity of the test, in order to answer our question. The CDC reports that the test being used to detect Zika is offered under an FDA Emergency Use Authorization because no test has completed FDA approval. Consequently, the exact sensitivity and specificity of the test are not available. However, the CDC does note that false alarms are possible. At the moment (September 2016), in South Florida, the base rate is likely to be very low if we look at the entire population. However, as we did with the HIV example, we need to ask what the relevant risk factors are. Having spent time in one of the areas where Zika has been found, having been bitten by mosquitos, and having Zika-like symptoms would be some of the main risk factors to consider. So if you don’t have symptoms, don’t even think you have been bitten by a mosquito, and you know you haven’t been in an area where mosquitos that carry Zika are known to be, then your base rate is going to be vanishingly low. Consequently, if you received a positive test, it is still more likely to be a false alarm than an indicator of an actual infection. Knowing this, the CDC only recommends the test for those who have symptoms or pregnant women who may have been exposed. However, the take-home message here is not about Zika, HIV, or any other particular disease.

The important point is that population base rates matter, and they matter quite a lot in determining the implications of a test outcome. It isn’t always easy to determine what the appropriate population is or what its base rate might be, but considering the impact of various base rates can help one approximate a range for the probability given a positive test. Mass testing for rare diseases will almost always produce more false alarms than correct detections. Consequently, tests that are used to screen for diseases with low base rates need particular scrutiny. Depending on how rare the disease actually is in the population, even a very low false alarm rate could still produce many more false alarms than correct detections. So remember: if you are having testing done, you will probably want to ask about the sensitivity and specificity of the test being used, but whatever you do, make sure you also ask about the base rate!
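To make "how rare" concrete, one can solve for the base rate at which expected false alarms equal expected correct detections. Below that rate, a positive result is more likely to be a false alarm than a true detection. The sketch uses a hypothetical helper, `break_even_base_rate`, with the Reveal G2 figures from the HIV example.

```python
def break_even_base_rate(sensitivity, specificity):
    """Base rate b at which false alarms equal correct detections.

    Solves sensitivity * b = (1 - specificity) * (1 - b) for b.
    """
    false_alarm_rate = 1 - specificity
    return false_alarm_rate / (sensitivity + false_alarm_rate)


b = break_even_base_rate(0.998, 0.991)
print(f"break-even base rate: {b:.4f} (about 1 in {1 / b:.0f})")
```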