

Stewart

Hi, I'm Stewart. It's nice to meet you.

This is my website. It's a collection of my unqualified thoughts, and ones about ethical philosophy in particular. No one pays me for that sort of thing, though, so during the day I work as a consultant / web developer.

I live in Boston with my wife, Lauren, and our cats, Dory and Pekoe.

False Positives

I love probability puzzles. I’m not sure I can explain it better than that, except to say that there’s something really fun–almost exciting–about these sorts of things. Studies indicate that probability and statistics are things which humans are normally quite bad at. It’s only after training with them for some time that we can get used to their complexities. Having not trained with them at all, my personal experience bears that out. It’s refreshing to play around with the subject, though, to feel my mind stretch around it trying to understand something which is nearly incomprehensible to me.

The most recent example I’ve found was in a fantastic TED Talk given by Oxford mathematician Peter Donnelly. I’ll embed the video below, if you’d like to watch it yourself. One of the situations that Donnelly presents is this: Suppose we have a test which screens for some disease, and which has a 90% rate of accuracy. If we select someone at random and administer the test, and it gives a positive result, then what is the likelihood that this person carries the disease in question?

Whenever someone presents you with a question like this, you must take the first, intuitive answer that you come up with and ignore it completely. These are trick questions, after all. So I paused the playback on my iPod and spent the next ten minutes trying to work this one out as I walked to my office. Try as I might, however, I couldn’t come up with any answer but the obvious 90%. I felt that possibly there wasn’t enough information to answer the question, but I couldn’t figure out why anything beyond the accuracy rate could be relevant. After all, if a test has a 90% accuracy rate, that suggests it will be correct nine times out of ten. Regarding how likely its results are to be right, I felt that the stated accuracy rate should answer that question. Otherwise what is the point of an accuracy rate? These are trick questions, however, so I had no illusion about being right.

It turns out that my answer was wrong (of course), but my suspicion was correct. There isn’t enough information to answer the question, because in order to correctly evaluate it we also have to know how common the disease itself is. Hearing Donnelly explain that, I felt that it made sense but still couldn’t quite understand why. He explained that the likelihood of a positive result being correct has to be weighed against the likelihood of it being false. To illustrate this, he suggested that in a population of one million we might have a 1% infection rate, which amounts to 10,000 people carrying the disease. If your test has a 90% accuracy rate, then one out of every ten tests will provide a bogus result. The implication is that 99,000 uninfected people (10% of 99% of one million) will test positive, while only 9,000 of the 10,000 people who actually carry the disease (90% of them) will. That makes 108,000 positive results in all, of which just 9,000 are correct. And if my math skills, which are typically quite poor, can work this out, that means the likelihood of an accurate positive result is 9,000 out of 108,000, or about 8.3%, nowhere near the indicated 90% accuracy rate.

Huh? 8.3%? From a test that is right nine times out of ten? If that number confuses you, as it still confuses me, try working it out with much more extreme statistics:

Suppose we have a population of 10 billion people. And suppose we know that, among those people, there are only ten individuals who carry some specific disease. If we have a test for that disease, and it’s 90% accurate, then what are the odds that a single, random individual from the population will actually be infected if his test returns a positive? It can’t be 90%, right? While the odds that any given test is correct may be nine out of ten, that information has to be weighed against the astronomically low probability that any random person actually has the disease, which is 1 out of 1,000,000,000. If you tested every single person, you’d be likely to identify nine of the ten infected people, but you would also get about one billion false positives.
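
The counting in both examples, the one-million population and the ten-billion one, can be sketched in a few lines of Python. This is only an illustration, and it assumes that "90% accurate" means the test is right 90% of the time for infected and uninfected people alike. The function names `positive_counts` and `chance_positive_is_right` are made up for this sketch and don't come from Donnelly's talk.

```python
def positive_counts(population, infected, accuracy):
    """Return (true_positives, false_positives) if everyone is tested.

    Assumption: `accuracy` applies equally to infected and uninfected people.
    """
    uninfected = population - infected
    true_positives = infected * accuracy            # infected and correctly flagged
    false_positives = uninfected * (1 - accuracy)   # uninfected but wrongly flagged
    return true_positives, false_positives


def chance_positive_is_right(population, infected, accuracy):
    """Share of all positive results that belong to infected people."""
    true_pos, false_pos = positive_counts(population, infected, accuracy)
    return true_pos / (true_pos + false_pos)


# One million people with a 1% infection rate.
print(chance_positive_is_right(1_000_000, 10_000, 0.9))

# Ten carriers among ten billion people.
print(chance_positive_is_right(10_000_000_000, 10, 0.9))
```

The first call comes out at roughly 0.083 (9,000 correct positives out of 108,000), and the second at roughly nine in a billion, because the nine true positives are buried under about a billion false ones.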

If that doesn’t make the issue clearer for you, suppose that those ten infected people eventually die under quarantine. The disease will no longer be present anywhere in the population, though the test remains unchanged. If we continue to administer the test, we’ll get about one positive result out of every ten tests, but there will now be zero possibility of any positive result actually being correct. Clearly the number of people who are infected matters a great deal when evaluating how reliable a test result is likely to be.
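
To see how much the infection rate matters, here is a small sketch that holds the test fixed at 90% and varies only how common the disease is. It uses Bayes' rule, which is my own framing rather than something taken from the talk, and it again assumes that the 90% figure applies to infected and uninfected people alike. The name `chance_infected_given_positive` is hypothetical.

```python
def chance_infected_given_positive(prevalence, accuracy=0.9):
    """Bayes' rule: P(infected | positive test).

    Assumption: the test is right with probability `accuracy`
    whether or not the person is infected.
    """
    true_positive = prevalence * accuracy
    false_positive = (1 - prevalence) * (1 - accuracy)
    return true_positive / (true_positive + false_positive)


# Same test, different infection rates, from "nobody" to "half the population".
for prevalence in (0.0, 1e-9, 0.01, 0.1, 0.5):
    chance = chance_infected_given_positive(prevalence)
    print(f"infection rate {prevalence:g}: a positive is right {chance:.2%} of the time")
```

With nobody infected, the chance is exactly zero, as in the quarantine scenario. At a 10% infection rate a positive result is a coin flip, and under these assumptions it is only when half the population is infected that a positive result is right about 90% of the time.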

In his lecture, Donnelly gives a real-world example of this sort of error, wherein a woman was convicted of murdering her children on the strength of a very common misunderstanding of probability. Eventually the conviction was overturned, but it’s a frightening case nonetheless. Check out the video below.
