I will use a simple disease-diagnosis example to show that even with an excellent prediction method based on strong evidence, we can still get poor predictions as long as the disease is rare in the population.

Suppose there is a cancer diagnostic test whose sensitivity and specificity are both a fairly high 90%. If a man takes the test and, unfortunately, gets a positive result, can we say he is more likely to have cancer than other people?

When we say someone is "more likely" to have cancer, we are comparing his chance of having cancer to that of a person randomly selected from the population. The question seems easy. Mathematically, we are asking for the value of:

$$\frac{chance\ of\ having\ cancer,\ given\ an\ evidence\ (a\ positive\ test)}{chance\ of\ having\ cancer\ in\ general\ (given\ no\ special\ evidence)}$$

or, equivalently,

$$\frac{Pr(cancer | people\_with\_positive\_test)}{Pr(cancer | people\_randomly\_chosen)}\ \ \ (1)$$

The meaning of expression \((1)\) is illustrated by the figure below, where \(C1\) and \(C2\) count the cancer patients with negative and positive test results, and \(H1\) and \(H2\) count the healthy people with positive and negative test results, respectively.

Rewriting expression \((1)\) in terms of the counts in the figure gives expressions \((2)\) and \((3)\):

$$\frac{Pr(cancer | people\_with\_positive\_test)}{Pr(cancer | people\_randomly\_chosen)} = \frac{\frac{C2}{C2+H1}}{ \frac{C1+C2}{C1+C2+H1+H2}}\ \ \ (2)$$ $$= \frac{\frac{C2}{C1+C2}}{ \frac{C2+H1}{C1+C2+H1+H2}}\ \ \ (3)$$

We know that,

\(\frac{C2}{C1+C2} = Pr(positive\_test|cancer), \frac{C2+H1}{C1+C2+H1+H2} = Pr(positive\_test|people\_randomly\_chosen)\). So,

$$\frac{Pr(cancer | people\_with\_positive\_test)}{Pr(cancer | people\_randomly\_chosen)} = \frac{Pr(positive\_test|cancer)}{Pr(positive\_test|people\_randomly\_chosen)}\ \ \ (4)$$
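Identity \((4)\) is easy to verify numerically. Below is a minimal sketch in Python using hypothetical counts (not taken from the post's figure): a population of 1000 with 1% prevalence and a 90%/90% test, with exact rational arithmetic so the two sides match exactly.

```python
from fractions import Fraction

# Hypothetical counts: 1000 people, 1% prevalence, 90% sensitivity, 90% specificity.
C1 = Fraction(1)    # cancer patients with a negative test
C2 = Fraction(9)    # cancer patients with a positive test
H1 = Fraction(99)   # healthy people with a positive test
H2 = Fraction(891)  # healthy people with a negative test
total = C1 + C2 + H1 + H2

# Left side of (4): Pr(cancer | positive_test) / Pr(cancer | randomly_chosen)
lhs = (C2 / (C2 + H1)) / ((C1 + C2) / total)
# Right side of (4): Pr(positive_test | cancer) / Pr(positive_test | randomly_chosen)
rhs = (C2 / (C1 + C2)) / ((C2 + H1) / total)

assert lhs == rhs       # identity (4) holds exactly
print(float(lhs))       # the fold enrichment, about 8.33
```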

Notice that expression \((4)\) has exactly the same form as Bayes' theorem:

\(Pr(Cancer|Test)=\frac{Pr(Test|Cancer)}{Pr(Test)}Pr(Cancer)\),

doesn't it?

**In order to calculate the actual value**, recall that \(\frac{C2}{C1+C2}\) is, by definition, the sensitivity of the diagnostic test, while \(\frac{H2}{H1+H2}\) is the specificity. Continuing from expression \((3)\),

$$\begin{aligned}
\frac{Pr(cancer | people\_with\_positive\_test)}{Pr(cancer | people\_randomly\_chosen)} &= \frac{\frac{C2}{C1+C2}}{ \frac{C2+H1}{C1+C2+H1+H2}} = \frac{sensitivity}{ \frac{C2+H1}{C1+C2+H1+H2}}\\
&= \frac{sensitivity}{\frac{\left((C1+C2+H1+H2)\cdot\frac{C1+C2}{C1+C2+H1+H2}\cdot\frac{C2}{C1+C2}\right)+\left((C1+C2+H1+H2)\cdot\left(1-\frac{C1+C2}{C1+C2+H1+H2}\right)\cdot\left(1-\frac{H2}{H1+H2}\right)\right)}{C1+C2+H1+H2}}\\
&= \frac{sensitivity}{\frac{C1+C2}{C1+C2+H1+H2}\cdot\frac{C2}{C1+C2}+\left(1-\frac{C1+C2}{C1+C2+H1+H2}\right)\cdot\left(1-\frac{H2}{H1+H2}\right)}\\
&= \frac{sensitivity}{Pr(cancer)\cdot sensitivity+(1-Pr(cancer))\cdot(1-specificity)}\ \ \ (5)\\
&\rightarrow \left\{\begin{matrix}
1 & Pr(cancer)\rightarrow 1\\
\frac{sensitivity}{1-specificity} & Pr(cancer)\rightarrow 0
\end{matrix}\right.
\end{aligned}$$

\(Pr(cancer)\) is the same thing as \(Pr(cancer | people\_randomly\_chosen)\), i.e., the prevalence of cancer in the population.
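Equation \((5)\) is straightforward to put into code. Here is a minimal sketch in Python (the function name is my own) that also checks the two limiting cases for a 90%/90% test:

```python
def fold_enrichment(sensitivity, specificity, prevalence):
    """Equation (5): Pr(cancer | positive_test) / Pr(cancer), i.e. how much
    a positive test raises the probability relative to the prevalence."""
    # Pr(positive_test) = true positives + false positives
    p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    return sensitivity / p_positive

# Limiting cases for sensitivity = specificity = 90%:
print(fold_enrichment(0.9, 0.9, 1.0))   # -> 1.0 as Pr(cancer) -> 1
print(fold_enrichment(0.9, 0.9, 0.0))   # -> 9.0 = sensitivity / (1 - specificity)
```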

Now let's play with it. With sensitivity and specificity both at a high 90% and a cancer prevalence of 1%, a man who tests positive is 8.3-fold more likely to have cancer than a randomly chosen person; yet the absolute probability that he has cancer is only 8.3%.

It's interesting to also try the combinations [sensitivity, specificity, prevalence] = [99%, 99%, 1%] and [90%, 90%, 50%].
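Using equation \((5)\) directly, a short script can run all three combinations at once; a self-contained sketch:

```python
# Each tuple is (sensitivity, specificity, prevalence).
for sens, spec, prev in [(0.90, 0.90, 0.01), (0.99, 0.99, 0.01), (0.90, 0.90, 0.50)]:
    p_pos = prev * sens + (1 - prev) * (1 - spec)   # Pr(positive test)
    fold = sens / p_pos                             # equation (5): fold enrichment
    posterior = prev * sens / p_pos                 # Pr(cancer | positive test)
    print(f"[{sens:.0%}, {spec:.0%}, {prev:.0%}]: "
          f"{fold:.1f}-fold, Pr(cancer|+) = {posterior:.1%}")
```

Even a nearly perfect 99%/99% test leaves only a 50% chance of cancer at 1% prevalence, while the mediocre 90%/90% test at 50% prevalence gives 90% confidence but only a 1.8-fold enrichment.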

Now it's obvious that when making predictions, **even if the evidence is very strong** (represented by high sensitivity and specificity), **we might still get a relatively low confidence** (probability of a true positive) **due to the low abundance of background positives**. When the absolute probability of a true positive prediction is too low, we can hardly appreciate the fold change; the confidence we gain from each piece of evidence is diluted by the sea of negatives.

I thank the following articles for providing inspiration:

- Understanding Bayes Theorem With Ratios -- by BetterExplained
- An Intuitive Explanation of Bayes' Theorem -- by Yudkowsky
