Bayes’ Theorem

Bayes’ Theorem describes the probability of an event, based on prior knowledge that may be related to that event. The theorem was named after Reverend Thomas Bayes, who put forward an equation that allows new evidence to update beliefs.

Bayes’ Theorem is the basis of the Bayesian variety of Machine Learning.

Before we dive into Bayes’ Theorem, let’s define some terms:

Posterior probability (often called the posterior)
the probability that a hypothesis is true, calculated in the light of relevant observations.

Prior probability (often called the prior)
the probability of an event that reflects established beliefs about that event, before the arrival of new evidence or information.

In general terms, Bayes’ Theorem states that we have a prior probability to begin with and as a result of running a test we obtain some new evidence. The prior probability combined with the new evidence results in a posterior probability.

prior probability + evidence -> posterior probability

Let’s say we know that the prior probability of event H occurring is 2%. The prior belief about H is written as P(H), which stands for “the probability of H”.

P(H) = 0.02 = 2%

To make this example more concrete, let’s say that P(H) is the probability of a particular disease occurring in a population.

We now look for evidence (E). Bayes’ Theorem answers precisely what the revised (posterior) belief about H is, given the evidence (E). This is written as P(H | E), meaning “the probability of the hypothesis H given the evidence E”.

So in our case Bayes’ Theorem answers the question: what is the probability of having the disease, given that the test shows the patient is positive for it?

In order to answer this we also need to know the test sensitivity, P(E | H): the chance of a positive test result given that the patient has the disease. Let’s say for this example that P(E | H) is 0.9, or 90%.

Given this, we can then compute the posterior as

P(H | E) = ( P(H) * P(E | H) ) / P(E)

The denominator in this formula, P(E), is the probability of the evidence irrespective of our knowledge about H. Since H is either true or false, by the law of total probability it is also the case that

P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)

So the full Bayes’ Theorem is:

P(H | E) = ( P(H) * P(E | H) ) / ( P(E | H) * P(H) + P(E | not H) * P(not H) )
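To see the whole calculation in one place, here is a minimal Python sketch of the theorem (the function and parameter names are my own, chosen for readability, not part of any standard library):

def bayes_posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Return the posterior P(H | E) via Bayes’ Theorem."""
    # P(E) by the law of total probability:
    # P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    # Posterior = prior * sensitivity / evidence
    return (p_h * p_e_given_h) / p_e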

For our example we know that P(H) is 0.02 and that P(E | H) = 0.9, so the numerator, P(H) * P(E | H), is 0.02 * 0.9 = 0.018.

This leaves the term P(E | not H): the probability of getting a positive test result given that the patient does not have the disease (the false positive rate). For this example it is reasonable to assume this probability is equal to 0.1.

Hence the denominator is equal to 0.9 * 0.02 + 0.1 * 0.98, which is 0.116. Since the numerator was 0.018, we conclude finally that P(H | E) is equal to 0.018 divided by 0.116, which is approximately 0.1552.
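Plugging the example numbers into the bayes_posterior sketch above confirms the arithmetic:

posterior = bayes_posterior(p_h=0.02, p_e_given_h=0.9, p_e_given_not_h=0.1)
print(round(posterior, 4))  # prints 0.1552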

So from our starting belief that the probability of the disease is 2%, once we know that a positive test has been returned for the patient we can revise our belief, P(H | E), to roughly 15.5%. The evidence has clearly had a significant impact on our belief.
