BAYES’ THEOREM: A VISUAL INTRODUCTION

BAYESTHEOREM.NET

Welcome to BayesTheorem.net: A Visual Introduction for Beginners. This website is packed with examples and visual aids to help clarify what Bayes’ Theorem is and how it works. At its core, Bayes’ Theorem is very simple and built on elementary mathematics.

Before we dig into different definitions, it needs to be stated that Bayes’ Theorem is often called Bayes’ Rule, Bayes’ Formula or Bayesian Probability. So, don’t be confused – they refer to the same thing, and we will be using both “theorem” and “formula” throughout this website.

Second, we need to make it clear that Bayes’ Theorem is a law of probability theory. It helps us work with, revise, and understand probabilities when we are presented with new evidence. Practically speaking, the theorem helps us quantify or put a number on our skepticism and make more informed rational choices. It helps us answer questions such as the following:

- You just had a test for cancer and it came back positive. What is the probability that you have cancer if the test is positive?
- Your friend has a new dog and when you visit she slobbers all over you, but does that mean the dog likes you? What is the probability that the dog likes you given that she licks you?
- Your friend claims that stock prices will decrease if interest rates increase. What is the probability stock prices will decrease if interest rates increase?

If you’ve recently searched Google, Bayes’ Theorem was used to display your search results. The same is true for those recommendations on Netflix. Hedge funds? Self-driving cars? Search and rescue? Bayes’ Theorem is used in all of the above and more.

At its core, Bayes’ Theorem is a simple mathematical formula that has revolutionized how we understand and deal with uncertainty. If life is seen as black and white, Bayes’ Theorem helps us think about the gray areas. When new evidence comes our way, it helps us update our beliefs and create a new belief.

*Ready to dig in and visually explore the basics? Let’s go!*

- Part 1 – Bayes’ Theorem Simple Definition 4 Different Ways
- Part 2 – Bayes’ Formula Components Explained and Defined in Depth
- Part 3 – 2 Bayes Theorem Examples
- Part 4 — Problem 1: The Flu
- Part 5 — Problem 2: The Breathalyzer
- Part 6 – Additional Bayes’ Theorem Resources – Books, Videos and More

Here are four ways Bayes’ Theorem can be explained.

- **ONE –** Bayes’ Theorem helps us update a belief based on new evidence by creating a new belief.
- **TWO –** Bayes’ Theorem helps us revise a probability when given new evidence.
- **THREE –** Bayes’ Theorem helps us change our beliefs about a probability based on new evidence.
- **FOUR –** Bayes’ Theorem helps us update a hypothesis based on new evidence.

*The only problem?* Applying the theorem is not intuitive, at least not for most people. This is where visualizing a problem that entails using Bayes’ Theorem can be a BIG HELP. When working with small amounts of data there are a few different visual aids you can use. On this website, we’ll be using Venn Diagrams and Decision Trees. **Venn diagrams** (shown below) are an excellent way to help us visually understand and solve abstract problems.

**Decision trees** are a great tool that can help us solve problems where probabilities are not provided and must be discovered.

If you are confused by the concept of Bayes’ Theorem, this is a fantastic place to start. Before we dive into Section 1, let’s take a look at Bayes’ Theorem without using the formula.

To begin, let’s draw a rectangle. Don’t get hung up on the shape – it could be any shape, but a rectangle is easy to work with. The area inside the rectangle represents all possible outcomes for our experiment. For example, if there are two possible outcomes that are equally probable, we would divide the rectangle into two equal halves. This means that each outcome has a 50% likelihood of occurring. Or, if there are three possible outcomes that are all equally probable, we would divide the rectangle into equal thirds. This would mean that each outcome has a 33.33% chance of occurring.

For this example, we are going to stick with two equal outcomes and title each outcome A and B, respectively.

Now, imagine that each probability represents a small cardboard box. Box A is filled with 10 chocolate chip cookies. There is nothing else in Box A except home baked, warm, mouth-watering chocolate chip cookies. To demonstrate this, we will shade in Box A. In Box B there are also cookies, but there are two different types. There is an even mix of 5 peanut butter cookies and 5 chocolate chip cookies. To demonstrate this, we will draw a line and cut the box in half, and then shade in the chocolate chip cookies in the bottom half of the rectangle. We will leave the top half blank.

Let’s step back and look at the rectangle. Can you see the chocolate chip cookies in the shape of an L? Those areas represent all of our chocolate chip cookies in both boxes, while the white area represents the peanut butter cookies.

Now, what if you were to close your eyes and have both boxes placed in front of you and shuffled, and then reach out your hand and select a cookie? Let’s say you did this and when you opened your eyes, you saw that you had selected a chocolate chip cookie.

If you had to guess what box the cookie came from, what box would you select? Many people would select Box A, and we’ll take a hunch that you are one of them. But let’s take a closer look at why this is. Both Box A and B have chocolate chip cookies, but Box A has exactly twice as many chocolate chip cookies as Box B. Within a split second your brain assessed this and came away with the conclusion that Box A is more likely than Box B to be the box the cookie came from. You quickly became more confident in one probability versus another. One box versus another. And then, you made a decision.

Here’s the magic! This calculation is a very basic, natural use of Bayes’ Theorem. Given evidence (the amount and type of cookies in each box), you were able to quickly come to the conclusion that Box A has a greater probability of being selected than Box B.

Now, let’s step back once more. When your hand selected a chocolate chip cookie, something disappeared: the probability of selecting a peanut butter cookie is now gone. So, to visualize this, let’s wipe away the portion of Box B that represents the peanut butter cookies.

Our boxes are now in the shape of an L, and we can also see that there are twice as many chocolate chip cookies in Box A as in Box B. In fact, if we were to break the boxes apart into equal sections, we would have 3 areas: 2 sections in Box A, and 1 section in Box B.

By looking at this, we can see that Box B has a probability of ⅓, or ~33%, of being the box we selected from. Box A has a probability of ⅔, or ~66%. This difference in probability is what your brain roughly calculated before and the whole reason why you selected Box A. Your brain looked at ~33% vs ~66% and selected the highest percentage, which comes from Box A. (The ~ symbol means approximately.)
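The same counting argument can be sketched in a few lines of Python. This is only an illustration: the `boxes` dictionary and the `box_shares` helper are names made up for this sketch, and plain counting works here only because both boxes are equally likely to be picked and hold the same number of cookies.

```python
from fractions import Fraction

# Cookie counts per box, taken from the example above.
boxes = {
    "A": {"chocolate chip": 10, "peanut butter": 0},
    "B": {"chocolate chip": 5, "peanut butter": 5},
}


def box_shares(observed):
    """Return each box's share of all cookies of the observed type.

    Valid as a probability only because the boxes are equally likely
    and equally full (an assumption of this example).
    """
    total = sum(counts[observed] for counts in boxes.values())
    return {name: Fraction(counts[observed], total) for name, counts in boxes.items()}


shares = box_shares("chocolate chip")
print(shares["A"], shares["B"])  # 2/3 1/3
```

Wiping away the peanut butter cookies corresponds to counting only the 15 chocolate chip cookies: 10 of them sit in Box A and 5 in Box B.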

What we have just done is demonstrate the concept of Bayes’ Theorem and solve a problem, all without using the formula. Now, before we solve this same problem with the formula, it might be helpful to define the formula and its components, or, as we call them, ingredients.

The formula for Bayes’ Theorem is shown below. As you can see, there are three components to it. We find it helpful to call these components ingredients and think of the answer as all of the ingredients combined. For every question you come across, you’ll need to find each ingredient and plug it into the formula.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

**Part 1 – Basic Bayes Formula Definitions To Get You Started**

- The vertical bar | stands for ‘given that’.
- P stands for Probability.
- A & B are Events.
- P(A) and P(B) are the probabilities of events A and B. Each one is taken on its own, without regard to whether the other event has occurred.
- P(A|B) is the probability of event A being true given that event B is true.
- P(B|A) is the probability of event B being true given that event A is true.

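Using the above definitions, the entire formula can be read as follows: the probability of A given B equals the probability of B given A, multiplied by the probability of A, and divided by the probability of B.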

The formula as a whole is built using basic algebra. It might look complicated but it is actually quite user-friendly. Every time you use the formula all you need to do is remember the three ingredients, find them, plug them into the formula, and voila. You will then have an updated probability based on new information; you’ll have P(A|B), which is technically called the Posterior Probability and is a normalized weighted average.

**Part 2 – A Visual Introduction To Get You Started**

In Part 1 of the Visual Introduction, Bayes’ Theorem was demonstrated visually without using its formula. Now, in Part 2 we’ll see how we can derive the same numbers using the formula. To refresh your memory, we had two boxes of cookies in front of us. One box was filled with 10 chocolate chip cookies. The other box had 5 chocolate chip and 5 peanut butter cookies. We then closed our eyes and picked a cookie out of a box, and when we opened them back up we had selected a chocolate chip cookie.

After doing this we discovered the following:

- There is a ~66% probability we chose the cookie from Box A
- There is a ~33% probability we chose the cookie from Box B

This time, let’s follow 3 steps to finding the answer using Bayes’ formula. We will find the answer for Box A first, and then deduce from this to find the answer for Box B.

**Step 1:** To start, we always need to determine what we are wanting to find. We want to know the probability of Box A given that we selected a chocolate chip cookie.

**Step 2:** Write what you want to find as a formula.

$$P(\text{Box A} \mid \text{CC Cookie}) = \frac{P(\text{CC Cookie} \mid \text{Box A})\,P(\text{Box A})}{P(\text{CC Cookie})}$$

**Step 3:** Find each ingredient. Then, plug it in.

- P(Box A) = .5. To answer this we ask the following: What is the probability of drawing from Box A? Remember, this is the probability before we know anything about the cookie we drew. Since there are only two boxes and the probability of selecting from either is equal, the answer is .5.
- P(CC Cookie) = .75. To answer this we ask the following: What is the probability that we will select a chocolate chip cookie? Remember, this is the overall probability, without regard to which box we drew from. There are 20 cookies total in both boxes, and 15 of them are chocolate chip. So, 15/20 is .75.
- P(CC Cookie | Box A) = 1. To answer this question, we ask the following: What is the probability of selecting a chocolate chip cookie given that we have selected from Box A? Since there are only chocolate chip cookies in Box A, the probability is 1. (A probability of 1 represents a 100% probability of something occurring.)

Now, we can plug each ingredient into the formula:

$$P(\text{Box A} \mid \text{CC Cookie}) = \frac{1 \times 0.5}{0.75} \approx 0.667$$

**Answer:** We now know that there is a ~66% probability that we selected from Box A given that we have a chocolate chip cookie. To find the probability of selecting Box B, we can follow steps 1-3 again by replacing the term Box A with Box B. Or, we can simply deduce from our answer that if there is a ~66% probability Box A was selected, there must be a ~33% chance Box B was selected. Since the probabilities for the two boxes must add up to 1, we can discover this by doing the following: 1 − ⅔ = ⅓, or ~33%.
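As a rough check, the three ingredients can be plugged into a small Python function. The function name `bayes` and the variable names are our own choices for this sketch, not part of any library. It also runs the steps again for Box B instead of deducing the answer by subtraction.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) from the three ingredients of Bayes' formula."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be greater than 0 and at most 1")
    return p_b_given_a * p_a / p_b


# P(CC Cookie): weigh each box's chocolate chip share by the chance of picking that box.
p_cc = 1.0 * 0.5 + 0.5 * 0.5  # Box A contributes 0.5, Box B contributes 0.25

p_box_a = bayes(p_b_given_a=1.0, p_a=0.5, p_b=p_cc)
p_box_b = bayes(p_b_given_a=0.5, p_a=0.5, p_b=p_cc)

print(round(p_box_a, 3), round(p_box_b, 3))  # 0.667 0.333
```

Both routes agree: solving for Box B directly gives the same ⅓ as subtracting the Box A answer from 1.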

**Solve for One Possible Outcome – With All Data Provided**

In this section, we use Venn diagrams to visualize our problems so that they are easier to understand and solve. If reading the term Venn diagram makes you shudder, Wikipedia provides a good overview to get you up to speed. This article at Stanford University is also helpful, but you really don’t need to worry. We explain everything step-by-step.

When searching for a probability we are sometimes given all of the components or ingredients, and simply need to A) Identify each one, B) Label each one, and C) Plug each one into Bayes’ formula. Hands down these are the easiest questions to apply Bayes’ formula to. In this section, we will be dealing with these types of probabilities using Venn diagrams. Venn diagrams work great for these by helping us visually understand the question.

Here are a few tips as you approach each question:

- Try not to get overwhelmed or lost in the question. Always begin by writing down what you want to discover.
- Try not to confuse P(A|B) with P(B|A). Always double check your numbers! This is a common error (closely related to what is called the Base Rate Fallacy).
- Remember to take your time. There is no need to rush.

Let’s say that you are at work one day and have just finished lunch. You suddenly feel horrible and find yourself lying down and within a few minutes begin to panic. Wasn’t your friend at work recently sick with the flu? What if you have it? Will you have to cancel your big trip next week?

You have a headache and sore throat, and you know that people with the flu have the same symptoms roughly 90% of the time. In other words, 90% of people with the flu have the same symptoms you currently have. Does this mean you have the flu?

Wanting to gain a little more information you roll over, grab your phone and search Google. You find a reputable article that says that only 5% of the population will get the flu in a given year. Ok. So, the probability of having the flu, in general, is only 5%.

You then spot one more statistic that says 20% of the population will have a headache and sore throat at any given time. After reading this you throw your phone down and curl up in your seat. You’re completely overwhelmed and more confused than you were to start. Do you have the flu? What should you do?

Let’s break this scenario apart.

First, let’s remember what Bayes’ Theorem does: it helps us update a hypothesis based on new evidence. In this scenario, your hypothesis is that you have the flu and your evidence is your headache and sore throat. Now, after seeing that 90% of people with the flu have your symptoms, many of us would stop and conclude that we have the flu. We would look at the 90% statistic and sigh, resigned to the fact that we likely have the flu. This reaction is very common and is called the Base Rate Fallacy or Base Rate Neglect. The CIA has a nifty article on this, and it explains how people often gravitate towards the easiest information available when making decisions.

So, we are left wondering. Is our assumption based on the 90% statistic right?

This is where Bayes’ Theorem comes in and helps us have a clearer picture. By using the theorem, we are forced to look at all data and update our hypothesis with new evidence. In the scenario, we are given two additional pieces of information that can help us come to a more precise probability of having the flu given our symptoms.

Let’s review all the information we do have before moving on.

- We know people with the flu have a headache and sore throat roughly 90% of the time.
- We know the probability of having the flu, in general, is only 5%.
- We know that 20% of the population will have a headache and sore throat at any given time.

To start, we always need to determine what we are wanting to find. We want to know what the probability is of having the flu given our current symptoms. Now that we know what we are solving for, we are going to tackle this problem two ways. Depending on how you learn you may prefer one over the other, and that is ok. People learn differently, and that is why we included both options.

To visualize the problem, we’ll draw two circles and merge them into a Venn diagram.

**Circle #1:** The area inside this circle represents all possible outcomes. In this example, the area represents all people who could get sick with the flu – in other words, the entire population. The shaded circle labeled “A” represents the 5% of the population who have the flu. Let’s step back now. What exactly does this mean? Within the circle is the entire population, and there are two possible outcomes for the population: people can have the flu, or not have the flu. “A” is an event, and its probability is 5%. This probability is represented in our formula as P(A).

**Circle #2:** The area inside this circle also represents all possible outcomes. In this instance it represents all people who could have the symptoms – this is the entire population. The shaded circle labeled “B” represents the 20% of the population that does have the symptoms. What this means is that within the entire circle there are two possible outcomes: people have the symptoms or do not have the symptoms. “B” is an event, and its probability is 20%. This probability is represented in our formula as P(B).

**Circle #3:** In this circle, we have combined both events “A” and “B” – and this is where the magic happens!

Here is a quick breakdown of how you can read this:

- The white area inside this circle represents people who do not have either the flu or the symptoms.
- The area covered only by Circle A shows us people who have the flu but not the symptoms.
- The area covered only by Circle B shows us people who have the symptoms but not the flu.

Now, take a look at Circle B and see where it overlaps with Circle A. This is what we are really interested in! This is our question from Step 1 in visual form. We want to know the probability P(A|B) of having the flu given our symptoms. The region where both events occur together is called an intersection, and P(A|B) is the share of Circle B that this intersection takes up. Another way to look at it is like this: if we are in area B, what is the probability we are also in area AB (where A and B overlap)?

With both circles now merged, we can visually see our question and what we are trying to solve for. Although we won’t be solving the question with a Venn diagram, the diagram does help us visualize what we are trying to understand. If P(A) is the probability of you having the flu, and P(B) is the probability of you having your symptoms, what is the probability of you having the flu given that you have the symptoms? While we don’t yet know the actual answer, we can clearly see what we are trying to solve for.
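One way to connect the picture to the formula is sketched below in standard probability notation. The symbol $A \cap B$ for the overlap of the two circles is our addition and does not appear elsewhere on this page. The share of Circle B taken up by the overlap is the overlap divided by all of B, and the overlap itself can be found by starting from Circle A:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(A \cap B) = P(B \mid A)\,P(A) \quad\Longrightarrow\quad P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

With the flu numbers, the overlap is $0.9 \times 0.05 = 0.045$, or 4.5% of the population, which is then compared against the 20% of the population inside Circle B.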

Now let’s solve the problem by using Bayes’ formula. For the sake of ease, we’ll begin by re-stating what we want to find.

**Step 1:** Determine what you want to find. Again, we are solving for the same thing we did above with the Venn diagram but are restating this for clarity. We want to know what the probability is of having the flu given our current symptoms.

**Step 2:** Write the above as a formula. Let’s translate what we are solving for into the formula. In other words, we’ll bring the language of Step #1 above into the formula.

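Here is Bayes’ formula:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Now, let’s translate it into what we are solving for.

$$P(\text{Flu} \mid \text{Symptoms}) = \frac{P(\text{Symptoms} \mid \text{Flu})\,P(\text{Flu})}{P(\text{Symptoms})}$$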

**Step 3:** Find each ingredient and label it. From the scenario we know the following: *We have changed the ingredients provided in the scenario from percents into decimals. We will do this every time before we begin to plug the ingredients into the formula.

- P(A) – In our formula, this ingredient is represented as P(Flu) and answers the question: What is the probability of you having the flu? This number is .05.
- P(B|A) – In our formula, this ingredient is represented as P(Symptoms | Flu). This number is .9.
- P(B) – In our formula, this ingredient is represented as P(Symptoms) and answers the question: What is the probability of you having the symptoms? This number is .2.

**Step 4:** Plug each ingredient into the formula and solve.

$$P(\text{Flu} \mid \text{Symptoms}) = \frac{0.9 \times 0.05}{0.2} = \frac{0.045}{0.2} = 0.225$$

**Conclusion:** So, after plugging each ingredient into the formula our answer is 22.5%. We can conclude from this that if you have a sore throat and headache you only have a 22.5% probability of having the flu. Wow! Now, remember what Bayes’ Theorem does: it helps us update a hypothesis based on new evidence. Originally, we thought that the probability of having the flu was as high as 90%! This belief was based on our latching on to P(B|A). However, our answer P(A|B) is very different! The 22.5% we ended with is more accurate than the 90% probability we started with.
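The flu calculation can also be checked with a short Python sketch. The variable names are ours, and the population of 1000 people is an arbitrary round number chosen only to turn the percentages into head counts.

```python
from fractions import Fraction

# Ingredients from the scenario, kept as exact fractions.
p_flu = Fraction(5, 100)                  # P(Flu)
p_symptoms_given_flu = Fraction(90, 100)  # P(Symptoms | Flu)
p_symptoms = Fraction(20, 100)            # P(Symptoms)

# Bayes' formula: P(Flu | Symptoms)
p_flu_given_symptoms = p_symptoms_given_flu * p_flu / p_symptoms
print(float(p_flu_given_symptoms))  # 0.225

# The same answer as head counts in a hypothetical population of 1000 people.
population = 1000
with_symptoms = int(population * p_symptoms)                            # 200 people
flu_and_symptoms = int(population * p_flu * p_symptoms_given_flu)       # 45 people
print(f"{flu_and_symptoms} of {with_symptoms} people with symptoms have the flu")
```

Seen as head counts, 45 of the 200 people with a headache and sore throat have the flu, which is the same 22.5%.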

This problem is a fantastic illustration of the power that Bayes’ Theorem can give us when facing tough uncertainties. It is also a tweaked example of a questionnaire given to 1000 gynecologists. In the study, only 21% of gynecologists chose the correct answer while almost 50% chose the equivalent of our 90%! If you’d like to read more on this, Cornell University has a fantastic article.

You are a police officer in Baltimore and it’s New Year’s Eve. As usual, roadblocks are set up at various points throughout the city to combat drunk driving. Throughout the evening you and your fellow officer are giving random drivers breathalyzer tests. To your surprise, the night is going well and you’ve had few incidents. Around 2 am you randomly pull over a vehicle and have the driver take a breathalyzer test, and the result is positive. You assume the test is accurate and think nothing of it as you process the driver.

After your shift ends early that morning you are talking with your partner. She doesn’t believe that the breathalyzer tests are anywhere near accurate. You ask her why and she tells you some stats: it is true that the breathalyzer always detects a truly drunk person, but only 1 in 1000 drivers are typically drunk. What’s more, the probability of the test being positive is only 8%. You shake your head while you try to put together what she has said. You’ve never questioned the test before but now you are.

What should you believe? Let’s pull this scenario apart.

To clarify once more, Bayes’ Theorem helps us update a hypothesis based on new evidence. We won’t restate this for the remaining scenarios, but thought it would be helpful to point it out one last time.

In this breathalyzer scenario, your hypothesis is that the person is drunk and your evidence is a positive breathalyzer test. Originally, you never questioned the accuracy of the test. You thought it was accurate since it has a 100% success rate of always detecting a truly drunk person. But after listening to your partner you are now questioning this belief.

As in our flu example above, this is where Bayes’ Theorem comes in and helps us have a better understanding of probability. In this scenario we are given two additional pieces of information that can help us come to a more precise probability. Let’s review those quickly before we move on.

- We know that 100% of the time the breathalyzer test will give a positive result for a truly drunk driver. By truly drunk we mean that a blood test would confirm that the person is over the blood alcohol limit.
- We know that 1 in 1000 drivers drives drunk, so the probability of any driver being drunk is 0.1% (we calculated this by dividing 1 by 1000 and then multiplying by 100).
- We also know that the breathalyzer test will give a positive result 8% of the time, regardless of whether the result is accurate or not.

To start, we always need to determine what we are wanting to find. We want to know the probability of someone actually being drunk given that the breathalyzer test is positive.

Perfect. We now know what we are solving for, so let’s move on and tackle it in a few ways.

We are going to look at this problem through two different lenses. A) Visualize the Problem: we will visualize the problem by using a Venn diagram. B) Plugging Into Bayes’ formula and Solving: we will solve the problem by plugging our numbers into the Bayes’ Theorem formula.

Let’s visualize with a Venn diagram.

**Circle #1:** The area inside this circle represents all possible outcomes. In this scenario, it represents all people who could be drunk while driving. The small circle labeled “A” represents the .1% of drivers who actually are drunk. “A” is an event, and its probability is .1%. This probability is represented in our formula as P(A).

**Circle #2:** The area inside this circle represents all possible outcomes. In this scenario, it represents all possibilities for the breathalyzer test. The small circle labeled “B” represents the 8% of the tests that are positive. “B” is an event, and its probability is 8%. This probability is represented in our formula as P(B).

**Circle #3:** This is where all the pieces come together. In this circle we have combined both events “A” and “B”.

Here is how the entire visual can be understood:

- The white area inside this circle represents people who are not drunk drivers and breathalyzer tests that are negative.
- Circle A shows us people who are drunk while driving.
- Circle B shows us the breathalyzer tests that are positive.

Boom! Take a look at the dark area where the two circles overlap. This is what we are really interested in! This is our question that we want to be answered, but in visual form. We want to know the probability P(A|B) of a driver being drunk given that the breathalyzer test is positive. The region where both events occur together is called an intersection, and P(A|B) is the share of Circle B that this intersection takes up.

With both circles now merged, we can visually see our question and what we are trying to solve for. Although we won’t be solving the question with a Venn diagram, the diagram does help us visualize what we are trying to understand. If P(A) is the probability of a driver driving drunk, and P(B) is the probability of a breathalyzer test being positive, what is the probability that a driver is drunk given a positive test?

Let’s follow our four steps again. To make things clear, we’ll clarify what we want to find.

**Step 1:** Determine what you want to find. We want to know the probability of someone being truly drunk given that the breathalyzer test is positive.

**Step 2:** Write the above as a formula. Let’s translate what we are solving for into the formula. In other words, we’ll bring the language of Step #1 above into the formula.

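Here is Bayes’ formula:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Now, let’s translate it into what we are solving for.

$$P(\text{Drunk} \mid \text{Positive}) = \frac{P(\text{Positive} \mid \text{Drunk})\,P(\text{Drunk})}{P(\text{Positive})}$$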

**Step 3:** Find each ingredient and label it. From the scenario we know the following:

- P(A) – In our formula, this ingredient is represented as P(Drunk) and answers the question: What is the probability of a driver being drunk? This number is .001
- P(B|A) – In our formula, this ingredient is represented as P(Positive|Drunk). This number is 1, not .1, but 1 as in 100%.
- P(B) – In our formula, this ingredient is represented as P(Positive) and answers the question: What is the probability of a breathalyzer test being positive? This number is .08

**Step 4:** Plug each ingredient into the formula and solve.

$$P(\text{Drunk} \mid \text{Positive}) = \frac{1 \times 0.001}{0.08} = 0.0125$$

**Conclusion:** So, after all the work and plugging each ingredient into the formula our answer is 1.25%. We can conclude from this that the probability of a driver actually being drunk, given a positive test, is 1.25%. In other words, for every 1000 drivers being tested we have the following:

- We have 1 truly drunk driver who tested positive with the breathalyzer.
- We have roughly 79 other drivers who also tested positive with the breathalyzer but are not drunk.
- In total we have 80 drivers testing positive, and the probability that any one of them is truly drunk is 1 in 80, or 1.25%.

That’s an eye opener! Remember what Bayes’ Theorem does: it can help us quantify skepticism and enable us to have a clearer understanding. Originally, we thought the probability of the driver being drunk was quite high, but now we see it is only around 1.25%.
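The per-1000-drivers breakdown can be reproduced with a small Python sketch. The variable names are ours. The last step, the implied rate of positive tests among sober drivers, is not stated in the scenario: it is derived here from the three given numbers, on the assumption that every positive test comes from either a drunk or a sober driver.

```python
from fractions import Fraction

# Ingredients from the scenario, kept as exact fractions.
p_drunk = Fraction(1, 1000)          # P(Drunk)
p_positive_given_drunk = Fraction(1) # P(Positive | Drunk)
p_positive = Fraction(8, 100)        # P(Positive)

# Bayes' formula: P(Drunk | Positive)
p_drunk_given_positive = p_positive_given_drunk * p_drunk / p_positive
print(float(p_drunk_given_positive))  # 0.0125

# Head counts for every 1000 drivers tested.
drivers = 1000
positives = int(drivers * p_positive)                             # 80 positive tests
true_positives = int(drivers * p_drunk * p_positive_given_drunk)  # 1 truly drunk driver
false_positives = positives - true_positives                      # 79 sober drivers

# Derived, not given: how often a sober driver tests positive.
p_positive_given_sober = (p_positive - p_positive_given_drunk * p_drunk) / (1 - p_drunk)
print(positives, true_positives, false_positives)
print(round(float(p_positive_given_sober), 4))
```

Under these numbers a sober driver tests positive about 8% of the time, and because sober drivers vastly outnumber drunk ones, their positives swamp the single true positive.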

In this scenario, the police officer would now take a very different view of breathalyzer tests and would likely be much more skeptical of their accuracy. *As a side note, there are other complexities to situations like this, such as whether the test was actually random. Did the police officer pull the vehicle over because of how the driver was driving? Did the officer provide a breathalyzer test because of how the driver was acting or responding? New information such as this would impact the entire equation. For the sake of ease and teaching we have kept the problem very basic.

**A visual guide to Bayesian thinking – Julia Galef**

**Bayes’ Theorem – Explained Like You’re Five – Dana Scheider**

**Understanding Bayes’ Theorem**

- PSU.edu Lesson 6: Bayes’ Theorem
- Cornell.edu Bayes Theorem in Machine Learning
- Stanford.edu Conditional Probabilities and Bayes’ Theorem
- BetterExplained.com A Short Explanation of Bayes’ Theorem
- Khanacademy.org Bayes Theorem Visualized
- MathisFun.com Bayes’ Theorem

- Bayes Theorem: A Visual Introduction For Beginners – Dan Morris
- Think Bayes: Bayesian Statistics in Python – Allen B. Downey
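- The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy – Sharon Bertsch McGrayne
- Bayesian Data Analysis, Third Edition – Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin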