Understanding Bayesian Networks: A Comprehensive Guide
By CamelEdge
Updated on Thu Jul 25 2024
Introduction
A Bayesian network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). Each random variable encodes some aspect of the world about which we may be uncertain. These random variables are usually denoted using capital letters such as:
- $R$: Is it raining?
- $C$: Heads or tails?
- $T$: How long will it take to reach the front of the queue?

Random variables can be discrete or continuous. Each random variable is assigned a value from its domain. For example, $R \in \{\text{true}, \text{false}\}$ and $C \in \{\text{heads}, \text{tails}\}$.
Probability Basics
We assign a probability value to each value a random variable takes. For example, for a fair coin we can assign a value of $0.5$ to heads and $0.5$ to tails. There are three types of probability values:
Marginal Probability
Marginal probability is the probability of a single event occurring without any consideration of other events. It is derived by summing or integrating over the possible values of other random variables.
Joint Probability
Joint probability is the probability of two or more events occurring simultaneously. It is the probability of the intersection of events, representing the combined outcome of these events.
Conditional Probability
Conditional probability is the probability of an event occurring given that another event has already occurred.
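For instance (with made-up numbers): if the joint probability of it being cloudy and raining is $P(\text{cloudy}, \text{rain}) = 0.2$, and the marginal probability of clouds is $P(\text{cloudy}) = 0.4$, then the conditional probability of rain given clouds is:

$$P(\text{rain} \mid \text{cloudy}) = \frac{P(\text{cloudy}, \text{rain})}{P(\text{cloudy})} = \frac{0.2}{0.4} = 0.5$$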
Probability Rules
Probability rules allow us to establish relationships between different probabilities. For instance, we can relate marginal probabilities to joint probabilities 🎲
Law of Total Probability

The law of total probability relates marginal and joint probabilities. It's given by:

$$P(X{=}x) = \sum_{y} P(X{=}x,\, Y{=}y)$$
It states that the marginal probability of an event can be found by considering all possible ways that the event can occur jointly with the other variable.
Product Rule

The product rule relates joint probability to conditional probability. It states that the joint probability of two variables $X$ and $Y$ can be expressed as:

$$P(X, Y) = P(X \mid Y)\, P(Y)$$
Similarly, the conditional probability can be expressed as:

$$P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$$
Bayes' Rule

The product rule allows us to compute the conditional probability from the joint probability. But what if the joint is not available? In that case, Bayes' rule comes to the rescue. It allows us to compute the conditional probability $P(X \mid Y)$ from the inverse conditional $P(Y \mid X)$. It is derived from the product rule and is expressed as:

$$P(X \mid Y) = \frac{P(Y \mid X)\, P(X)}{P(Y)}$$
Chain Rule

The chain rule of probability allows us to express the joint probability of a sequence of events as the product of conditional probabilities. For a sequence of events $X_1, X_2, \ldots, X_n$, the chain rule is given by:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$$
It is obtained by applying the product rule over and over again.
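To make these rules concrete, here is a minimal Python sketch (the numbers are made up) that applies the law of total probability, the product rule, and Bayes' rule to a small joint table over two binary variables $X$ and $Y$:

```python
# A toy joint distribution P(X, Y) over two binary variables (made-up numbers).
# Keys are (x, y) value pairs; the four probabilities sum to 1.
joint = {
    ("x", "y"): 0.30,
    ("x", "~y"): 0.10,
    ("~x", "y"): 0.20,
    ("~x", "~y"): 0.40,
}

# Law of total probability: P(X=x) = sum over y of P(X=x, Y=y)
p_x = sum(p for (x, _), p in joint.items() if x == "x")   # 0.40

# Marginal of Y: P(Y=y)
p_y = sum(p for (_, y), p in joint.items() if y == "y")   # 0.50

# Product rule rearranged: P(X=x | Y=y) = P(x, y) / P(y)
p_x_given_y = joint[("x", "y")] / p_y                     # 0.60

# Bayes' rule: P(Y=y | X=x) = P(X=x | Y=y) P(y) / P(x)
p_y_given_x = p_x_given_y * p_y / p_x                     # 0.75

print(p_x, p_y, p_x_given_y, p_y_given_x)
```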
Conditional Independence
Conditional independence is a fundamental concept in probability theory and Bayesian networks. Two random variables $X$ and $Y$ are conditionally independent given a third random variable $Z$ if the conditional probability distribution of $X$ given $Z$ is the same regardless of the value of $Y$. Formally, $X$ and $Y$ are conditionally independent given $Z$ if:

$$P(X \mid Y, Z) = P(X \mid Z)$$

or equivalently,

$$P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$$

This implies that knowing the value of $Y$ provides no additional information about $X$ once we know the value of $Z$.
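A quick numeric sketch (with made-up CPTs) shows what this means in practice: if we build a joint distribution that factors as $P(Z)\,P(X \mid Z)\,P(Y \mid Z)$, then $P(X \mid Y, Z)$ comes out the same for every value of $Y$:

```python
# A tiny numeric check of conditional independence (made-up numbers).
p_z = {True: 0.3, False: 0.7}           # P(Z)
p_x_given_z = {True: 0.9, False: 0.2}   # P(X=true | Z)
p_y_given_z = {True: 0.8, False: 0.1}   # P(Y=true | Z)

def joint(x, y, z):
    """P(x, y, z) = P(z) P(x|z) P(y|z) under the independence assumption."""
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    return p_z[z] * px * py

for z in (True, False):
    for y in (True, False):
        # P(X=true | Y=y, Z=z) = P(x, y, z) / P(y, z)
        p_yz = joint(True, y, z) + joint(False, y, z)
        print(z, y, joint(True, y, z) / p_yz)
# For each z, the printed value is identical for both values of y:
# knowing Y adds nothing once Z is known.
```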
A Simple Bayes' Net
Let's create our first Bayesian Network. Consider a simple example where we have three random variables: $D$, $T_1$, and $T_2$. Assume $D$ represents having some disease, and $T_1$ and $T_2$ are two test results.
The joint distribution of these three variables is given by the chain rule:

$$P(D, T_1, T_2) = P(D)\, P(T_1 \mid D)\, P(T_2 \mid D, T_1)$$

To simplify our model, we will make a conditional independence assumption: that $T_1$ and $T_2$ are independent given $D$. This means:

$$P(T_2 \mid D, T_1) = P(T_2 \mid D)$$

Using this assumption, we can express the joint distribution as:

$$P(D, T_1, T_2) = P(D)\, P(T_1 \mid D)\, P(T_2 \mid D)$$
The Bayesian network corresponding to this assumption has $D$ as the parent of both $T_1$ and $T_2$, with no edge between the two tests.
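As a quick sketch (the numbers here are hypothetical, not from the article), evaluating the factored joint is just a product of CPT lookups:

```python
# Hypothetical CPTs for the disease/test network.
p_d = 0.01                                 # P(D=true)
p_t1_given_d = {True: 0.95, False: 0.10}   # P(T1=positive | D)
p_t2_given_d = {True: 0.90, False: 0.05}   # P(T2=positive | D)

# P(D=true, T1=pos, T2=pos) = P(D) P(T1|D) P(T2|D)
p_joint = p_d * p_t1_given_d[True] * p_t2_given_d[True]
print(p_joint)  # 0.00855
```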
Another Example
Now let's look at an example that consists of five random variables: Burglary (B), Earthquake (E), Alarm (A), John Calls (J), and Mary Calls (M). Each of these variables represents an event that can either happen or not happen, and their relationships can be described using conditional dependencies.
Structure of the Bayesian Network
The Bayesian network for this scenario can be structured as follows:
- Burglary (B) and Earthquake (E): These two events are considered independent of each other. A burglary occurring does not influence the probability of an earthquake and vice versa.
- Alarm (A): The alarm depends on both the Burglary and Earthquake events. If either a burglary or an earthquake occurs, it can trigger the alarm.
- John Calls (J) and Mary Calls (M): John and Mary calling are dependent on whether the alarm has gone off. If the alarm rings, it increases the likelihood that John or Mary will call.
The joint distribution of these variables can be represented as $P(B, E, A, J, M)$. Using the chain rule of Bayesian networks (conditional independence assumptions implied in the Bayes Net structure), we can express this joint distribution in terms of the conditional probabilities:

$$P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)$$
In a general Bayes Net, the joint probability is expressed as:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$$

where $\mathrm{Parents}(X_i)$ denotes the set of parent nodes of $X_i$.
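This factorization translates directly into code. Below is a minimal Python sketch (the representation is my own choice, not a standard library API) that evaluates the joint probability of a complete assignment by multiplying each variable's CPT entry given its parents:

```python
# A Bayes net as {variable: (parents, cpt)}, where cpt maps
# (value, parent_values) -> P(value | parent_values).
def joint_probability(assignment, network):
    """P(x1, ..., xn) = product over i of P(xi | Parents(Xi))."""
    total = 1.0
    for var, (parents, cpt) in network.items():
        parent_vals = tuple(assignment[p] for p in parents)
        total *= cpt[(assignment[var], parent_vals)]
    return total

# Example with a two-node network X -> Y (made-up numbers):
network = {
    "X": ((), {(True, ()): 0.3, (False, ()): 0.7}),
    "Y": (("X",), {(True, (True,)): 0.8, (False, (True,)): 0.2,
                   (True, (False,)): 0.1, (False, (False,)): 0.9}),
}
print(joint_probability({"X": True, "Y": True}, network))  # 0.3 * 0.8 = 0.24
```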
Benefits of Using Bayesian Networks
Bayesian networks offer significant advantages in probabilistic modeling and reasoning under uncertainty:
Reduction in Parameter Complexity: The full joint distribution table of a set of variables requires $d^n$ probability values, where $d$ is the number of domain values of each random variable and $n$ is the number of variables. In contrast, Bayesian networks significantly reduce the number of parameters required, depending on the network's structure and the implied conditional independence assumptions.
Example Illustration: Let's break down the parameters in the example Bayesian network above:
- Burglary (B): One parameter representing the probability of a burglary occurring, $P(B{=}\text{true})$. The probability of a burglary not occurring can be obtained from it as follows: $P(B{=}\text{false}) = 1 - P(B{=}\text{true})$.
- Earthquake (E): One parameter representing the probability of an earthquake occurring, $P(E{=}\text{true})$.
- Alarm (A): $P(A \mid B, E)$ requires 4 parameters because $A$ depends on both $B$ and $E$ (one parameter per combination of their values). The other 4 probabilities can be obtained using these 4.
- John Calls (J): $P(J \mid A)$ is described by 2 parameters, i.e. $P(J{=}\text{true} \mid A{=}\text{true})$ and $P(J{=}\text{true} \mid A{=}\text{false})$. The other two probabilities can be obtained as follows: $P(J{=}\text{false} \mid A{=}\text{true}) = 1 - P(J{=}\text{true} \mid A{=}\text{true})$ and $P(J{=}\text{false} \mid A{=}\text{false}) = 1 - P(J{=}\text{true} \mid A{=}\text{false})$.
- Mary Calls (M): Similarly, $P(M \mid A)$ requires 2 parameters.
Therefore, the total number of parameters in this Bayesian network example is $1 + 1 + 4 + 2 + 2 = 10$. Each parameter captures a specific probability value that reflects the likelihood of events occurring based on the network structure and conditional dependencies. This structured approach reduces the complexity compared to a full joint probability table (which requires $2^5 = 32$ values), making Bayesian networks more efficient and effective for probabilistic modeling and inference tasks.
Conditional Probability Tables (CPTs)
Let's assume that we are provided with the conditional probabilities of all variables in the Bayesian network example above. The tables below specify, for each variable, the probability of the variable given its parent variables' states, adhering to the network structure.
$P(B)$:

| B | P(B) |
| --- | --- |
| true | 0.001 |
| false | 0.999 |

$P(E)$:

| E | P(E) |
| --- | --- |
| true | 0.002 |
| false | 0.998 |

$P(A \mid B, E)$:

| B | E | P(A=true \| B, E) |
| --- | --- | --- |
| true | true | 0.95 |
| true | false | 0.94 |
| false | true | 0.29 |
| false | false | 0.001 |

$P(J \mid A)$:

| A | P(J=true \| A) |
| --- | --- |
| true | 0.90 |
| false | 0.05 |

$P(M \mid A)$:

| A | P(M=true \| A) |
| --- | --- |
| true | 0.70 |
| false | 0.01 |
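In code, these tables are just lookups. Here is a minimal Python sketch of the CPTs above (storing only the "true" rows, since each "false" row follows by complement):

```python
# CPTs from the tables above, for boolean variables.
P_B = 0.001                      # P(B=true)
P_E = 0.002                      # P(E=true)
P_A = {(True, True): 0.95,       # P(A=true | B, E), keyed by (B, E)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(J=true | A)
P_M = {True: 0.70, False: 0.01}  # P(M=true | A)

# Example: the alarm sounds and both call, with no burglary and no earthquake:
# P(~b, ~e, a, j, m) = P(~b) P(~e) P(a|~b,~e) P(j|a) P(m|a)
p = (1 - P_B) * (1 - P_E) * P_A[(False, False)] * P_J[True] * P_M[True]
print(p)  # ~0.000628
```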
Inference by Enumeration

To find the posterior probability of, e.g., burglary given that John and Mary have called, $P(B \mid j, m)$ (where lowercase $j$ and $m$ denote $J{=}\text{true}$ and $M{=}\text{true}$), we can proceed as follows:
- Apply the Product Rule:

$$P(B \mid j, m) = \frac{P(B, j, m)}{P(j, m)}$$

- Normalization: Since $P(j, m)$ can be difficult to compute directly, we use the normalization trick:

$$P(B \mid j, m) = \alpha\, P(B, j, m)$$

where $\alpha = 1 / P(j, m)$ is a normalization constant, which will be computed later.

- Use the Law of Total Probability: We expand $P(B, j, m)$ using the law of total probability over all possible values of the other variables in the network:

$$P(B, j, m) = \sum_{e} \sum_{a} P(B, e, a, j, m)$$

- Apply the Bayesian Chain Rule: Next, we use the Bayesian chain rule to decompose $P(B, e, a, j, m)$ into a product of conditional probabilities. Given the structure of the Bayesian network, we have:

$$P(B, e, a, j, m) = P(B)\, P(e)\, P(a \mid B, e)\, P(j \mid a)\, P(m \mid a)$$

Putting it all together:

$$P(B \mid j, m) = \alpha \sum_{e} \sum_{a} P(B)\, P(e)\, P(a \mid B, e)\, P(j \mid a)\, P(m \mid a)$$
By summing over all possible values of $e$ (earthquake) and $a$ (alarm), and then normalizing the result, we can compute $P(B \mid j, m)$, the posterior probability of burglary given that John and Mary have called. The normalization constant $\alpha$ ensures that the total probability sums to 1.
Now, we can compute this using the given values. We'll calculate $P(B, j, m)$ for $B = \text{true}$ and $B = \text{false}$ separately:

$$P(b, j, m) = P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\, P(j \mid a)\, P(m \mid a) = 0.001 \times 0.59224 \approx 0.00059224$$

$$P(\lnot b, j, m) = P(\lnot b) \sum_{e} P(e) \sum_{a} P(a \mid \lnot b, e)\, P(j \mid a)\, P(m \mid a) = 0.999 \times 0.0014934 \approx 0.0014919$$

Normalize

Now we need to normalize these probabilities to find $P(B \mid j, m)$:

$$\alpha = \frac{1}{P(b, j, m) + P(\lnot b, j, m)}$$

Substitute the values:

$$\alpha = \frac{1}{0.00059224 + 0.0014919} \approx \frac{1}{0.0020841} \approx 479.8$$

Finally, compute for both cases:

$$P(b \mid j, m) = \alpha \times 0.00059224 \approx 0.284$$

$$P(\lnot b \mid j, m) = \alpha \times 0.0014919 \approx 0.716$$

Thus, the posterior probability of a burglary given that John and Mary have called is approximately $0.284$, i.e. about a 28% chance.
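The whole calculation can be reproduced with a short enumeration loop. This is a minimal Python sketch of inference by enumeration using the CPT values above (not an optimized implementation):

```python
# CPTs from the tables above (True means the event occurs).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def p(prob_true, value):
    """P(X=value), given P(X=true)."""
    return prob_true if value else 1 - prob_true

# Unnormalized P(B=b, j, m) = P(b) sum_e P(e) sum_a P(a|b,e) P(j|a) P(m|a)
unnorm = {}
for b in (True, False):
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            total += p(P_E, e) * p(P_A[(b, e)], a) * P_J[a] * P_M[a]
    unnorm[b] = p(P_B, b) * total

alpha = 1 / (unnorm[True] + unnorm[False])   # normalization constant
print(alpha * unnorm[True])   # P(b | j, m)  ~ 0.284
print(alpha * unnorm[False])  # P(~b | j, m) ~ 0.716
```

Running it prints approximately 0.284 and 0.716, matching the hand calculation above.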