
Demystifying Hidden Markov Models: A Beginner's Guide to AI Algorithms


    By CamelEdge

    Updated on Mon Nov 18 2024


    Imagine being lost in a building without a map or clear directions—how would you figure out your location? In robotics, this problem is known as localization. But how can we use artificial intelligence to estimate your position in such an environment?

    Think about how humans solve this: we rely on observations (like landmarks or room layouts) and logical reasoning to make educated guesses about our surroundings. In this blog, we’ll explore how machines can emulate this process using a powerful algorithm called the Hidden Markov Model (HMM).

We’ll break down the fundamentals of HMMs step by step and show how they can solve problems like localization. Before we start, it will help to brush up on your knowledge of Bayesian networks.

    Let’s dive in!

    Markov Model

To begin, let’s model the location as a random variable X_t. The subscript t represents time, indicating that X_t changes as time evolves. In this context, X_t is referred to as the state variable, as it encapsulates the system’s state that we aim to estimate.

For example, if you are in a building, X_t might represent the room you are in at time t. However, the true challenge is that we don’t directly observe X_t; instead, we rely on clues or observations that are related to the state. To model the system effectively, we make two key assumptions:

[Figure: Markov Model]

    Markov Assumption

The Markov assumption simplifies the complexity of the problem by stating that the current state X_t depends only on the previous state X_{t-1}. This means the entire history of states before X_{t-1} is irrelevant for predicting X_t. Mathematically,

P(X_t | X_{t-1}, X_{t-2}, \dots, X_0) = P(X_t | X_{t-1})

    This assumption is powerful because it reduces the problem's dimensionality, making computations more efficient. It also reflects real-world systems where the most recent state often has the most influence on the next state.

    Stationary Assumption

    The stationary assumption states that the transition probabilities between states do not change over time. For instance, the probability of transitioning from one room to another in a building is assumed to remain constant regardless of the time step. Formally,

P(X_1 | X_0) = P(X_2 | X_1) = \dots = P(X_t | X_{t-1})

This assumption simplifies the model further, as we only need to estimate a fixed set of probabilities rather than track changes over time. P(X_t | X_{t-1}) is called the transition model, as it governs how the system’s state variable evolves.

    These two assumptions—Markov and stationary—form the foundation of the Markov model, allowing us to describe complex systems with a manageable level of detail.
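To make this concrete, here is a minimal sketch in Python of a stationary Markov chain for a hypothetical two-room building. The transition matrix and room names are invented for illustration, not taken from any real system:

```python
import numpy as np

# Hypothetical states: 0 = "hallway", 1 = "office".
# Stationary transition model P(X_t | X_{t-1}): row = previous state,
# column = next state. These numbers are made up for illustration.
T = np.array([
    [0.7, 0.3],   # from hallway: stay with 0.7, move to office with 0.3
    [0.4, 0.6],   # from office: move to hallway with 0.4, stay with 0.6
])

def step_distribution(belief, T):
    """Propagate a belief over states one time step forward:
    P(X_t) = sum over x_{t-1} of P(X_t | x_{t-1}) P(x_{t-1})."""
    return belief @ T

belief = np.array([1.0, 0.0])   # start in the hallway with certainty
for _ in range(3):
    belief = step_distribution(belief, T)
print(belief)   # distribution over rooms after 3 steps
```

Because the transition matrix never changes between steps (the stationary assumption), the same `T` is reused at every iteration.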

    Hidden Markov Model

So far, we’ve discussed the Markov model, where we focus on the transitions between states X_t. However, in many real-world problems, we cannot directly observe the states: they are hidden. Instead, we rely on evidence that provides indirect clues about the current state.

    For example, imagine a robot navigating a building. Its exact location (state) is hidden from us. Instead, we have access to data from sensors, such as:

    • A proximity sensor that tells whether the robot is near or far from a wall.
    • A camera detecting certain visual landmarks.
    • An accelerometer providing movement data.

These sensor readings form the evidence, denoted by E_t, where t represents time. The key challenge is to estimate the hidden states X_t (e.g., the robot’s location) from the sequence of evidence E_t.

    How an HMM Models the Problem

    An HMM consists of two main components:

1. State Transition Model: Describes the probability of transitioning from one state to another. This is the same as in the Markov model, represented by P(X_t | X_{t-1}).
2. Observation/Sensor Model: Defines the probability of observing E_t given the current state X_t. Formally, P(E_t | X_t).

    Both the state transition model and the observation model adhere to the stationary assumption, meaning their probabilities remain constant over time.

    The relationship between states and evidence can be illustrated as follows:

[Figure: Hidden Markov Model]

    Here’s what happens:

• The hidden states X_t evolve over time according to the Markov property.
• Each state X_t generates an observation E_t, which gives us a clue about the underlying state.
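Both components can be written down as small probability tables. Here is a sketch for the hypothetical two-room robot with a proximity sensor; every number below is invented for illustration:

```python
import numpy as np

# States: 0 = "hallway", 1 = "office"; observations: 0 = "near", 1 = "far".
# All probabilities are illustrative, not from real sensor data.

# Transition model P(X_t | X_{t-1}): row = previous state, column = next state.
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Sensor model P(E_t | X_t): row = state, column = observation.
O = np.array([[0.9, 0.1],   # hallway: walls are close, so "near" is likely
              [0.2, 0.8]])  # office: open space, so "far" is likely

# Each row of both tables must be a valid probability distribution.
assert np.allclose(T.sum(axis=1), 1.0)
assert np.allclose(O.sum(axis=1), 1.0)
```

Under the stationary assumption, these two tables fully specify the HMM dynamics; only the prior over the initial state remains to be chosen.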

    Why Use an HMM?

    The power of HMMs lies in their ability to infer the most probable sequence of hidden states from the evidence, even when the observations are noisy or incomplete. For example:

    • In robotics, we can use HMMs to estimate the robot’s location from sensor data.
    • In speech recognition, HMMs are used to identify spoken words based on audio signals.

    Joint Distribution in an HMM

In a Hidden Markov Model, the joint distribution encapsulates the relationship between the sequence of hidden states X_0, X_1, \dots, X_T and the sequence of observations E_1, E_2, \dots, E_T. It combines the state transition model and the observation/sensor model into a single probabilistic framework.

    The joint probability of a sequence of states and observations in an HMM is given by:

P(X_0, X_1, \dots, X_T, E_1, E_2, \dots, E_T) = P(X_{0:T}, E_{1:T}),

where 0:T denotes all variables from t=0 to t=T. Using the chain rule of probability, we can decompose this joint probability as:

P(X_{0:T}, E_{1:T}) = P(X_0) \prod_{t=1}^T P(X_t | X_{t-1}) P(E_t | X_t)

    Here’s what each term represents:

1. Initial state distribution (also called the prior distribution) P(X_0): The probability of starting in a particular state.
2. State transition probabilities P(X_t | X_{t-1}): Capture how states evolve over time, adhering to the Markov assumption.
3. Observation probabilities P(E_t | X_t): Encode the likelihood of making a specific observation given the current state.
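As a sanity check on this factorization, the following sketch evaluates the joint probability of one particular state/observation sequence, reusing hypothetical two-state tables (all numbers are illustrative):

```python
import numpy as np

prior = np.array([0.5, 0.5])             # P(X_0), illustrative
T = np.array([[0.7, 0.3], [0.4, 0.6]])   # P(X_t | X_{t-1})
O = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(E_t | X_t)

def joint_probability(states, observations):
    """P(x_0, ..., x_T, e_1, ..., e_T)
       = P(x_0) * prod over t of P(x_t | x_{t-1}) * P(e_t | x_t).
    `states` has length T+1 (includes x_0); `observations` has length T."""
    p = prior[states[0]]
    for t in range(1, len(states)):
        p *= T[states[t - 1], states[t]] * O[states[t], observations[t - 1]]
    return p

# Probability of the path hallway -> office -> office with readings far, far
p = joint_probability([0, 1, 1], [1, 1])
print(p)
```

Each factor in the loop corresponds to one transition term and one observation term of the product formula above.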

    Inference in HMM: Filtering and Prediction

    The central goal of inference in a Hidden Markov Model is to estimate information about the hidden states based on observations. Two key types of inference are filtering and prediction:

Filtering: Filtering is the task of computing the posterior distribution of the current state X_t given all observations up to time t:

P(X_t | e_{1:t})

    This process helps us estimate the most likely state the system is in at the current time, given the evidence so far.

Prediction: Prediction focuses on estimating the future state X_{t+1} given all observations up to the current time t:

P(X_{t+1} | e_{1:t})

    Filtering

    Let’s derive the filtering equation by starting with:

P(X_t | e_{1:t}) = P(X_t | e_{1:t-1}, e_t)

where we’ve simply rewritten the expression by splitting the evidence sequence into the earlier sequence e_{1:t-1} and the most recent observation e_t.

    Using Bayes’ Rule, we rewrite this probability as:

P(X_t | e_{1:t}) = \alpha P(e_t | X_t, e_{1:t-1}) P(X_t | e_{1:t-1})

Here, \alpha is a normalization constant that ensures the probabilities sum to 1.

The sensor Markov assumption states that the observation e_t depends only on the current state X_t. This simplifies the equation to:

P(X_t | e_{1:t}) = \alpha P(e_t | X_t) P(X_t | e_{1:t-1})

The first term is the sensor model, which captures the likelihood of the current observation e_t given the state X_t. The second term is the prediction of X_t based on all past observations e_{1:t-1}. Thus, filtering at time t involves multiplying the sensor model by the prediction.

    Now, let’s derive the equation for the prediction step.

    Prediction

The goal of the prediction step is to compute P(X_t | e_{1:t-1}), the probability of the current state X_t based on all past observations e_{1:t-1}.

    We start with the definition:

P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t, X_{t-1} | e_{1:t-1})

This equation sums over all possible previous states X_{t-1}, capturing the influence of each past state on the current state.

Using the product rule, we can rewrite the joint probability P(X_t, X_{t-1} | e_{1:t-1}):

P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t | X_{t-1}, e_{1:t-1}) P(X_{t-1} | e_{1:t-1})

The Markov assumption simplifies P(X_t | X_{t-1}, e_{1:t-1}) by stating that X_t depends only on the immediately previous state X_{t-1}:

P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t | X_{t-1}) P(X_{t-1} | e_{1:t-1})

    The first term represents the transition model, describing the probability of transitioning from a previous state Xt1X_{t-1} to the current state XtX_t. The second term corresponds to the filtering estimate of Xt1X_{t-1}, which captures our belief about the previous state based on past observations.

    Combining the filtering and prediction steps gives us:

P(X_t | e_{1:t}) = \alpha P(e_t | X_t) \sum_{X_{t-1}} P(X_t | X_{t-1}) P(X_{t-1} | e_{1:t-1})

Here, the sensor model P(e_t | X_t) incorporates the latest evidence, while the prediction term propagates the previous belief forward in time. Together, they update the belief about the current state X_t.

The filtering problem P(X_t | e_{1:t}) can therefore be solved using the following recursive approach:

1. Prediction Step: Use the state transition model to predict the distribution of the current state:
P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t | X_{t-1}) P(X_{t-1} | e_{1:t-1})
2. Update Step: Incorporate the new evidence e_t using the observation model:
P(X_t | e_{1:t}) = \alpha P(e_t | X_t) P(X_t | e_{1:t-1})
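This predict-then-update recursion is exactly the forward algorithm. A minimal sketch with hypothetical two-state models (the prior, transition, and sensor numbers are all invented for illustration):

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.4, 0.6]])   # transition model P(X_t | X_{t-1})
O = np.array([[0.9, 0.1], [0.2, 0.8]])   # sensor model P(E_t | X_t)
prior = np.array([0.5, 0.5])             # initial belief P(X_0)

def filter_step(belief, evidence):
    """One predict-then-update cycle of the forward algorithm."""
    predicted = belief @ T                 # prediction: P(X_t | e_{1:t-1})
    updated = O[:, evidence] * predicted   # weight by sensor model P(e_t | X_t)
    return updated / updated.sum()         # normalize (the alpha constant)

belief = prior
for e in [0, 0, 1]:    # sensor readings: near, near, far
    belief = filter_step(belief, e)
print(belief)          # filtered belief P(X_t | e_{1:t})
```

Note that normalization by `updated.sum()` plays the role of \alpha: we never need its value in closed form, only that the resulting belief sums to 1.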

    FAQs

    1. What is a Hidden Markov Model (HMM)?
      An HMM is a probabilistic model used to describe systems where the true states are hidden but can be inferred through observable evidence. It consists of a state transition model and an observation model.

    2. What is the difference between a Markov Model and a Hidden Markov Model?
      A Markov Model focuses on directly observable states and their transitions, while an HMM deals with hidden states that are inferred using observed evidence.

    3. Why is the Markov Assumption important in HMMs?
      The Markov Assumption simplifies computations by assuming that the current state depends only on the previous state, reducing the complexity of the model.

    4. What are the key components of an HMM?

      • State Transition Model: Describes probabilities of moving from one state to another.
      • Observation Model: Defines the probability of observing evidence given the current state.
      • Initial State Distribution: The probability distribution of the starting state.
    5. What are some real-world applications of HMMs?
      HMMs are used in robotics (localization), speech recognition, bioinformatics (gene sequence analysis), and finance (market trend predictions).

    6. How do filtering and prediction work in HMMs?

      • Filtering: Estimates the current state based on all observed evidence up to the present.
      • Prediction: Estimates future states based on current evidence.
    7. What is the stationary assumption in HMMs?
      The stationary assumption states that the probabilities governing state transitions and observations remain constant over time.

    8. How does the observation model in an HMM work?
      The observation model calculates the probability of seeing a particular piece of evidence given the current state. This is crucial for inferring hidden states.

    9. What is the joint probability distribution in an HMM?
      It represents the combined probability of a sequence of hidden states and observations, which is used to model the overall system behavior.

    10. What makes HMMs robust in noisy environments?
      HMMs can infer the most probable sequence of hidden states even when observations are noisy or incomplete, thanks to their probabilistic framework combining state transitions and observation models.

