
Demystifying Hidden Markov Models: A Beginner's Guide to AI Algorithms


    By CamelEdge

    Updated on Mon Nov 18 2024


    Imagine being lost in a building without a map or clear directions—how would you figure out your location? In robotics, this problem is known as localization. But how can we use artificial intelligence to estimate your position in such an environment?

    Think about how humans solve this: we rely on observations (like landmarks or room layouts) and logical reasoning to make educated guesses about our surroundings. In this blog, we’ll explore how machines can emulate this process using a powerful algorithm called the Hidden Markov Model (HMM).

We’ll break down the fundamentals of HMMs step by step and show how they can solve problems like localization. Before we start, it will help to brush up on your knowledge of Bayesian networks.

    Let’s dive in!

    Markov Model

To begin, let’s model the location as a random variable X_t. The subscript t represents time, indicating that X_t changes as time evolves. In this context, X_t is referred to as the state variable, as it encapsulates the system’s state that we aim to estimate.

For example, if you are in a building, X_t might represent the room you are in at time t. However, the true challenge is that we don’t directly observe X_t; instead, we rely on clues or observations that are related to the state. To model the system effectively, we make two key assumptions:

[Figure: Markov Model]

    Markov Assumption

The Markov assumption simplifies the complexity of the problem by stating that the current state X_t depends only on the previous state X_{t-1}. This means the entire history of states before X_{t-1} is irrelevant for predicting X_t. Mathematically,

P(X_t | X_{t-1}, X_{t-2}, \dots, X_0) = P(X_t | X_{t-1})

    This assumption is powerful because it reduces the problem's dimensionality, making computations more efficient. It also reflects real-world systems where the most recent state often has the most influence on the next state.

    Stationary Assumption

    The stationary assumption states that the transition probabilities between states do not change over time. For instance, the probability of transitioning from one room to another in a building is assumed to remain constant regardless of the time step. Formally,

P(X_1 | X_0) = P(X_2 | X_1) = \dots = P(X_t | X_{t-1})

This assumption simplifies the model further, as we only need to estimate a fixed set of probabilities rather than track changes over time. P(X_t | X_{t-1}) is called the transition model, as it governs how the system’s state variable evolves.

    These two assumptions—Markov and stationary—form the foundation of the Markov model, allowing us to describe complex systems with a manageable level of detail.
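To make this concrete, here is a minimal sketch in Python of a stationary Markov chain for a hypothetical two-room building. The transition matrix and room names are invented for illustration, not taken from any real system:

```python
import numpy as np

# Hypothetical states: 0 = "hallway", 1 = "office".
# Stationary transition model P(X_t | X_{t-1}): row = previous state,
# column = next state. These numbers are made up for illustration.
T = np.array([
    [0.7, 0.3],   # from hallway: stay with 0.7, move to office with 0.3
    [0.4, 0.6],   # from office: move to hallway with 0.4, stay with 0.6
])

def step_distribution(belief, T):
    """Propagate a belief over states one time step forward:
    P(X_t) = sum over x_{t-1} of P(X_t | x_{t-1}) P(x_{t-1})."""
    return belief @ T

belief = np.array([1.0, 0.0])   # start in the hallway with certainty
for _ in range(3):
    belief = step_distribution(belief, T)
print(belief)   # distribution over rooms after 3 steps
```

Because the transition matrix never changes between steps (the stationary assumption), the same `T` is reused at every iteration.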

    Hidden Markov Model

So far, we’ve discussed the Markov model, where we focus on the transitions between states X_t. However, in many real-world problems, we cannot directly observe the states: they are hidden. Instead, we rely on evidence that provides indirect clues about the current state.

    For example, imagine a robot navigating a building. Its exact location (state) is hidden from us. Instead, we have access to data from sensors, such as:

    • A proximity sensor that tells whether the robot is near or far from a wall.
    • A camera detecting certain visual landmarks.
    • An accelerometer providing movement data.

These sensor readings form the evidence, denoted by E_t, where t represents time. The key challenge is to estimate the hidden states X_t (e.g., the robot’s location) from the sequence of evidence E_t.

    How an HMM Models the Problem

    An HMM consists of two main components:

1. State Transition Model: Describes the probability of transitioning from one state to another. This is the same as in the Markov model, represented by P(X_t | X_{t-1}).
2. Observation/Sensor Model: Defines the probability of observing E_t given the current state X_t. Formally, P(E_t | X_t).

    Both the state transition model and the observation model adhere to the stationary assumption, meaning their probabilities remain constant over time.

    The relationship between states and evidence can be illustrated as follows:

[Figure: Hidden Markov Model]

    Here’s what happens:

• The hidden states X_t evolve over time according to the Markov property.
• Each state X_t generates an observation E_t, which gives us a clue about the underlying state.
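Both components can be written down as small probability tables. Here is a sketch for the hypothetical two-room robot with a proximity sensor; every number below is invented for illustration:

```python
import numpy as np

# States: 0 = "hallway", 1 = "office"; observations: 0 = "near", 1 = "far".
# All probabilities are illustrative, not from real sensor data.

# Transition model P(X_t | X_{t-1}): row = previous state, column = next state.
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Sensor model P(E_t | X_t): row = state, column = observation.
O = np.array([[0.9, 0.1],   # hallway: walls are close, so "near" is likely
              [0.2, 0.8]])  # office: open space, so "far" is likely

# Each row of both tables must be a valid probability distribution.
assert np.allclose(T.sum(axis=1), 1.0)
assert np.allclose(O.sum(axis=1), 1.0)
```

Under the stationary assumption, these two tables fully specify the HMM dynamics; only the prior over the initial state remains to be chosen.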

    Why Use an HMM?

    The power of HMMs lies in their ability to infer the most probable sequence of hidden states from the evidence, even when the observations are noisy or incomplete. For example:

    • In robotics, we can use HMMs to estimate the robot’s location from sensor data.
    • In speech recognition, HMMs are used to identify spoken words based on audio signals.

    Joint Distribution in an HMM

In a Hidden Markov Model, the joint distribution encapsulates the relationship between the sequence of hidden states X_0, X_1, \dots, X_T and the sequence of observations E_1, E_2, \dots, E_T. It combines the state transition model and the observation/sensor model into a single probabilistic framework.

    The joint probability of a sequence of states and observations in an HMM is given by:

P(X_0, X_1, \dots, X_T, E_1, E_2, \dots, E_T) = P(X_{0:T}, E_{1:T}),

where 0:T denotes all variables from t=0 to t=T. Using the chain rule of probability, we can decompose this joint probability as:

P(X_{0:T}, E_{1:T}) = P(X_0) \prod_{t=1}^T P(X_t | X_{t-1}) P(E_t | X_t)

    Here’s what each term represents:

1. Initial state distribution (also called the prior distribution) P(X_0): The probability of starting in a particular state.
2. State transition probabilities P(X_t | X_{t-1}): Capture how states evolve over time, adhering to the Markov assumption.
3. Observation probabilities P(E_t | X_t): Encode the likelihood of making a specific observation given the current state.
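As a sanity check on this factorization, the following sketch evaluates the joint probability of one particular state/observation sequence, reusing hypothetical two-state tables (all numbers are illustrative):

```python
import numpy as np

prior = np.array([0.5, 0.5])             # P(X_0), illustrative
T = np.array([[0.7, 0.3], [0.4, 0.6]])   # P(X_t | X_{t-1})
O = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(E_t | X_t)

def joint_probability(states, observations):
    """P(x_0, ..., x_T, e_1, ..., e_T)
       = P(x_0) * prod over t of P(x_t | x_{t-1}) * P(e_t | x_t).
    `states` has length T+1 (includes x_0); `observations` has length T."""
    p = prior[states[0]]
    for t in range(1, len(states)):
        p *= T[states[t - 1], states[t]] * O[states[t], observations[t - 1]]
    return p

# Probability of the path hallway -> office -> office with readings far, far
p = joint_probability([0, 1, 1], [1, 1])
print(p)
```

Each factor in the loop corresponds to one transition term and one observation term of the product formula above.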

    Inference in HMM: Filtering and Prediction

    The central goal of inference in a Hidden Markov Model is to estimate information about the hidden states based on observations. Two key types of inference are filtering and prediction:

Filtering: Filtering is the task of computing the posterior distribution of the current state X_t given all observations up to time t:

P(X_t | e_{1:t})

    This process helps us estimate the most likely state the system is in at the current time, given the evidence so far.

Prediction: Prediction focuses on estimating the future state X_{t+1} given all observations up to the current time t:

P(X_{t+1} | e_{1:t})

    Filtering

    Let’s derive the filtering equation by starting with:

P(X_t | e_{1:t}) = P(X_t | e_{1:t-1}, e_t)

where we’ve simply rewritten the expression by splitting the evidence sequence into the earlier sequence e_{1:t-1} and the most recent observation e_t.

    Using Bayes’ Rule, we rewrite this probability as:

P(X_t | e_{1:t}) = \alpha P(e_t | X_t, e_{1:t-1}) P(X_t | e_{1:t-1})

Here, \alpha is a normalization constant that ensures the probabilities sum to 1.

The sensor Markov assumption states that the observation e_t depends only on the current state X_t. This simplifies the equation to:

P(X_t | e_{1:t}) = \alpha P(e_t | X_t) P(X_t | e_{1:t-1})

The first term is the sensor model, which captures the likelihood of the current observation e_t given the state X_t. The second term is the prediction of X_t based on all past observations e_{1:t-1}. Thus, filtering at time t involves multiplying the sensor model by the prediction.

    Now, let’s derive the equation for the prediction step.

    Prediction

The goal of the prediction step is to compute P(X_t | e_{1:t-1}), the probability of the current state X_t based on all past observations e_{1:t-1}.

    We start with the definition:

P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t, X_{t-1} | e_{1:t-1})

This equation sums over all possible previous states X_{t-1}, capturing the influence of each past state on the current state.

Using the product rule, we can rewrite the joint probability P(X_t, X_{t-1} | e_{1:t-1}):

P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t | X_{t-1}, e_{1:t-1}) P(X_{t-1} | e_{1:t-1})

The Markov assumption simplifies P(X_t | X_{t-1}, e_{1:t-1}) by stating that X_t depends only on the immediately previous state X_{t-1}:

P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t | X_{t-1}) P(X_{t-1} | e_{1:t-1})

    The first term represents the transition model, describing the probability of transitioning from a previous state Xt1X_{t-1} to the current state XtX_t. The second term corresponds to the filtering estimate of Xt1X_{t-1}, which captures our belief about the previous state based on past observations.

    Combining the filtering and prediction steps gives us:

P(X_t | e_{1:t}) = \alpha P(e_t | X_t) \sum_{X_{t-1}} P(X_t | X_{t-1}) P(X_{t-1} | e_{1:t-1})

Here, the sensor model P(e_t | X_t) incorporates the latest evidence, while the prediction term propagates the previous belief forward in time. Together, they update the belief about the current state X_t.

The filtering problem P(X_t | e_{1:t}) can therefore be solved using the following recursive approach:

1. Prediction Step: Use the state transition model to predict the distribution of the current state:
P(X_t | e_{1:t-1}) = \sum_{X_{t-1}} P(X_t | X_{t-1}) P(X_{t-1} | e_{1:t-1})
2. Update Step: Incorporate the new evidence e_t using the observation model:
P(X_t | e_{1:t}) = \alpha P(e_t | X_t) P(X_t | e_{1:t-1})
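This predict-then-update recursion is exactly the forward algorithm. A minimal sketch with hypothetical two-state models (the prior, transition, and sensor numbers are all invented for illustration):

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.4, 0.6]])   # transition model P(X_t | X_{t-1})
O = np.array([[0.9, 0.1], [0.2, 0.8]])   # sensor model P(E_t | X_t)
prior = np.array([0.5, 0.5])             # initial belief P(X_0)

def filter_step(belief, evidence):
    """One predict-then-update cycle of the forward algorithm."""
    predicted = belief @ T                 # prediction: P(X_t | e_{1:t-1})
    updated = O[:, evidence] * predicted   # weight by sensor model P(e_t | X_t)
    return updated / updated.sum()         # normalize (the alpha constant)

belief = prior
for e in [0, 0, 1]:    # sensor readings: near, near, far
    belief = filter_step(belief, e)
print(belief)          # filtered belief P(X_t | e_{1:t})
```

Note that normalization by `updated.sum()` plays the role of \alpha: we never need its value in closed form, only that the resulting belief sums to 1.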

    FAQs

    1. What is a Hidden Markov Model (HMM)?
      An HMM is a probabilistic model used to describe systems where the true states are hidden but can be inferred through observable evidence. It consists of a state transition model and an observation model.

    2. What is the difference between a Markov Model and a Hidden Markov Model?
      A Markov Model focuses on directly observable states and their transitions, while an HMM deals with hidden states that are inferred using observed evidence.

    3. Why is the Markov Assumption important in HMMs?
      The Markov Assumption simplifies computations by assuming that the current state depends only on the previous state, reducing the complexity of the model.

    4. What are the key components of an HMM?

      • State Transition Model: Describes probabilities of moving from one state to another.
      • Observation Model: Defines the probability of observing evidence given the current state.
      • Initial State Distribution: The probability distribution of the starting state.
    5. What are some real-world applications of HMMs?
      HMMs are used in robotics (localization), speech recognition, bioinformatics (gene sequence analysis), and finance (market trend predictions).

    6. How do filtering and prediction work in HMMs?

      • Filtering: Estimates the current state based on all observed evidence up to the present.
      • Prediction: Estimates future states based on current evidence.
    7. What is the stationary assumption in HMMs?
      The stationary assumption states that the probabilities governing state transitions and observations remain constant over time.

    8. How does the observation model in an HMM work?
      The observation model calculates the probability of seeing a particular piece of evidence given the current state. This is crucial for inferring hidden states.

    9. What is the joint probability distribution in an HMM?
      It represents the combined probability of a sequence of hidden states and observations, which is used to model the overall system behavior.

    10. What makes HMMs robust in noisy environments?
      HMMs can infer the most probable sequence of hidden states even when observations are noisy or incomplete, thanks to their probabilistic framework combining state transitions and observation models.

