Understanding Naive Bayes: A Simple and Effective Classification Algorithm
By CamelEdge
Updated on Fri Jul 26 2024
Introduction
Naive Bayes is one of the simplest yet most effective classification algorithms in machine learning. It is particularly useful for tasks that involve many features, especially when the assumption of feature independence approximately holds. Despite its simplicity, Naive Bayes often performs surprisingly well in a wide range of real-world applications.
The Naive Bayes Model
At its core, Naive Bayes relies on Bayes' theorem, which relates the conditional and marginal probabilities of random events. A simple Naive Bayes network typically includes one hidden variable (the class label we want to predict) and multiple observed variables (the features). This can be applied to various classification problems, such as:
- Spam Classification: Given an email, predict whether it is spam or not.
- Medical Diagnosis: Given a list of symptoms, predict whether a patient has a specific disease.
- Weather Prediction: Based on temperature, humidity, etc., predict if it will rain tomorrow.
Naive Bayes Assumptions
Naive Bayes makes a strong independence assumption: the observed features are independent of each other given the class label. This simplifies the computation significantly and is mathematically expressed as:

$$P(Y \mid X_1, X_2, \dots, X_n) = \alpha \, P(Y) \prod_{i=1}^{n} P(X_i \mid Y)$$

where $\alpha$ is a normalization constant that makes the posterior probabilities sum to one.
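As a minimal sketch of how this formula is used in practice (plain Python; the function name, data layout, and numbers below are illustrative, not from the article):

```python
def naive_bayes_posterior(prior, likelihoods, observation):
    """P(Y | x_1..x_n) = alpha * P(Y) * product_i P(x_i | Y)."""
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for i, x_i in enumerate(observation):
            score *= likelihoods[i][(x_i, y)]   # P(X_i = x_i | Y = y)
        scores[y] = score
    alpha = 1.0 / sum(scores.values())          # normalization constant
    return {y: alpha * s for y, s in scores.items()}

# Hypothetical two-class, one-feature example, just to show the calling convention:
prior = {"spam": 0.3, "ham": 0.7}
likelihoods = [{("offer", "spam"): 0.8, ("offer", "ham"): 0.1,
                ("other", "spam"): 0.2, ("other", "ham"): 0.9}]
print(naive_bayes_posterior(prior, likelihoods, ["offer"]))  # spam becomes much more likely
```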
Learning and Inference
To perform classification using Naive Bayes, we need to:
- Learn the Prior and Likelihood: Using the training data, we estimate the prior probabilities $P(Y)$ and the likelihood probabilities $P(X_i \mid Y)$.
- Compute the Posterior: For a new observation $(x_1, \dots, x_n)$, we compute the posterior probability $P(Y \mid x_1, \dots, x_n)$ using the learned prior and likelihood.
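For a rough sense of what these two steps look like with an off-the-shelf library, here is a sketch using scikit-learn's `BernoulliNB` (assuming scikit-learn is available; it suits Boolean features, but it applies Laplace smoothing by default, so its estimates differ slightly from the raw counts used in the worked example below):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Boolean features encoded as 1 = T, 0 = F (same five rows as the example below).
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 0]])
y = np.array([1, 0, 0, 0, 1])

model = BernoulliNB()
model.fit(X, y)                     # step 1: learn the prior and likelihoods

x_new = np.array([[1, 0, 1]])       # a new (hypothetical) observation
print(model.predict_proba(x_new))   # step 2: posterior P(Y | x_new) for each class
print(model.predict(x_new))         # most probable class
```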
Example
Consider the following training dataset with three Boolean features $X_1$, $X_2$, $X_3$ and a Boolean classification variable $Y$. The objective is to compute the class of a new observation $(x_1, x_2, x_3)$, i.e., to find the posterior probability $P(Y \mid x_1, x_2, x_3)$.
| $X_1$ | $X_2$ | $X_3$ | $Y$ |
| --- | --- | --- | --- |
| T | T | T | T |
| T | F | T | F |
| T | F | F | F |
| F | T | T | F |
| F | F | F | T |
We will start by learning the Bayes net from the training data. The table above shows the 5 training examples.
- Learning the Bayesian Network
- Estimate Priors: $P(Y = T) = \frac{2}{5}$, since there are 2 examples out of 5 where $Y = T$; consequently $P(Y = F) = \frac{3}{5}$.
- Estimate Likelihoods: The likelihoods are calculated by counting the occurrence of each observation given the class $Y$. For example, to compute $P(X_1 = T \mid Y = T)$, we first count the number of training examples where $Y = T$. Then, among these examples, we count how many have $X_1 = T$. This gives us $P(X_1 = T \mid Y = T) = \frac{1}{2}$. The resulting conditional probability tables are:
| $Y$ | $X_1$ | $P(X_1 \mid Y)$ |
| --- | --- | --- |
| T | T | 1/2 |
| T | F | 1/2 |
| F | T | 2/3 |
| F | F | 1/3 |

| $Y$ | $X_2$ | $P(X_2 \mid Y)$ |
| --- | --- | --- |
| T | T | 1/2 |
| T | F | 1/2 |
| F | T | 1/3 |
| F | F | 2/3 |

| $Y$ | $X_3$ | $P(X_3 \mid Y)$ |
| --- | --- | --- |
| T | T | 1/2 |
| T | F | 1/2 |
| F | T | 2/3 |
| F | F | 1/3 |
- Inference: To compute the class of a new observation $(x_1, x_2, x_3)$, i.e., to find the posterior probability $P(Y \mid x_1, x_2, x_3)$, we use the Naive Bayes factorization:

$$P(Y \mid x_1, x_2, x_3) = \alpha \, P(Y) \, P(x_1 \mid Y) \, P(x_2 \mid Y) \, P(x_3 \mid Y)$$

This gives us an unnormalized score for each value of $Y$, computed from the prior and the likelihood tables above. After normalization (dividing each score by their sum, i.e., applying $\alpha$), we get the posterior distribution over $Y$ and predict the class with the higher probability.
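The full example can be reproduced in a few lines of plain Python. This is a sketch under one explicit assumption: the article does not state the new observation, so it is taken here to be $(X_1 = T, X_2 = F, X_3 = T)$; the counts are maximum-likelihood estimates with no smoothing, so they match the tables above.

```python
from collections import Counter
from itertools import product

# Training data from the example: (X1, X2, X3, Y), with True = T and False = F.
data = [
    (True,  True,  True,  True),
    (True,  False, True,  False),
    (True,  False, False, False),
    (False, True,  True,  False),
    (False, False, False, True),
]

# Priors: P(Y=T) = 2/5, P(Y=F) = 3/5.
y_counts = Counter(row[3] for row in data)
prior = {y: c / len(data) for y, c in y_counts.items()}

# Likelihoods P(X_i = x | Y = y), estimated by counting within each class.
likelihood = {}
for i, y, x in product(range(3), (True, False), (True, False)):
    in_class = [row for row in data if row[3] == y]
    likelihood[(i, x, y)] = sum(row[i] == x for row in in_class) / len(in_class)

# Inference for a *hypothetical* new observation (T, F, T).
obs = (True, False, True)
scores = {y: prior[y]
             * likelihood[(0, obs[0], y)]
             * likelihood[(1, obs[1], y)]
             * likelihood[(2, obs[2], y)]
          for y in (True, False)}
alpha = 1 / sum(scores.values())               # normalization constant
posterior = {y: alpha * s for y, s in scores.items()}
print(posterior)  # roughly {True: 0.22, False: 0.78} for this observation
```

For this particular observation, every feature likelihood and the prior are higher under $Y = F$, so the posterior favours $Y = F$.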