
The Naive Bayes algorithm in machine learning, also known as the Naive Bayes classifier, is one of the simplest and most powerful classification algorithms out there. While it is generally not the most accurate method, it is ideal for scenarios where you have very little data but still want to classify new data into several discrete categories. It even works with very small datasets of only 10 or 20 items!
The Naive Bayes algorithm is a probabilistic model that uses Bayes' Theorem to solve classification problems. In this tutorial, I will explain the Naive Bayes classifier from scratch with Python, starting with the mathematical intuition behind it.
Because it is a probabilistic classifier, it makes predictions based on the likelihood that an object belongs to each class. The algorithm is widely used for large or high-dimensional datasets, such as in text classification.
Applications of the Naive Bayes Model
This classification algorithm is mostly used for:
- Spam Filtering
- Recommendation System
- Sentiment Analysis
- Face Recognition
- News Classification
- Weather Prediction
- and many more
Conditional Probability
The Naive Bayes machine learning technique is based on Bayes' theorem, but to understand Bayes' theorem you should first know about conditional probability.
Consider a coin-tossing exercise with two coins. The following will be the sample space:
Outcome = {HH, HT, TH, TT}
If a person is asked to calculate the probability of getting at least one tail, the answer is 3/4 = 0.75.
Now assume another person does the same experiment, but this time we tell them that at least one of the coins landed heads, i.e. event A: "at least one coin shows heads" has occurred. The outcome {TT} is now impossible, so the sample space shrinks to {HH, HT, TH}, and the probability of getting heads on both coins becomes 1/3 ≈ 0.33 instead of the unconditional 1/4 = 0.25.
We can see from the above example that the probability of the outcome may change if some additional information is given to us. This is exactly what we need to do when developing any machine learning model: calculate the output given certain characteristics or features.
Mathematically, the above statement can be written as the definition of conditional probability:

P(A|B) = P(A ∩ B) / P(B)

For the coin example: P(both heads | at least one head) = (1/4) / (3/4) = 1/3.
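To make this concrete, here is a minimal sketch that verifies these numbers by enumerating the sample space in Python (the outcome strings are just labels for illustration):

# Enumerate the two-coin sample space and verify the numbers above
outcomes = ["HH", "HT", "TH", "TT"]

# P(at least one tail) = 3/4
p_tail = sum("T" in o for o in outcomes) / len(outcomes)

# P(both heads | at least one head): shrink the sample space to the event
given_head = [o for o in outcomes if "H" in o]
p_both_heads = given_head.count("HH") / len(given_head)

print(p_tail)        # 0.75
print(p_both_heads)  # 0.333...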
Bayes Rule
Now let's understand one of the most important rules built on conditional probability: Bayes' Theorem.
Bayes' theorem, named after the English mathematician Thomas Bayes and published posthumously in 1763, gives a method for determining the probability of an event given certain evidence.
Mathematically, Bayes' theorem can be stated as:

P(Y|X) = P(X|Y) * P(Y) / P(X)
The above equation gives the probability of event Y given that event X is true.
- P(Y) is the prior probability: the likelihood of the event before any evidence is seen.
- P(Y|X) is the posterior probability: the likelihood of the event after the evidence X is seen.
- P(X|Y) is the likelihood: the probability of observing the evidence X when Y is true.
- P(X) is the probability of the evidence itself.
Here
- X: the feature variables in your dataset
- Y: the class variable or output variable (for example MALE, FEMALE)
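As a quick sanity check, here is a tiny sketch of Bayes' rule in code; the probability values are made up purely for illustration:

# Bayes' rule: P(Y|X) = P(X|Y) * P(Y) / P(X)
def bayes_posterior(prior_y, likelihood_x_given_y, evidence_x):
    return likelihood_x_given_y * prior_y / evidence_x

# Illustrative numbers only: P(Y) = 0.5, P(X|Y) = 0.8, P(X) = 0.6
print(bayes_posterior(0.5, 0.8, 0.6))  # 0.666...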
What is Naive Bayes Classifier in Machine Learning
Bayes' rule gives us the formula for the probability of Y given a single feature X. In real-world data science problems, we hardly ever find a use case with only one feature variable.
To apply Bayes' Theorem to data with multiple features, the Naive Bayes algorithm was introduced.
Naive Bayes is an extension of Bayes' rule that assumes the features are independent: changing the value of one feature tells us nothing about the values of the others. Under this assumption, Bayes' rule can be expanded across all the features.
So if we have n feature variables (X1, X2, X3, …, Xn) with the class variable Y, we can write the Naive Bayes equation as:

P(Y|X1, X2, …, Xn) = [P(X1|Y) * P(X2|Y) * … * P(Xn|Y) * P(Y)] / [P(X1) * P(X2) * … * P(Xn)]
Now the denominator [P(X1) * P(X2) * … * P(Xn)] remains constant across all items in the dataset, so it does not affect which class scores highest. As a result, the denominator can be dropped and replaced with a proportionality.
So the final Naive Bayes equation looks like this:

P(Y|X1, X2, …, Xn) ∝ P(Y) * P(X1|Y) * P(X2|Y) * … * P(Xn|Y)
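In code, this proportional form is just the prior multiplied by the per-feature likelihoods. A minimal sketch, with placeholder probabilities standing in for values you would estimate from data:

import math

# Unnormalized Naive Bayes score: P(Y) * P(X1|Y) * ... * P(Xn|Y)
def naive_bayes_score(prior_y, feature_likelihoods):
    return prior_y * math.prod(feature_likelihoods)

# Placeholder likelihoods for two classes of a three-feature problem
score_yes = naive_bayes_score(0.6, [0.3, 0.5, 0.7])  # 0.063
score_no = naive_bayes_score(0.4, [0.4, 0.2, 0.1])   # 0.0032
print("Yes" if score_yes > score_no else "No")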
Why is it known as Naive Bayes?
- Naive: In English, the term "naive" refers to a person or behavior lacking experience, knowledge, or judgment. The algorithm is naive in the same sense: it assumes the features are independent, meaning that no variable carries information about changes in any other variable, which is rarely true in practice.
- Bayes: because the algorithm is based on Bayes' Theorem.
Types of Naive Bayes Algorithm
There are mainly two types of Naive Bayes algorithm in machine learning:
- Standard Naive Bayes: supports only categorical variables or features. This type is also called the Multinomial Naive Bayes classifier.
- Gaussian Naive Bayes: supports only continuous-valued features.
In this tutorial, I will explain Standard Naive Bayes.
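For reference, both variants are available in scikit-learn. A quick illustration with toy arrays (not the dataset used later in this tutorial):

import numpy as np
from sklearn.naive_bayes import MultinomialNB, GaussianNB

X_counts = np.array([[2, 0, 1], [0, 3, 0]])  # discrete count features
X_cont = np.array([[5.1, 3.5], [4.9, 3.0]])  # continuous features
y = np.array([0, 1])

MultinomialNB().fit(X_counts, y)  # suited to count/categorical-style data
GaussianNB().fit(X_cont, y)       # suited to continuous data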
Naive Bayes Algorithm Example from scratch
Let’s say we have a dataset to predict whether a person buys a computer or not.
In terms of our dataset, the algorithm’s assumptions can be summarised as follows:
- All features or variables are independent. That means if income is "high", that does not imply the credit rating will be "excellent".
- All predictors have an equal effect on the outcome. That means income being "high" should not carry more weight than the other features in predicting whether a person will buy a computer.
Before applying Naive Bayes formula on the above dataset, we need to do some pre-calculations for our dataset.
In our example dataset we have only three feature variables and one output variable:
- Feature Variable
- X1: Income
- X2: Student
- X3: Credit Rating
- Output Variable
- Y: Buys computer
Since our dataset has three feature variables, we can write the Naive Bayes equation as:

P(Y|X1, X2, X3) = [P(X1|Y) * P(X2|Y) * P(X3|Y) * P(Y)] / [P(X1) * P(X2) * P(X3)]
So the Naive Bayes equation for our dataset looks like this:

P(Buys Computer | Income, Student, Credit Rating) = [P(Income|Buys Computer) * P(Student|Buys Computer) * P(Credit Rating|Buys Computer) * P(Buys Computer)] / [P(Income) * P(Student) * P(Credit Rating)]
Now, the denominator remains constant across all items in the dataset, so it can be dropped and replaced with a proportionality:

P(Buys Computer | Income, Student, Credit Rating) ∝ P(Income|Buys Computer) * P(Student|Buys Computer) * P(Credit Rating|Buys Computer) * P(Buys Computer)
Now let's calculate those individual probabilities one by one using the frequency tables below:

We also need to calculate the prior probability P(Y) for the output variable "Buys Computer". This probability can be calculated as below:

P(Buys Computer = Yes) = 11/18 ≈ 0.61
P(Buys Computer = No) = 7/18 ≈ 0.39
In the above calculation, you can see that:
- The total number of observations or rows in our dataset is 18
- The output was "Yes" 11 times
- The output was "No" 7 times
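As a small sketch, the same priors can be computed from label counts in Python; the counts below come straight from the figures above:

from collections import Counter

# 18 observations: 11 "Yes" and 7 "No", as counted above
labels = ["Yes"] * 11 + ["No"] * 7
counts = Counter(labels)
total = len(labels)

p_yes = counts["Yes"] / total  # 11/18 ≈ 0.61
p_no = counts["No"] / total    # 7/18 ≈ 0.39
print(p_yes, p_no)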
Prediction in Naive Bayes algorithm
Now that we have built a custom Naive Bayes equation for our dataset, let's say we receive test data and want to predict whether that person will buy a computer. For example, the test data could look like below:

We can predict this by plugging those values into our custom-made Naive Bayes equation, like below:
Probability of Buying a Computer

Probability of Not Buying a Computer

We already know that P(Yes|Test) + P(No|Test) = 1.
So we can normalize the two scores into percentages that sum to 100, like below:

We can observe that P(Yes|Test) > P(No|Test), so the prediction for our test data is "YES": the person will buy a computer.
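In code, this normalization is a one-liner. A minimal sketch with placeholder scores (the real values would come from the frequency tables above):

# Hypothetical unnormalized scores from the Naive Bayes equation
p_yes, p_no = 0.028, 0.007  # placeholder values for illustration

total = p_yes + p_no
print(f"P(Yes|Test) = {p_yes / total:.0%}")  # 80%
print(f"P(No|Test)  = {p_no / total:.0%}")   # 20%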
Python implementation of Naive Bayes Classifier from scratch
In this section, I will show how you can use the Naive Bayes machine learning model to classify an email as Spam or Non-Spam in Python. For this example, I am going to use an e-mail classification NLP dataset.
You can download the dataset from here. Once you have downloaded it, load and check the train and test data.
Load Dataset
import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score
from tqdm.notebook import tqdm
import re
import string
import nltk
# Stop words for text cleaning (may require: nltk.download('stopwords'))
stop_words = set(nltk.corpus.stopwords.words('english'))
# For stemming in NLP with NLTK
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
# For lemmatization in NLP with NLTK (may require: nltk.download('wordnet'))
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
# Load the SMS spam classification dataset
train_data = pd.read_csv('data/SMS_train.csv', encoding= 'unicode_escape')
test_data = pd.read_csv('data/SMS_test.csv', encoding= 'unicode_escape')
# Join the train and test dataframes into a single dataset
df_full = pd.concat([train_data, test_data], ignore_index=True)
# Visualize some data
df_full.head()

Now let's check the shapes of these datasets.
print ('Shape of train Data: ',train_data.shape)
print ('Shape of test Data: ',test_data.shape)
print ('Shape of Full Data: ',df_full.shape)
Shape of train Data: (957, 3)
Shape of test Data: (125, 3)
Shape of Full Data: (1082, 3)
In our data, the label "Spam" means a spam email and "Non-Spam" means an authentic email. Now let's see what spam and non-spam emails look like.
print ('Taking a look at Non Spam Emails')
print(df_full.loc[df_full['Label'] == 'Non-Spam'].sample(5)['Message_body'])
print('---------------')
print ('Taking a look at Spam Emails')
print(df_full.loc[df_full['Label'] == 'Spam'].sample(5)['Message_body'])
Taking a look at Non Spam Emails
789 Sorry completely forgot * will pop em round th...
1106 Hi elaine, is today's meeting confirmed?
1551 I'm fine. Hope you are good. Do take care.
783 Especially since i talk about boston all up in...
1209 K, fyi I'm back in my parents' place in south ...
Name: Message_body, dtype: object
---------------
Taking a look at Spam Emails
1679 YOU HAVE WON! As a valued Vodafone customer ou...
1571 Welcome to Select, an O2 service with added be...
1003 You are awarded a SiPix Digital Camera! call 0...
1465 Free video camera phones with Half Price line ...
1895 Todays Voda numbers ending with 7634 are selec...
Name: Message_body, dtype: object
Text Pre-Processing for the Naive Bayes Algorithm
Step 1: Text Cleaning
The first step of any text analysis is to clean the data. Common text-cleaning steps include removing numbers, removing punctuation, removing stop words, stemming, and lemmatization. The code below applies these steps to clean the text of the entire dataset.
# Remove numbers and punctuation, drop stop words, and lemmatize each message
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
clean_text = []
for row_num in tqdm(range(len(df_full))):
    sentence = df_full.iloc[row_num]['Message_body']
    # Convert to lower case
    sentence = sentence.lower()
    # Remove numbers from text using regex
    sentence = re.sub(r'\d+', '', sentence)
    # Remove punctuation from text
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    # Tokenize the text into a list of words
    word_list = w_tokenizer.tokenize(sentence)
    # Remove stop words (keeping the original word order)
    words_in_sentence = [word for word in word_list if word not in stop_words]
    # Lemmatize each word
    words_in_sentence = [lemmatizer.lemmatize(word) for word in words_in_sentence]
    # Convert the token list back to a sentence
    sen = ' '.join(words_in_sentence)
    clean_text.append(sen)
# Create a separate column for the cleaned text
df_full['Cleaned_Message'] = clean_text
# Show some data
df_full.tail()

Step 2: Vectorization of Text
Before we can proceed with model building, we need to transform our text into numeric form, because machine learning models cannot understand raw text. We'll use Count Vectorization, a text preprocessing technique, via the CountVectorizer module in scikit-learn.
Note: You can also use word2vec word embeddings to convert text into numeric form.
CountVectorizer is frequently used in Natural Language Processing (NLP), both for model building and for text analysis via n-grams (also called Q-grams or shingles). It takes our input text and counts how many times each word appears across the entire corpus (here, the full SMS dataset).
To use CountVectorizer, we first instantiate it, then pass our input text column to its fit_transform() method to get a Bag of Words model. We'll then convert that Bag of Words to a dense NumPy array that the model can use.
Note: There are several approaches for preprocessing text for machine learning models, and this is only one of them.
count_vector = CountVectorizer()
bow_out = count_vector.fit_transform(df_full['Cleaned_Message'])
bow_array = bow_out.toarray()
bow_array
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64)
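If you want to see which word each column corresponds to, CountVectorizer exposes its learned vocabulary (the output depends on your data; in scikit-learn versions before 1.0 the method is called get_feature_names()):

# Inspect the vocabulary learned by CountVectorizer
vocab = count_vector.get_feature_names_out()
print(len(vocab))  # number of unique tokens = number of columns in bow_array
print(vocab[:10])  # first few tokens, in alphabetical order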
Construct a Naive Bayes Classifier from scratch in Python
Now that the data is processed and cleaned, we can implement the Naive Bayes algorithm in Python by following the steps below:
Prepare the training and testing data
Now we'll use our Bag of Words data as the feature variables (X) for our model and assign the "Label" column as the target variable (y) that we want to predict.
The train_test_split() function will then be used to generate our train and test data. We will use 30% of the data for testing and apply stratification to guarantee that the class proportions are preserved in both splits.
X = bow_array
y = df_full['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3,
stratify=y)
Build a Multinomial Naive Bayes classification model
Now that everything is in place, we'll fit a Multinomial Naive Bayes classification model using the MultinomialNB module from the scikit-learn library. We'll feed our X_train and y_train data to the fit() method to train the model, which learns to predict the email type from the vectors of the "Cleaned_Message" text.
The Multinomial Naive Bayes model is the standard type of Naive Bayes model for data with discrete characteristics or categorical variables, such as our email classification problem.
model = MultinomialNB().fit(X_train, y_train)
Note: This is just a simple model; for a real-world project you should also try Naive Bayes hyperparameter tuning, as sketched below.
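As a starting point, here is a minimal sketch of tuning the alpha smoothing parameter of MultinomialNB with GridSearchCV; the grid values are just illustrative:

from sklearn.model_selection import GridSearchCV

# Search over the Laplace/Lidstone smoothing parameter alpha
param_grid = {'alpha': [0.01, 0.1, 0.5, 1.0, 2.0]}
search = GridSearchCV(MultinomialNB(), param_grid, cv=5, scoring='f1_macro')
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)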
Analyze model performance
Now that our model is trained and ready, let's predict on the test data and evaluate the model's accuracy.
# Prediction for trained Naive Bayes model
y_pred = model.predict(X_test)
# Print accuracy for Naive Bayes model
print('Accuracy:', accuracy_score(y_test, y_pred))
print('F1 score:', f1_score(y_test, y_pred, average="macro"))
Accuracy: 0.9686956521739131
F1 score: 0.9346590909090908
To check other important evaluation metrics like precision, recall, and F1 score, you can use the classification_report() function.
print(classification_report(y_test, y_pred))

Conclusion
The Naive Bayes algorithm in machine learning is often used in sentiment analysis, spam filtering, recommendation systems, text classification, and other applications. It is quick and simple to deploy. In this post, I explained the Naive Bayes algorithm with a simple Python implementation.
