
The Naive Bayes algorithm in machine learning, also known as the Naive Bayes classifier, is one of the simplest and most powerful classification algorithms out there. While it is generally not the most accurate method, it is ideal for scenarios where you have very little data but still want to classify new data into several discrete categories. It even works with very small datasets of only 10 or 20 items!
The Naive Bayes algorithm is a probabilistic model that uses Bayes' Theorem to solve classification problems. In this tutorial, I will explain the Naive Bayes classifier from scratch with Python, starting with the mathematical intuition behind it.
Because it is a probabilistic classifier, it makes predictions based on the likelihood that an object belongs to each class. The algorithm is widely used for large or high-dimensional datasets, such as in text classification.
Applications of the Naive Bayes Model
This classification algorithm is mostly used for:
- Spam Filtering
- Recommendation System
- Sentiment Analysis
- Face Recognition
- News Classification
- Weather Prediction
- and many more
Conditional Probability
The Naive Bayes machine learning technique is based on Bayes' theorem, but to understand Bayes' theorem you should first know about conditional probability.
Consider a coin-tossing exercise with two coins. The following will be the sample space:
Outcome = {HH, HT, TH, TT}
If a person is asked to calculate the probability of getting at least one tail, the answer is 3/4 = 0.75.
Now assume another person does the same experiment, but this time we tell them that at least one of the coins landed heads, i.e. event A: "at least one coin shows heads" has occurred. The outcome {TT} is now impossible, so the sample space shrinks to {HH, HT, TH}, and the probability of getting heads on both coins becomes 1/3 ≈ 0.33 instead of the unconditional 1/4 = 0.25.
We can see from the above example that the probability of the outcome may change if some additional information is given to us. This is exactly what we need to do when developing any machine learning model: calculate the output given certain characteristics or features.
Mathematically, the above statement can be written as the definition of conditional probability:

P(A|B) = P(A ∩ B) / P(B)

For the coin example: P(both heads | at least one head) = (1/4) / (3/4) = 1/3.
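To make this concrete, here is a minimal sketch that verifies these numbers by enumerating the sample space in Python (the outcome strings are just labels for illustration):

# Enumerate the two-coin sample space and verify the numbers above
outcomes = ["HH", "HT", "TH", "TT"]

# P(at least one tail) = 3/4
p_tail = sum("T" in o for o in outcomes) / len(outcomes)

# P(both heads | at least one head): shrink the sample space to the event
given_head = [o for o in outcomes if "H" in o]
p_both_heads = given_head.count("HH") / len(given_head)

print(p_tail)        # 0.75
print(p_both_heads)  # 0.333...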
Bayes Rule
Now let's understand one of the most important rules built on conditional probability: Bayes' Theorem.
Bayes' theorem, named after the English mathematician Thomas Bayes and published posthumously in 1763, gives a method for determining the probability of an event given certain evidence.
Mathematically, Bayes' theorem can be stated as:

P(Y|X) = P(X|Y) * P(Y) / P(X)
The above equation gives the probability of event Y given that event X is true.
- P(Y) is the prior probability: the likelihood of the event before any evidence is seen.
- P(Y|X) is the posterior probability: the likelihood of the event after the evidence X is seen.
- P(X|Y) is the likelihood: the probability of observing the evidence X when Y is true.
- P(X) is the probability of the evidence itself.
Here
- X: the feature variables in your dataset
- Y: the class variable or output variable (for example MALE, FEMALE)
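As a quick sanity check, here is a tiny sketch of Bayes' rule in code; the probability values are made up purely for illustration:

# Bayes' rule: P(Y|X) = P(X|Y) * P(Y) / P(X)
def bayes_posterior(prior_y, likelihood_x_given_y, evidence_x):
    return likelihood_x_given_y * prior_y / evidence_x

# Illustrative numbers only: P(Y) = 0.5, P(X|Y) = 0.8, P(X) = 0.6
print(bayes_posterior(0.5, 0.8, 0.6))  # 0.666...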
What is Naive Bayes Classifier in Machine Learning
Bayes' rule gives us the formula for the probability of Y given a single feature X. In real-world data science problems, we hardly ever find a use case with only one feature variable.
To apply Bayes' Theorem to data with multiple features, the Naive Bayes algorithm was introduced.
Naive Bayes is an extension of Bayes' rule that assumes the features are independent: changing the value of one feature tells us nothing about the values of the others. Under this assumption, Bayes' rule can be expanded across all the features.
So if we have n feature variables (X1, X2, X3, …, Xn) with the class variable Y, we can write the Naive Bayes equation as:

P(Y|X1, X2, …, Xn) = [P(X1|Y) * P(X2|Y) * … * P(Xn|Y) * P(Y)] / [P(X1) * P(X2) * … * P(Xn)]
Now the denominator [P(X1) * P(X2) * … * P(Xn)] remains constant across all items in the dataset, so it does not affect which class scores highest. As a result, the denominator can be dropped and replaced with a proportionality.
So the final Naive Bayes equation looks like this:

P(Y|X1, X2, …, Xn) ∝ P(Y) * P(X1|Y) * P(X2|Y) * … * P(Xn|Y)
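In code, this proportional form is just the prior multiplied by the per-feature likelihoods. A minimal sketch, with placeholder probabilities standing in for values you would estimate from data:

import math

# Unnormalized Naive Bayes score: P(Y) * P(X1|Y) * ... * P(Xn|Y)
def naive_bayes_score(prior_y, feature_likelihoods):
    return prior_y * math.prod(feature_likelihoods)

# Placeholder likelihoods for two classes of a three-feature problem
score_yes = naive_bayes_score(0.6, [0.3, 0.5, 0.7])  # 0.063
score_no = naive_bayes_score(0.4, [0.4, 0.2, 0.1])   # 0.0032
print("Yes" if score_yes > score_no else "No")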
Why is it known as Naive Bayes?
- Naive: In English, the term "naive" refers to a person or behavior lacking experience, knowledge, or judgment. The algorithm is naive in the same sense: it assumes the features are independent, meaning that no variable carries information about changes in any other variable, which is rarely true in practice.
- Bayes: because the algorithm is based on Bayes' Theorem.
Types of Naive Bayes Algorithm
There are mainly two types of Naive Bayes algorithm in machine learning:
- Standard Naive Bayes: supports only categorical variables or features. This type is also called the Multinomial Naive Bayes classifier.
- Gaussian Naive Bayes: supports only continuous-valued features.
In this tutorial, I will explain Standard Naive Bayes.
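For reference, both variants are available in scikit-learn. A quick illustration with toy arrays (not the dataset used later in this tutorial):

import numpy as np
from sklearn.naive_bayes import MultinomialNB, GaussianNB

X_counts = np.array([[2, 0, 1], [0, 3, 0]])  # discrete count features
X_cont = np.array([[5.1, 3.5], [4.9, 3.0]])  # continuous features
y = np.array([0, 1])

MultinomialNB().fit(X_counts, y)  # suited to count/categorical-style data
GaussianNB().fit(X_cont, y)       # suited to continuous data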
Naive Bayes Algorithm Example from scratch
Let’s say we have a dataset to predict whether a person buys a computer or not.
In terms of our dataset, the algorithm’s assumptions can be summarised as follows:
- All features or variables are independent. That means if income is "high", that does not imply the credit rating will be "excellent".
- All predictors have an equal effect on the outcome. That means income being "high" should not carry more weight than the other features in predicting whether a person will buy a computer.
Before applying Naive Bayes formula on the above dataset, we need to do some pre-calculations for our dataset.
In our example dataset we have only three feature variables and one output variable:
- Feature Variable
- X1: Income
- X2: Student
- X3: Credit Rating
- Output Variable
- Y: Buys computer
Since our dataset has three feature variables, we can write the Naive Bayes equation as:

P(Y|X1, X2, X3) = [P(X1|Y) * P(X2|Y) * P(X3|Y) * P(Y)] / [P(X1) * P(X2) * P(X3)]
So the Naive Bayes equation for our dataset looks like this:

P(Buys Computer | Income, Student, Credit Rating) = [P(Income|Buys Computer) * P(Student|Buys Computer) * P(Credit Rating|Buys Computer) * P(Buys Computer)] / [P(Income) * P(Student) * P(Credit Rating)]
Now, the denominator remains constant across all items in the dataset, so it can be dropped and replaced with a proportionality:

P(Buys Computer | Income, Student, Credit Rating) ∝ P(Income|Buys Computer) * P(Student|Buys Computer) * P(Credit Rating|Buys Computer) * P(Buys Computer)
Now let's calculate those individual probabilities one by one using the frequency tables below:

We also need to calculate the prior probability P(Y) for the output variable "Buys Computer". This probability can be calculated as below:

P(Buys Computer = Yes) = 11/18 ≈ 0.61
P(Buys Computer = No) = 7/18 ≈ 0.39
In the above calculation, you can see that:
- The total number of observations or rows in our dataset is 18
- The output was "Yes" 11 times
- The output was "No" 7 times
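As a small sketch, the same priors can be computed from label counts in Python; the counts below come straight from the figures above:

from collections import Counter

# 18 observations: 11 "Yes" and 7 "No", as counted above
labels = ["Yes"] * 11 + ["No"] * 7
counts = Counter(labels)
total = len(labels)

p_yes = counts["Yes"] / total  # 11/18 ≈ 0.61
p_no = counts["No"] / total    # 7/18 ≈ 0.39
print(p_yes, p_no)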
Prediction in Naive Bayes algorithm
Now that we have built a custom Naive Bayes equation for our dataset, let's say we receive test data and want to predict whether that person will buy a computer. For example, the test data could look like below:

We can predict this by plugging those values into our custom-made Naive Bayes equation, like below:
Probability of Buying a Computer

Probability of Not Buying a Computer

We already know that P(Yes|Test) + P(No|Test) = 1.
So we can normalize the two scores into percentages that sum to 100, like below:

We can observe that P(Yes|Test) > P(No|Test), so the prediction for our test data is "YES": the person will buy a computer.
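In code, this normalization is a one-liner. A minimal sketch with placeholder scores (the real values would come from the frequency tables above):

# Hypothetical unnormalized scores from the Naive Bayes equation
p_yes, p_no = 0.028, 0.007  # placeholder values for illustration

total = p_yes + p_no
print(f"P(Yes|Test) = {p_yes / total:.0%}")  # 80%
print(f"P(No|Test)  = {p_no / total:.0%}")   # 20%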
Python implementation of Naive Bayes Classifier from scratch
In this section, I will show how you can use the Naive Bayes machine learning model to classify an email as Spam or Non-Spam in Python. For this example, I am going to use an e-mail classification NLP dataset.
You can download the dataset from here. Once you have downloaded it, load and check the train and test data.
Load Dataset
import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score
from tqdm.notebook import tqdm
import re
import string
import nltk
# Stop words for text cleaning (may require: nltk.download('stopwords'))
stop_words = set(nltk.corpus.stopwords.words('english'))
# For stemming in NLP with NLTK
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
# For lemmatization in NLP with NLTK (may require: nltk.download('wordnet'))
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
# Load the SMS spam classification dataset
train_data = pd.read_csv('data/SMS_train.csv', encoding= 'unicode_escape')
test_data = pd.read_csv('data/SMS_test.csv', encoding= 'unicode_escape')
# Join the train and test dataframes into a single dataset
df_full = pd.concat([train_data, test_data], ignore_index=True)
# Visualize some data
df_full.head()

Now let's check the shapes of these datasets.
print ('Shape of train Data: ',train_data.shape)
print ('Shape of test Data: ',test_data.shape)
print ('Shape of Full Data: ',df_full.shape)
Shape of train Data: (957, 3)
Shape of test Data: (125, 3)
Shape of Full Data: (1082, 3)
In our data, the label "Spam" means a spam email and "Non-Spam" means an authentic email. Now let's see what spam and non-spam emails look like.
print ('Taking a look at Non Spam Emails')
print(df_full.loc[df_full['Label'] == 'Non-Spam'].sample(5)['Message_body'])
print('---------------')
print ('Taking a look at Spam Emails')
print(df_full.loc[df_full['Label'] == 'Spam'].sample(5)['Message_body'])
Taking a look at Non Spam Emails
789 Sorry completely forgot * will pop em round th...
1106 Hi elaine, is today's meeting confirmed?
1551 I'm fine. Hope you are good. Do take care.
783 Especially since i talk about boston all up in...
1209 K, fyi I'm back in my parents' place in south ...
Name: Message_body, dtype: object
---------------
Taking a look at Spam Emails
1679 YOU HAVE WON! As a valued Vodafone customer ou...
1571 Welcome to Select, an O2 service with added be...
1003 You are awarded a SiPix Digital Camera! call 0...
1465 Free video camera phones with Half Price line ...
1895 Todays Voda numbers ending with 7634 are selec...
Name: Message_body, dtype: object
Text Pre-Processing for the Naive Bayes Algorithm
Step 1: Text Cleaning
The first step of any text analysis is to clean the data. Common text-cleaning steps include removing numbers, removing punctuation, removing stop words, stemming, and lemmatization. The code below applies these steps to clean the text of the entire dataset.
# Remove numbers and punctuation, drop stop words, and lemmatize each message
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
clean_text = []
for row_num in tqdm(range(len(df_full))):
    sentence = df_full.iloc[row_num]['Message_body']
    # Convert to lower case
    sentence = sentence.lower()
    # Remove numbers from text using regex
    sentence = re.sub(r'\d+', '', sentence)
    # Remove punctuation from text
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    # Tokenize the text into a list of words
    word_list = w_tokenizer.tokenize(sentence)
    # Remove stop words (keeping the original word order)
    words_in_sentence = [word for word in word_list if word not in stop_words]
    # Lemmatize each word
    words_in_sentence = [lemmatizer.lemmatize(word) for word in words_in_sentence]
    # Convert the token list back to a sentence
    sen = ' '.join(words_in_sentence)
    clean_text.append(sen)
# Create a separate column for the cleaned text
df_full['Cleaned_Message'] = clean_text
# Show some data
df_full.tail()

Step 2: Vectorization of Text
Before we can proceed with model building, we need to transform our text into numeric form, because machine learning models cannot understand raw text. We'll use Count Vectorization, a text preprocessing technique, via the CountVectorizer module in scikit-learn.
Note: You can also use word2vec word embeddings to convert text into numeric form.
CountVectorizer is frequently used in Natural Language Processing (NLP), both for model building and for text analysis via n-grams (also called Q-grams or shingles). It takes our input text and counts how many times each word appears across the entire corpus (here, the full SMS dataset).
To use CountVectorizer, we first instantiate it, then pass our input text column to its fit_transform() method to get a Bag of Words model. We'll then convert that Bag of Words to a dense NumPy array that the model can use.
Note: There are several approaches for preprocessing text for machine learning models, and this is only one of them.
count_vector = CountVectorizer()
bow_out = count_vector.fit_transform(df_full['Cleaned_Message'])
bow_array = bow_out.toarray()
bow_array
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64)
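If you want to see which word each column corresponds to, CountVectorizer exposes its learned vocabulary (the output depends on your data; in scikit-learn versions before 1.0 the method is called get_feature_names()):

# Inspect the vocabulary learned by CountVectorizer
vocab = count_vector.get_feature_names_out()
print(len(vocab))  # number of unique tokens = number of columns in bow_array
print(vocab[:10])  # first few tokens, in alphabetical order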
Construct a Naive Bayes Classifier from scratch in Python
Now that the data is processed and cleaned, we can implement the Naive Bayes algorithm in Python by following the steps below:
Prepare the training and testing data
Now we'll use our Bag of Words data as the feature variables (X) for our model and assign the "Label" column as the target variable (y) that we want to predict.
The train_test_split() function will then be used to generate our train and test data. We will use 30% of the data for testing and apply stratification to guarantee that the class proportions are preserved in both splits.
X = bow_array
y = df_full['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3,
stratify=y)
Build a Multinomial Naive Bayes classification model
Now that everything is in place, we'll fit a Multinomial Naive Bayes classification model using the MultinomialNB module from the scikit-learn library. We'll feed our X_train and y_train data to the fit() method to train the model, which learns to predict the email type from the vectors of the "Cleaned_Message" text.
The Multinomial Naive Bayes model is the standard type of Naive Bayes model for data with discrete characteristics or categorical variables, such as our email classification problem.
model = MultinomialNB().fit(X_train, y_train)
Note: This is just a simple model; for a real-world project you should also try Naive Bayes hyperparameter tuning, as sketched below.
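As a starting point, here is a minimal sketch of tuning the alpha smoothing parameter of MultinomialNB with GridSearchCV; the grid values are just illustrative:

from sklearn.model_selection import GridSearchCV

# Search over the Laplace/Lidstone smoothing parameter alpha
param_grid = {'alpha': [0.01, 0.1, 0.5, 1.0, 2.0]}
search = GridSearchCV(MultinomialNB(), param_grid, cv=5, scoring='f1_macro')
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)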
Analyze model performance
Now that our model is trained and ready, let's predict on the test data and evaluate the model's accuracy.
# Prediction for trained Naive Bayes model
y_pred = model.predict(X_test)
# Print accuracy for Naive Bayes model
print('Accuracy:', accuracy_score(y_test, y_pred))
print('F1 score:', f1_score(y_test, y_pred, average="macro"))
Accuracy: 0.9686956521739131
F1 score: 0.9346590909090908
To check other important evaluation metrics like precision, recall, and F1 score, you can use the classification_report() function.
print(classification_report(y_test, y_pred))

Conclusion
The Naive Bayes algorithm in machine learning is often used in sentiment analysis, spam filtering, recommendation systems, text classification, and other applications. It is quick and simple to deploy. In this post, I explained the Naive Bayes algorithm with a simple Python implementation.
