Emotion Recognition from Facial Expressions in Python

Do you know that friend of yours who can tell when you’re upset before you even realize it? Or that colleague who seems always to remain calm and in control, no matter what life throws at them? These people have mastered the art of reading and detecting facial emotions — which, luckily, has been made easier with the help of deep learning. But if you’re not so good at this yet, don’t worry! This guide will help you learn facial emotion recognition using deep learning techniques.

Application of Facial expression detection

Facial emotion identification is an interesting task in Computer Vision. It can be used in a variety of fields such as:

  • Healthcare: Cameras with emotion detection can provide useful feedback for hospitals, such as monitoring the facial reactions of a patient to medical equipment or treatments. Such a system can help monitor many patients at the same time, assess patient conditions, and improve health care.
  • Improve Video marketing: To make their videos as effective as possible, businesses can use facial emotion detection technology to track what viewers are watching and how they are responding to it.
  • Automotive: Car manufacturers all over the world are focusing on creating vehicles that are safe to drive and provide users with a personalized experience. Furthermore, the use of AI, ML, and deep learning techniques in facial detection systems allows them to understand human emotions. It protects drivers by analyzing their facial expressions for fatigue or drowsiness and sending alerts. If the system detects an anomaly, it alerts the driver to take a break, get some coffee, listen to music, or adjust the temperature.
  • etc.

Technique for Emotion Recognition

In general, facial expression recognition can be done using two different techniques:

  • Camera-Based: In this tutorial, we will be concentrating on this technique, where we will train a deep learning model to predict someone’s emotion based on camera footage or visuals.
  • Biosignals: A good example of this type of system is a lie detector. If a person looks neutral but is actually angry, biosignals can reveal that. This kind of machine uses different types of sensors that need to be connected to different parts of the human body.

Okay now, let’s get started with the vision-based or camera-based approach.

Implementation Steps

To build this project, follow the steps below in order:

  • First, detect the face in an image or video (real-time)
  • Next, train a deep learning model (from the CNN family)
  • Finally, use the trained model to predict whether a person is sad, surprised, happy, or showing any other emotion, based on their facial expression

To implement this project, I will use Python as a programming language, OpenCV for overlaying text data or drawing rectangles, and for implementing deep learning architecture we will be using the TensorFlow library.


In this tutorial, I am going to train a deep-learning model for facial emotion recognition. If you are a beginner and hesitate to train a deep learning architecture from scratch, you can try DeepFace. You do not need to train anything if you use this library. Many deep learning architectures for face emotion recognition are already available in DeepFace.

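DeepFace is a popular open-source face recognition and facial attribute analysis library for Python (attributes such as age, gender, and emotion). It shares its name with the DeepFace face recognition model published by a research group at Facebook in 2014, which reported about 97% accuracy on a face verification benchmark. That figure refers to face verification, not to emotion recognition. Facebook has reportedly used its DeepFace model for face identification on its platform.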
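For readers who want to try the no-training route first, a minimal sketch might look like the following. It assumes the deepface package is installed (pip install deepface) and uses its DeepFace.analyze function with actions=["emotion"]. The helper names dominant_emotions and analyze_image, and the file name in the comment, are illustrative and not part of this tutorial. Depending on the installed version, analyze returns either a single dict or a list of dicts (one per detected face), so the helper accepts both.

```python
def dominant_emotions(analysis):
    """Return the dominant emotion for every face in a DeepFace.analyze result.

    Older deepface releases return one dict, newer ones a list of dicts
    (one per detected face), so both shapes are accepted here.
    """
    results = analysis if isinstance(analysis, list) else [analysis]
    return [face["dominant_emotion"] for face in results]


def analyze_image(img_path):
    """Run DeepFace emotion analysis on an image file (requires `pip install deepface`)."""
    # Imported lazily so that dominant_emotions() can be used without the package
    from deepface import DeepFace

    analysis = DeepFace.analyze(img_path=img_path, actions=["emotion"])
    return dominant_emotions(analysis)


# Example call (hypothetical file name, needs an image containing a face):
# print(analyze_image("happy_face.jpg"))
```

The rest of this tutorial does not depend on DeepFace; it trains its own model instead.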


In this tutorial, I am going to use TensorFlow to train a deep learning model for classification. To do that you need to install some libraries.

  • OpenCV
  • Matplotlib
  • TensorFlow

To install those libraries you can run the below commands in your terminal.

pip install opencv-python
pip install matplotlib
pip install tensorflow

Note that this will install TensorFlow for CPU. You can train a deep learning model on a CPU, but training will take much longer.

If you want to install TensorFlow for GPU, follow this tutorial. Otherwise, you can use Google Colab to train this model; the code and configuration will be the same.
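Before training, it can help to confirm whether TensorFlow actually sees a GPU. The sketch below uses tf.config.list_physical_devices("GPU"); the helper name visible_gpus is illustrative, and it returns None when TensorFlow is not installed so the snippet can run anywhere.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed")
elif gpus:
    print("Training can use:", gpus)
else:
    print("No GPU visible; training will run on the CPU")
```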

Download Training Image for Facial Emotion Recognition

Now let’s download the training images to train a model for real-time facial expression recognition in Python. In this tutorial, I am going to use the FER-2013 image dataset. You can click here to download the FER-2013 dataset for facial expression detection.

Let’s see some statistics about this dataset.

  • Size: 63 MB
  • Number of classes: 7 (angry, disgust, fear, happy, neutral, sad, surprise)
  • For each class, there is a separate folder containing the images for that specific emotion

Data pre-processing

Let me break the entire data preparation into some steps:

Import Required Packages

Let’s first import all the libraries needed for this project.

import tensorflow as tf
import cv2

import os
import matplotlib.pyplot as plt
import numpy as np

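# load_img and img_to_array ship with TensorFlow, so no separate keras_preprocessing install is needed
from tensorflow.keras.utils import load_img, img_to_array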

Image count for each class

Now let’s see how many images we have in total in our training data set and number of images for each class (training data variation).

# Define the name of each class (all image folder names)
classes = ['angry', 'disgust', 'fear', 'happy', 'neutral','sad', 'surprise']

# Print number of images for each class
folder_path = "data/"
for cls in classes:
    path = os.path.join(folder_path, 'train', cls)
    lst = os.listdir(path)
    number_files = len(lst)
    print(cls, ': ', number_files)


angry :  3995
disgust :  436
fear :  4097
happy :  7215
neutral :  4965
sad :  4830
surprise :  3171

Here you can see that the total number of images in our training dataset is 28,709 (the sum of all classes).
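The counts above are quite uneven, which becomes relevant when a subset is chosen later. As a small illustration, the following sketch recomputes the total and each class's share from the printed numbers; the variable names are only for this example.

```python
# Image counts per class, copied from the output above
counts = {
    "angry": 3995, "disgust": 436, "fear": 4097, "happy": 7215,
    "neutral": 4965, "sad": 4830, "surprise": 3171,
}

total = sum(counts.values())
smallest = min(counts, key=counts.get)
largest = max(counts, key=counts.get)

print("total images:", total)
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    # Share of the whole training set taken up by this class
    print(f"{name:<9} {n:>5}  {100 * n / total:5.1f}%")

# How many times more images the largest class has than the smallest
imbalance_ratio = counts[largest] / counts[smallest]
print(f"{largest} has {imbalance_ratio:.1f}x as many images as {smallest}")
```

Because "disgust" has by far the fewest images, taking the same number of images from every class caps the subset at 436 per class.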


Now if you want to train a model like MobileNet, which has lots of weights and parameters, you need to have a huge amount of RAM in your machine. I tried with my local system (which has 22 GB of RAM) as well as in the Google Colab basic version. I got the below error for both of the systems.

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
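A rough back-of-the-envelope estimate suggests why memory runs out. The sketch below assumes the whole dataset is held in a single NumPy array of 224×224×3 images, which is how this tutorial prepares its data later on. It also assumes 1 byte per value for raw 8-bit pixels and 8 bytes per value once the array is divided by 255.0 (NumPy then produces float64). The figure of 3,052 images corresponds to a balanced subset of 436 images for each of the 7 classes.

```python
def array_bytes(num_images, size=224, channels=3, bytes_per_value=1):
    """Bytes needed to hold num_images images of size x size x channels in one array."""
    return num_images * size * size * channels * bytes_per_value


GIB = 1024 ** 3

full_uint8 = array_bytes(28709)                        # whole training set, 8-bit pixels
full_float64 = array_bytes(28709, bytes_per_value=8)   # same data as float64
subset_uint8 = array_bytes(3052)                       # 436 images x 7 classes, 8-bit pixels
subset_float32 = array_bytes(3052, bytes_per_value=4)  # subset stored as float32

print(f"full set, uint8:    {full_uint8 / GIB:.1f} GiB")
print(f"full set, float64:  {full_float64 / GIB:.1f} GiB")
print(f"subset, uint8:      {subset_uint8 / GIB:.1f} GiB")
print(f"subset, float32:    {subset_float32 / GIB:.1f} GiB")
```

These numbers cover only the input array, not the model weights or the intermediate tensors created during training.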

One simple way to work around this memory issue is to make a small subset of the entire training dataset and use that subset to train the model. Let’s do that.

Create a folder structure using Python

Let’s first create a folder (subset folder) where we will copy some parts of the training images from our original training image folder.

Note: If you want to create folder manually, you can skip this step.

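# Create an empty folder
os.makedirs('new_data', exist_ok=True)
# Create "train" folder inside new_data folder
os.makedirs(os.path.join('new_data', 'train'), exist_ok=True)
# Create sub-folders, one per class
for cls in classes:
    os.makedirs(os.path.join('new_data', 'train', cls), exist_ok=True)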

Here we are:

  • Creating a subset folder named new_data
  • Inside new_data creating another folder named train (to match the folder structure of the main training data folder)
  • Last line: Inside the train folder create 7 different folders with names of 7 different classes which are:
    1. angry
    2. disgust
    3. fear
    4. happy
    5. neutral
    6. sad
    7. surprise

So now our subset folder structure looks like the below:

new_data/
    train/
        angry/
        disgust/
        fear/
        happy/
        neutral/
        sad/
        surprise/

Note: This is the same folder structure as per the original training dataset folder (downloaded folder).

Copy images to Subfolder

Now you need to copy some images from the downloaded training image folder to the subset folder (new_data).

To do that, I will copy 436 random images of each emotion to the subset folder (436 is the number of disgust images, the smallest class). The below code does that.

# Copy 436 files to new folder
import shutil
import random

num_files = 436

for cls in classes:
    # Downloaded original training image folder path for face emotion recognition
    src_path = os.path.join('D:/Face Emotion/v2/data', 'train', cls)
    # Sub folder path
    dst_path = os.path.join('D:/Face Emotion/v2/new_data', 'train', cls)
    src_files = os.listdir(src_path)
    # Select random 436 images from source directory
    src_select_files = random.sample(src_files, num_files)

    # Copy selected images to destination folder
    for file_name in src_select_files:
        full_file_name = os.path.join(src_path, file_name)
        if os.path.isfile(full_file_name):
            shutil.copy(full_file_name, dst_path)

This code copies 436 random images per class from the original folder to the subset folder.
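Because random.sample picks different files on every run, the subset changes each time the script is executed. If a repeatable subset is preferred, one option is to seed the random generator. The sketch below is an assumed variant, not the code used in this post; the function name copy_random_subset and the seed value are illustrative. It also copes with folders that hold fewer files than requested, and it demonstrates itself on a temporary folder with empty placeholder files.

```python
import os
import random
import shutil
import tempfile


def copy_random_subset(src_dir, dst_dir, num_files, seed=42):
    """Copy up to num_files randomly chosen files from src_dir into dst_dir.

    A fixed seed makes the selection repeatable; sorting first removes any
    dependence on the order in which the OS lists the files.
    """
    os.makedirs(dst_dir, exist_ok=True)
    files = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    rng = random.Random(seed)
    chosen = rng.sample(files, min(num_files, len(files)))
    for name in chosen:
        shutil.copy(os.path.join(src_dir, name), dst_dir)
    return chosen


# Small demonstration on a temporary folder holding 10 empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    for i in range(10):
        open(os.path.join(src, f"img_{i}.jpg"), "w").close()
    chosen = copy_random_subset(src, dst, num_files=4)
    copied_count = len(os.listdir(dst))
    print(copied_count, "files copied")
```

With the folders from this tutorial, the function could be called once per class, for example with the source path data/train/angry, the destination path new_data/train/angry, and num_files=436.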

So now let’s see the count of images in our newly created folder (subfolder).

# Print number of images for each class
folder_path = "new_data/"
for cls in classes:
    path = os.path.join(folder_path, 'train', cls)
    lst = os.listdir(path)
    number_files = len(lst)
    print(cls, ': ', number_files)


angry :  436
disgust :  436
fear :  436
happy :  436
neutral :  436
sad :  436
surprise :  436

So the total count of images in our subset folder is 3,052 (436 × 7).

These are all the images we are going to use for training our custom deep learning model to identify emotions of the human face.

Show images using Matplotlib

Now let’s display images for any class to see what our training data looks like:

picture_size = 48
folder_path = "new_data/"

expression = 'disgust'

plt.figure(figsize=(12, 12))
for i in range(1, 10, 1):
    # Place each image in a 3x3 grid
    plt.subplot(3, 3, i)
    img = load_img(folder_path + "train/" + expression + "/" +
                   os.listdir(folder_path + "train/" + expression)[i], target_size=(picture_size, picture_size))
    plt.imshow(img)
plt.show()

Here in this code:

  • Converting the image to 48×48 pixels before plotting
  • line 4: Change it to any class name to see sample images for that class



Change Image size

As mentioned earlier, this post uses the pre-trained MobileNet deep learning architecture.

MobileNet was pre-trained on the ImageNet dataset using 224×224 input images.

So to match the input format of MobileNet, we need to resize our input training images. Let’s do that.

# Function to read all the images, resize them, and convert them to arrays using OpenCV

img_size = 224 ## ImageNet => 224x224
training_data = []

def create_training_data():
    for category in classes:
        path = os.path.join(folder_path, 'train', category)
        # Integer label: position of the class name in the classes list
        class_num = classes.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (img_size, img_size))
                training_data.append([new_array, class_num])
            except Exception as e:
                # Skip files that cannot be read or resized
                print('Skipping', img, ':', e)

create_training_data()

Now let’s check the training data shape

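# Each entry is [image_array, class_index], so store the pairs as Python objects
temp_array = np.array(training_data, dtype=object)
print(temp_array.shape)

(3052, 2)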

Now let’s see the shape of one converted image.

# Reading one image from angry folder
img_array = cv2.imread('new_data/train/angry/Training_178362.jpg')
print('Input image shape: ', img_array.shape)

# Convert image to 224x224
img_size = 224 ## ImageNet => 224x224
new_array = cv2.resize(img_array, (img_size, img_size))
print('Converted image shape: ', new_array.shape)


Input image shape:  (48, 48, 3)
Converted image shape:  (224, 224, 3)

Add image dimensions

You can see that the shape of a converted image is (224, 224, 3), which is 3-dimensional. But the model expects a 4-dimensional input of shape (batch, height, width, channels), where the first dimension indexes the images in a batch. So we need to stack our images into a single 4-dimensional array. The below code does that.

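X = []
y = []

for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, img_size, img_size, 3) # converting it to 4 dimensions

# Assumed step: scale pixel values to the 0-1 range, matching the /255.0 used at prediction time
X = X.astype('float32') / 255.0

# Convert to array
Y = np.array(y)

print(X.shape)

(3052, 224, 224, 3)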

Here in this code:

  • Storing features in the X variable
  • Storing class labels in the y variable
  • Adding one extra dimension at line 8
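To make the extra dimension concrete, here is a small NumPy-only sketch. It uses blank stand-in images rather than the real training data, and shows two common ways of getting the (batch, height, width, channels) layout: stacking several images with reshape, and wrapping one image with np.expand_dims.

```python
import numpy as np

img_size = 224

# Stand-in for one resized colour image: (height, width, channels)
single_image = np.zeros((img_size, img_size, 3), dtype=np.uint8)

# A stack of images gains a leading "batch" axis: (batch, height, width, channels)
images = [single_image, single_image, single_image]
batch = np.array(images).reshape(-1, img_size, img_size, 3)

# A single image can be turned into a batch of one with expand_dims
batch_of_one = np.expand_dims(single_image, axis=0)

print(single_image.shape)
print(batch.shape)
print(batch_of_one.shape)
```

The -1 in reshape lets NumPy work out the batch size from the number of images supplied.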

Setup MobileNet model

tensorflow.keras.applications contains a large collection of pre-trained models, and any of them could be used as a starting point for image classification. In this tutorial, I am going to use the MobileNetV2 model, a lightweight general-purpose image classification network that works well as a base for transfer learning.


MobileNetV2 is the second version of the MobileNet series. It is a CNN (convolutional neural network) that predicts image classes.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Download Pre-trained MobileNet Model
model = tf.keras.applications.MobileNetV2() ## Pre-trained Model

# Print MobileNet architecture
model.summary()

In this code, we are downloading pre-trained weights for MobileNet model and printing model architecture.

In the model architecture, you can see that the last layer (the fully connected Dense layer for predictions) has 1000 outputs, one per ImageNet class. But in our case, we have only 7 classes (angry, disgust, fear, happy, neutral, sad, and surprise). So we need to change the last layer of the downloaded pre-trained MobileNet model. Reusing a pre-trained network and replacing its final layers in this way is called transfer learning.

Note: We will not change the input layer, as we already converted our image data to match the ImageNet input size (224×224).

# Defining first layer as input layer of Mobilenet
base_input = model.layers[0].input

# Removing last layer of MobileNet model
base_output = model.layers[-2].output

# Adding some extra layers
final_output = layers.Dense(128)(base_output) ## adding new layer, after the output of global pooling layer
final_output = layers.Activation('relu')(final_output) ## activation function
final_output = layers.Dense(64)(final_output)
final_output = layers.Activation('relu')(final_output)
# Defining final layer with 7 classes
final_output = layers.Dense(7, activation = 'softmax')(final_output) ## 7 because my classes are 7

Here in this code we are removing the last layer from the downloaded model and adding our customized output layer with some additional layers.

Now let’s create our custom MobileNet model to train facial emotion recognition.

custom_model = keras.Model(inputs = base_input, outputs = final_output) ## Final model architecture
# Print our custom model summary
custom_model.summary()

# Compiling the model to train
custom_model.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

For this example, I am using:

  • Sparse categorical cross entropy as loss function
  • Adam as the optimizer. Many other optimizers are available for deep learning, such as RMSProp, Mini-Batch Gradient Descent, Stochastic Gradient Descent, and AdaDelta; you can try any of those.
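To illustrate what the loss measures, here is a NumPy-only sketch of softmax followed by sparse categorical cross-entropy for a single sample. "Sparse" means the label is an integer class index (0 to 6 here) rather than a one-hot vector, which matches the integer labels produced during data preparation. The function names are illustrative and this is not how Keras implements the loss internally.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one sample: negative log of the probability given to the true class."""
    return float(-np.log(probabilities[label]))


num_classes = 7

# A model that treats all 7 emotions as equally likely
uniform = softmax(np.zeros(num_classes))
baseline_loss = sparse_categorical_crossentropy(uniform, label=3)
print(f"loss when guessing uniformly: {baseline_loss:.4f}")  # ln(7), about 1.9459

# A confident and correct prediction for class 3 ("happy") gives a much lower loss
confident = softmax(np.array([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]))
confident_loss = sparse_categorical_crossentropy(confident, label=3)
print(f"loss when confident and right: {confident_loss:.4f}")
```

A loss close to ln(7) therefore means the model is doing little better than guessing among the 7 classes, which is a useful reference point when reading a training log.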

Train model

So finally we are done with our data preparation and model configuration. Now let’s train our custom model to detect facial expressions. To do that just run the below code.

I am going to use a batch size of 8 and 25 epochs; the small batch size helps avoid memory errors while training on a local Windows system.

custom_model.fit(X, Y, epochs = 25, batch_size = 8)
Epoch 1/25
675/675 [==============================] - 76s 96ms/step - loss: 1.8034 - accuracy: 0.2761
Epoch 2/25
675/675 [==============================] - 65s 96ms/step - loss: 1.6636 - accuracy: 0.3439
Epoch 3/25
675/675 [==============================] - 65s 96ms/step - loss: 1.5420 - accuracy: 0.4009
Epoch 4/25
675/675 [==============================] - 65s 96ms/step - loss: 1.4656 - accuracy: 0.4339
Epoch 5/25
675/675 [==============================] - 65s 96ms/step - loss: 1.4055 - accuracy: 0.4546
Epoch 6/25
675/675 [==============================] - 65s 96ms/step - loss: 1.3397 - accuracy: 0.4770
Epoch 7/25
675/675 [==============================] - 65s 96ms/step - loss: 1.2819 - accuracy: 0.4963
Epoch 8/25
675/675 [==============================] - 65s 96ms/step - loss: 1.2336 - accuracy: 0.5193
Epoch 9/25
675/675 [==============================] - 65s 96ms/step - loss: 1.1890 - accuracy: 0.5406
Epoch 10/25
675/675 [==============================] - 65s 96ms/step - loss: 1.1445 - accuracy: 0.5531
Epoch 11/25
675/675 [==============================] - 65s 96ms/step - loss: 1.0783 - accuracy: 0.5846
Epoch 12/25
675/675 [==============================] - 65s 96ms/step - loss: 1.0357 - accuracy: 0.6009
Epoch 13/25
675/675 [==============================] - 65s 96ms/step - loss: 0.9942 - accuracy: 0.6181
Epoch 14/25
675/675 [==============================] - 65s 96ms/step - loss: 0.9499 - accuracy: 0.6402
Epoch 15/25
675/675 [==============================] - 65s 96ms/step - loss: 0.8700 - accuracy: 0.6680
Epoch 16/25
675/675 [==============================] - 65s 96ms/step - loss: 0.8253 - accuracy: 0.6883
Epoch 17/25
675/675 [==============================] - 65s 96ms/step - loss: 0.7788 - accuracy: 0.7056
Epoch 18/25
675/675 [==============================] - 65s 96ms/step - loss: 0.7365 - accuracy: 0.7202
Epoch 19/25
675/675 [==============================] - 65s 96ms/step - loss: 0.6909 - accuracy: 0.7374
Epoch 20/25
675/675 [==============================] - 65s 96ms/step - loss: 0.6241 - accuracy: 0.7631
Epoch 21/25
675/675 [==============================] - 65s 96ms/step - loss: 0.5778 - accuracy: 0.7883
Epoch 22/25
675/675 [==============================] - 65s 96ms/step - loss: 0.5336 - accuracy: 0.7978
Epoch 23/25
675/675 [==============================] - 65s 96ms/step - loss: 0.4939 - accuracy: 0.8202
Epoch 24/25
675/675 [==============================] - 65s 96ms/step - loss: 0.4510 - accuracy: 0.8300
Epoch 25/25
675/675 [==============================] - 65s 96ms/step - loss: 0.4434 - accuracy: 0.8359
<keras.callbacks.History at 0x1a4008bfb20>

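As you can see, the model started with 27% accuracy in the first epoch, already above the roughly 14% expected from random guessing among 7 classes. This is because we are using a pre-trained model, which already knows basic features of images. This is the beauty of transfer learning. The final training accuracy of our custom emotion detection model is about 84% (0.8359). Note that this is measured on the training images themselves, since no validation set was used.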
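Accuracy on the training images says little about how the model behaves on unseen faces. One way to check that is to hold out part of the data for validation. The sketch below is an assumed addition, not part of the original training run: it builds a stratified split with NumPy so that every class keeps the same proportion in both parts. The function name stratified_split and the stand-in labels are illustrative.

```python
import numpy as np


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with the same class proportions in both parts."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)


# Stand-in labels shaped like the balanced subset: 436 images for each of 7 classes
Y_demo = np.repeat(np.arange(7), 436)
train_idx, val_idx = stratified_split(Y_demo, val_fraction=0.2)
print(len(train_idx), "training samples,", len(val_idx), "validation samples")

# With the real arrays this could be used as (assumed usage):
# custom_model.fit(X[train_idx], Y[train_idx],
#                  validation_data=(X[val_idx], Y[val_idx]),
#                  epochs=25, batch_size=8)
```

Keras then reports val_loss and val_accuracy after every epoch, which makes overfitting visible when training accuracy keeps rising while validation accuracy stalls.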

Save trained model

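Now let’s save our trained model (to use it later at any point in time) using the below code.

# Save the trained model; the file name matches the one loaded in the prediction step
custom_model.save('facial_expression_model.h5')
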
Facial Emotion Recognition for Images

Okay, so we are done with our custom model training for emotion detection. Now let’s test how our model is performing for any image.


Read image

Let’s first read the image using OpenCV. This sample image I have downloaded from the internet to test the model.

# Read downloaded test image in Opencv
test_img = cv2.imread('happy_face.jpg')
# Take a backup of input image before face detection
img_bcp = test_img.copy()

# Show image using Matplotlib (converted from OpenCV's BGR order to RGB)
plt.imshow(cv2.cvtColor(test_img, cv2.COLOR_BGR2RGB))

Figure: Input image for face emotion detection

Face detection and cropping

To detect facial expressions, we first need to detect the face. Then we pass the detected (cropped) face to our custom model for prediction. In this tutorial, I am going to use a Haar cascade classifier to detect the face. It is easy to use and lightweight.

# Define haar cascade classifier for face detection
face_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# Convert image to gray scale OpenCV
gray_img = cv2.cvtColor(test_img, cv2.COLOR_BGR2GRAY)
# Detect face using haar cascade classifier
faces_coordinates = face_classifier.detectMultiScale(gray_img)

# Draw a rectangle around the faces
for (x, y, w, h) in faces_coordinates:
    # Draw rectangle around face
    cv2.rectangle(test_img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Crop face from image
    cropped_face = img_bcp[y:y+h, x:x+w]
# Plot original image
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(test_img, cv2.COLOR_BGR2RGB))

# Plot cropped image after performing face detection
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(cropped_face, cv2.COLOR_BGR2RGB))

Figure: Face detection output for the facial expression identification project
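The loop above keeps whichever face was detected last. When an image contains several detections, or none at all, it can be safer to choose explicitly. The following sketch is an assumed refinement: it picks the largest bounding box and crops it. The helper names are illustrative, and the image and boxes are stand-ins for the output of detectMultiScale.

```python
import numpy as np


def largest_face(faces):
    """Return the (x, y, w, h) box with the biggest area, or None if there are no boxes."""
    if len(faces) == 0:
        return None
    return max(faces, key=lambda box: box[2] * box[3])


def crop_face(image, box):
    """Crop a box from an image array indexed as image[row, column]."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]


# Stand-in for an image and for the boxes returned by detectMultiScale
image = np.zeros((300, 400, 3), dtype=np.uint8)
faces = np.array([[10, 20, 50, 50], [100, 80, 120, 120]])

box = largest_face(faces)
if box is not None:
    face = crop_face(image, box)
    print("cropped face shape:", face.shape)
else:
    print("no face detected")
```

Choosing the largest box is only one possible rule; predicting an emotion for every detected face is another reasonable option.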

Model Prediction

In this step, we will pass this cropped image to the model to predict the emotion of the face. But before passing this cropped image we need to:

  • Change image shape: our model only accepts the ImageNet input size, so we need to resize the image to 224×224 pixels
  • Convert it to a 4-dimensional array: the cropped image is a normal 3-dimensional image, so we need to add a batch dimension (the model was trained on 4-dimensional batches of images).

#  Creating class dictionary
class_dictionary = {0: 'angry', 1: 'disgust', 2: 'fear', 3: 'happy', 4: 'neutral', 5: 'sad', 6: 'surprise'}

final_image = cv2.resize(cropped_face, (224,224))
final_image = np.expand_dims(final_image, axis=0) ## Need 4th dimension
final_image = final_image/255.0 ## Normalizing

# Load model
new_model = tf.keras.models.load_model('facial_expression_model.h5')
prediction = new_model.predict(final_image)
# Print the predicted emotion label
print(class_dictionary[np.argmax(prediction)])


1/1 [==============================] - 1s 672ms/step

Now, this is the final output. As you can see, our model correctly predicts the emotion of the girl, which is happiness. Now let’s overlay this prediction on top of the input image.

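# img_bcp is already a 3-d image, so squeeze leaves its shape unchanged
img_3d = np.squeeze(img_bcp)

# Define opencv font style (FONT_HERSHEY_SIMPLEX is an assumed choice; any cv2 font constant works)
font = cv2.FONT_HERSHEY_SIMPLEX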

# Draw a rectangle around the faces
for (x, y, w, h) in faces_coordinates:
    # Draw rectangle around face
    cv2.rectangle(img_3d, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Write face emotion class text on image
cv2.putText(img_3d, class_dictionary[np.argmax(prediction)], (250,75),font, 3, (0,0,255), 2, cv2.LINE_4)

# Show output image
plt.imshow(cv2.cvtColor(img_3d, cv2.COLOR_BGR2RGB))

Figure: Final output for the image emotion detection
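The model returns one probability per class, and np.argmax keeps only the most likely one. To see how confident the model is, the probabilities can be decoded into labels. The sketch below assumes the same class order as class_dictionary; the helper name decode_prediction is illustrative, and the probabilities are made up for the example rather than taken from the trained model.

```python
import numpy as np

class_dictionary = {0: 'angry', 1: 'disgust', 2: 'fear', 3: 'happy',
                    4: 'neutral', 5: 'sad', 6: 'surprise'}


def decode_prediction(prediction, top_k=3):
    """Return the top_k (label, probability) pairs from a (1, 7) softmax output."""
    probabilities = np.asarray(prediction).reshape(-1)
    order = np.argsort(probabilities)[::-1][:top_k]  # indices, most likely first
    return [(class_dictionary[int(i)], float(probabilities[i])) for i in order]


# Made-up probabilities for illustration only; not output from the trained model
example = np.array([[0.02, 0.01, 0.05, 0.80, 0.07, 0.03, 0.02]])
decoded = decode_prediction(example)
for label, prob in decoded:
    print(f"{label}: {prob:.2f}")
```

A low top probability, or two classes with similar values, is a hint that the prediction should not be trusted too much.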

Real-time Facial Emotion Recognition for Video

Now let’s run this code on a video to see how our model performs at recognizing facial emotion in real time.

import tensorflow as tf
import cv2

import os
import matplotlib.pyplot as plt
import numpy as np

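# load_img and img_to_array ship with TensorFlow, so no separate keras_preprocessing install is needed
from tensorflow.keras.utils import load_img, img_to_array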

# Load trained model
new_model = tf.keras.models.load_model('facial_expression_model.h5')

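# OpenCV font for the overlay text (FONT_HERSHEY_SIMPLEX is an assumed choice; any cv2 font constant works)
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale = 1.5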

#  Creating class dictionary
class_dictionary = {0: 'angry', 1: 'disgust', 2: 'fear', 3: 'happy', 4: 'neutral', 5: 'sad', 6: 'surprise'}

# Below function will read video imgs
cap = cv2.VideoCapture(
    'Free people expression footage  mad shocked surprised wow amazed face   _ NO COPYRIGHT VIDEOS.mp4')

while True:
    read_ok, img = cap.read()
    # Stop when the video ends or a frame cannot be read
    if not read_ok:
        break
    img_bcp = img.copy()
    cv2.imshow("Play video in python", img)

    # Define haar cascade classifier for face detection
    face_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    # Convert image to gray scale OpenCV
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detect face using haar cascade classifier
    faces_coordinates = face_classifier.detectMultiScale(gray_img)

    # Draw a rectangle around the faces
    for (x, y, w, h) in faces_coordinates:
        # Draw rectangle around face
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

        # Crop face from image
        cropped_face = img_bcp[y:y + h, x:x + w]

    # Predict only when a face was found in this frame; otherwise cropped_face is undefined or stale
    if len(faces_coordinates) > 0:
        final_image = cv2.resize(cropped_face, (224, 224))
        final_image = np.expand_dims(final_image, axis=0)
        final_image = final_image / 255.0

        predictions = new_model.predict(final_image)

        cv2.putText(img, class_dictionary[np.argmax(predictions)], (100, 150), font, 3, (0, 0, 255), 2, cv2.LINE_4)

    cv2.imshow('Face Emotion Recognition', img)

    # Close video window by pressing 'x'
    if cv2.waitKey(1) & 0xFF == ord('x'):
        break

# Release the video and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()

End Note

In this post, I tried to show a practical approach to implementing real-time emotion recognition from facial expressions using a CNN architecture. One clear improvement is to increase the amount of training data if you have a large amount of RAM (for example, on Amazon EC2 instances).

That’s all for this article. If you have any questions or suggestions regarding this tutorial don’t hesitate to shoot those in the comment section below.
