YOLO object detection using deep learning OpenCV | Real-time

In this OpenCV object detection tutorial, we will learn YOLO object detection using OpenCV. YOLO is one of the best-known deep learning based object detectors. It was released in May 2016 and became popular because it is faster than many other deep learning detectors.

YOLO object detection using OpenCV

In this article we will see how to recognize common everyday objects, such as people, dogs, and cars, using a pre-trained deep learning based object detector. As you can see, the input image contains one dog and one lady holding a ball. Using the object detector, we can detect the presence of those objects and label them in the output image.

You are going to learn in this article:

  • Object detection using deep learning (YOLO model)
  • Object detection and tracking OpenCV
  • How to draw bounding box for each detected object

Yolo object detection with python

There are three primary algorithms for deep learning based object detection:

  1. You Only Look Once (YOLO)
  2. Faster R-CNNs
  3. Single Shot Detectors (SSDs)

In this object detection tutorial we will use a pre-trained YOLO model. There are various frameworks for working with pre-trained YOLO models in Python, such as:

  • Darkflow: A framework for using YOLO inside TensorFlow (a deep learning framework). It is hard to install on Windows.
  • Darknet: Built by the YOLO developer. It is mainly supported on Linux.
  • OpenCV: Easy to install and use on Windows.

In this yolo object detection tutorial we will do object detection using OpenCV.

Download files for YOLO object detection python code

Before we start writing the YOLO object detection Python code, we need to download the pre-trained YOLO model. You need to download two files:

  • Weight file: the trained model, i.e. the learned parameters of the network.
  • Cfg file: the configuration file, which holds the architecture and all the settings of the YOLO network.
Download pretrained yolo model

If you visit the official website, you can see various versions of the pre-trained YOLO model available for download. In this tutorial, I am going to use YOLOv3-320 for object detection in Python. Here 320 refers to the input image size (320 x 320 pixels) that the model works with.

As you can see, the lower the resolution, the higher the frames per second (FPS). For example:

Image size (Resolution)   Model Name    Frames Per Second (FPS)
(320, 320)                YOLOv3-320    45
(416, 416)                YOLOv3-416    35
(608, 608)                YOLOv3-608    20
NA                        YOLOv3-tiny   220

As you can see, the frame rate of the YOLOv3-tiny model is very high (220 FPS), but you have to compromise on accuracy: it will detect far fewer objects, although it runs much faster.
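The FPS numbers in the table depend on the hardware, so it can be useful to measure the frame rate on your own machine. Below is a minimal sketch of one way to do that. The names measure_fps and fake_detector are hypothetical helpers, not part of OpenCV, and fake_detector only stands in for the real detection step.

```python
import time


def measure_fps(process_frame, frames):
    """Run process_frame on every frame and return frames per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_detector(frame):
    # Hypothetical stand-in for the detection step: sleeps about 1 ms
    time.sleep(0.001)


fps = measure_fps(fake_detector, range(50))
print(f"{fps:.1f} FPS")
```

In a real test you would pass a function that runs the network on one frame, and a list of frames read from a video.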

Since the YOLO object detection model is trained on the COCO dataset (you can see this in the image), we also need the names of the objects, i.e. the labels (for example: car, person, etc.), that the COCO dataset uses. So you need to download the coco.names file.

Note: There are 80 object names in total in the COCO dataset.

To wind up this section, you need to download three files in total for the YOLO object detection Python code:

  1. yolov3.weights: Pre-trained YOLO model
  2. yolov3.cfg: Configuration/ setting file of that model
  3. coco.names: Object label names
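Before loading the model, it can help to confirm that all three files are where the code expects them. The sketch below assumes the files are stored under data/model/, the folder used in the code later in this article. find_missing_files is a hypothetical helper name.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the paths that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


# Paths assumed from the code used later in this tutorial
required = [
    "data/model/yolov3.weights",
    "data/model/yolov3.cfg",
    "data/model/coco.names",
]

missing = find_missing_files(required)
if missing:
    print("Missing files:", missing)
else:
    print("All model files found")
```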

On image: YOLO object detection python code

We can divide the OpenCV object detection Python code into the steps below:

Step1: Load image

import cv2
import numpy as np

# Loading image
img = cv2.imread("data/car.jpg")

Now this image can be of any size.

Step2: Load YOLO model files

# Load Yolo model files
yolo_weight = "data/model/yolov3.weights"
yolo_config = "data/model/yolov3.cfg"
coco_labels = "data/model/coco.names"
net = cv2.dnn.readNet(yolo_weight, yolo_config)

# Load coco object names file
classes = []
with open(coco_labels, "r") as f:
    classes = [line.strip() for line in f.readlines()]

The cv2.dnn.readNet call loads the YOLO object detection model from the weight and configuration files.

Step3: Resize image

As we are working with the YOLOv3-320 model, we resize the image to (320, 320) to get better results. After resizing, we store the image height, width, and number of channels (3 for a color image) in separate variables.

Note: The model can produce output for an image of any shape, but resizing usually gives better results.

# Defining desired shape
fWidth = 320
fHeight = 320

# Resize image in opencv
img = cv2.resize(img, (fWidth, fHeight))

height, width, channels = img.shape

Step4: Convert image to Blob

In this step we give the input image to the YOLO object detection network. We cannot pass a plain image to the network, because it only accepts input in a particular format called a blob. So we first convert the input image to a blob with the cv2.dnn.blobFromImage function, and then pass the blob to the network with net.setInput.

# Convert image to Blob
blob = cv2.dnn.blobFromImage(img, 1/255, (fWidth, fHeight), (0, 0, 0), True, crop=False)
# Set input for YOLO object detection
net.setInput(blob)
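To make the blob format less abstract, here is a rough NumPy sketch of what the conversion does for an image that is already resized: swap BGR to RGB, scale the pixel values to the range 0 to 1, and reorder the array to batch x channels x height x width. This is only an illustration, not OpenCV's implementation, and image_to_blob is a hypothetical name.

```python
import numpy as np


def image_to_blob(img_bgr, scale=1 / 255):
    """Approximate the blob conversion for an already resized BGR image."""
    # Swap BGR -> RGB (the effect of swapRB=True)
    rgb = img_bgr[:, :, ::-1]
    # Scale pixel values from [0, 255] to [0, 1]
    scaled = rgb.astype(np.float32) * scale
    # Reorder height x width x channels -> channels x height x width
    chw = np.transpose(scaled, (2, 0, 1))
    # Add the batch dimension: 1 x channels x height x width
    return chw[np.newaxis, ...]


# Hypothetical stand-in for a resized 320 x 320 BGR image
fake_img = np.full((320, 320, 3), 255, dtype=np.uint8)
blob = image_to_blob(fake_img)
print(blob.shape)  # (1, 3, 320, 320)
```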

Step5: Define output Layer

To understand this, let's look at the YOLO model architecture.

YOLO model architecture

Here you can see there are three different output layers (Predict one, Predict two, Predict three). That means we have three different values coming out from YOLO network. Now in order to get output from these output layers, we need to know the names of these output layers.

# Find names of all layers
layer_names = net.getLayerNames()
print(layer_names)
# Find names of three output layers (flatten() handles both old and new OpenCV return formats)
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
print(output_layers)

The first print shows the names of all layers, and the second print shows the three output layers (['yolo_82', 'yolo_94', 'yolo_106']).

Step6: Send image to forward pass

Now that we know the output layers, we run a forward pass: the blob travels through the whole network, and net.forward returns the results of those three output layers.

Note: The input still passes through every layer of the network. Passing the output layer names to net.forward only selects which results are returned. No training is involved when you use a pre-trained model this way.

# Send blob data to forward pass
outs = net.forward(output_layers)
print(outs[0].shape)
print(outs[1].shape)
print(outs[2].shape)

If you print the shape of each output (index 0 is the first output layer), you will see:

Output layer          Layer name   Code                   Shape
First output layer    'yolo_82'    print(outs[0].shape)   (300, 85)
Second output layer   'yolo_94'    print(outs[1].shape)   (1200, 85)
Third output layer    'yolo_106'   print(outs[2].shape)   (4800, 85)
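The row counts in this table follow from the input size. YOLOv3 predicts on three grids whose cells are 32, 16, and 8 pixels wide, and each grid cell predicts 3 boxes. The short sketch below reproduces the numbers under that assumption. yolov3_box_counts is a hypothetical helper name.

```python
def yolov3_box_counts(input_size, strides=(32, 16, 8), boxes_per_cell=3):
    """Return the number of predicted boxes for each of the three output layers."""
    counts = []
    for stride in strides:
        cells = input_size // stride  # grid cells along one side
        counts.append(cells * cells * boxes_per_cell)
    return counts


print(yolov3_box_counts(320))  # [300, 1200, 4800]
print(yolov3_box_counts(416))  # [507, 2028, 8112]
```

This also shows why a larger input size is slower: the network has many more boxes to compute and filter.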

So there are 85 columns in total for each output layer. The first 5 of these 85 columns are, respectively:

  1. Column 1: X value of the center point of the bounding box
  2. Column 2: Y value of the center point of the bounding box
  3. Column 3: Width of the bounding box
  4. Column 4: Height of the bounding box
  5. Column 5: Confidence value

The remaining 80 columns hold the scores for the 80 object classes.
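As a small illustration of this layout, the sketch below builds one made-up 85-value row and splits it into its parts. The values are invented for the example (a confident "car" detection, class index 2), and split_detection is a hypothetical helper name.

```python
import numpy as np

# Made-up detection row: 4 box values, 1 confidence value, 80 class scores
row = np.zeros(85, dtype=np.float32)
row[:5] = [0.52, 0.65, 0.59, 0.37, 0.99]
row[5 + 2] = 0.99  # class index 2 is "car" in this example


def split_detection(detection):
    """Split one 85-value row into box, confidence, best class id and its score."""
    box = detection[:4]            # Cx, Cy, w, h
    box_confidence = detection[4]  # column 5
    class_scores = detection[5:]   # remaining 80 columns
    class_id = int(np.argmax(class_scores))
    return box, float(box_confidence), class_id, float(class_scores[class_id])


box, box_confidence, class_id, class_score = split_detection(row)
print(class_id, round(class_score, 2))  # 2 0.99
```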

YOLO object detection bounding box information

Column index   0      1      2      3      4            5        6         7      ...   83           84
Box Number     Cx     Cy     w      h      confidence   person   bicycle   car    ...   hair drier   toothbrush
0              0.52   0.65   0.59   0.37   0.99         0        0         0.99   ...   0            0
1
2
...
299

The table above shows the first output layer (300 rows and 85 columns) for a given image (the car image).
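The Cx, Cy, w, and h values are fractions of the image width and height, so they have to be scaled back to pixels before drawing. The sketch below converts the example row from the table for a 320 x 320 image, using the same arithmetic as the extraction code in this article. to_pixel_box is a hypothetical helper name.

```python
def to_pixel_box(cx, cy, w, h, img_width, img_height):
    """Convert a normalized center-format box to (x, y, w, h) in pixels."""
    center_x = int(cx * img_width)
    center_y = int(cy * img_height)
    box_w = int(w * img_width)
    box_h = int(h * img_height)
    # Top-left corner of the rectangle
    x = int(center_x - box_w / 2)
    y = int(center_y - box_h / 2)
    return x, y, box_w, box_h


print(to_pixel_box(0.52, 0.65, 0.59, 0.37, 320, 320))  # (72, 149, 188, 118)
```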

Note: Confidence is how sure the network is that the bounding box described by (Cx, Cy, w, h) really contains an object (in our example: a car).

The values 300, 1200, and 4800 are the numbers of bounding boxes that each output layer produces. We will keep only the detections whose confidence is above a threshold (greater than 0.5).

Getting one bounding box from multiple boxes using confidence threshold
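The threshold test can also be applied to a whole output array at once with NumPy instead of row by row. The sketch below is one possible way to do it on a made-up (300, 85) array. filter_detections is a hypothetical helper name and the array contents are invented for the example.

```python
import numpy as np


def filter_detections(out, threshold=0.5):
    """Keep the rows whose best class score is above the threshold."""
    scores = out[:, 5:]
    class_ids = np.argmax(scores, axis=1)
    best_scores = scores[np.arange(len(out)), class_ids]
    keep = best_scores > threshold
    return out[keep], class_ids[keep], best_scores[keep]


# Made-up output layer: 300 boxes, only the first one is confident
fake_out = np.zeros((300, 85), dtype=np.float32)
fake_out[0, :5] = [0.52, 0.65, 0.59, 0.37, 0.99]
fake_out[0, 5 + 2] = 0.99  # class index 2, above the threshold
fake_out[1, 5 + 0] = 0.30  # class index 0, below the threshold

rows, ids, confs = filter_detections(fake_out)
print(len(rows), ids.tolist())  # 1 [2]
```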

Step7: Create random colors and extract information

First we generate a random color for each of the 80 classes, so that every class is drawn in its own color. For example, red for the object person and green for the object dog.

# Generating random color for all 80 classes
colors = np.random.uniform(0, 255, size=(len(classes), 3))

Now that you know which values matter for object detection, let's extract them.

# Extract information on the screen
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        # Extract score value
        scores = detection[5:]
        # Object id
        class_id = np.argmax(scores)
        # Confidence score for each object ID
        confidence = scores[class_id]
        # if confidence > 0.5 and class_id == 0:
        if confidence > 0.5:
            # Extract values to draw bounding box
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

  • scores = detection[5:]: Extracts the 80 class scores (columns 6 to 85 of each row).
  • class_id = np.argmax(scores): Finds the object ID with the highest score (objects such as person, car, etc.).
  • confidence = scores[class_id]: The confidence score of that object ID.
  • if confidence > 0.5: As mentioned, the values 300, 1200, and 4800 are numbers of bounding boxes, and each bounding box has a confidence score. We keep only the boxes whose confidence score is higher than 0.5 (50 percent).
  • center_x to class_ids.append(class_id): Extract the values of each selected bounding box and convert them from center coordinates to the top-left corner, width, and height in pixels.
  • cv2.dnn.NMSBoxes: Because we have thousands of boxes, some of them will overlap, and multiple boxes can refer to the same object. Non-Maximum Suppression removes these duplicates and keeps one box per object.
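To show the idea behind Non-Maximum Suppression, here is a simplified pure Python sketch. It is not OpenCV's implementation. It only illustrates the usual greedy approach: sort the boxes by score, then drop every box that overlaps an already kept box too much. The names iou and simple_nms are hypothetical, and the boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x, y, w, h]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4):
    """Return the indexes of the boxes that survive greedy suppression."""
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep the box only if it does not overlap a stronger kept box too much
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


# Two almost identical boxes and one separate box (made-up values)
example_boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
example_scores = [0.9, 0.8, 0.7]
print(simple_nms(example_boxes, example_scores))  # [0, 2]
```

The second box is dropped because it overlaps the first, stronger box by more than the 0.4 threshold.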

Note: If you want to detect only one kind of object, use the commented-out condition (if confidence > 0.5 and class_id == 0) instead. It detects only persons (class_id = 0) in an image. You can try different class IDs. The class ID is the index of the class in the coco.names file.
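If you prefer to filter by class name rather than by number, one option is to look up the index in the label list. The sketch below uses a truncated label list with only the first three entries shown in the table earlier. The real coco.names file has 80 lines. keep_classes is a hypothetical helper name and the detections are made up.

```python
# Truncated label list for illustration (the real file has 80 names)
sample_classes = ["person", "bicycle", "car"]


def keep_classes(class_ids, boxes, wanted_names, classes):
    """Keep only the (class_id, box) pairs whose class name is wanted."""
    wanted_ids = {classes.index(name) for name in wanted_names}
    return [
        (class_id, box)
        for class_id, box in zip(class_ids, boxes)
        if class_id in wanted_ids
    ]


# Made-up detections: one person and two cars
detected_ids = [0, 2, 2]
detected_boxes = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]]

cars_only = keep_classes(detected_ids, detected_boxes, ["car"], sample_classes)
print(cars_only)
```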

Step8: Draw bounding boxes for objects

We have now extracted the bounding boxes with high confidence. Next we draw each bounding box with a text label for the object.

# Draw bounding box with text for each object
font = cv2.FONT_HERSHEY_DUPLEX
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        confidence_label = int(confidences[i] * 100)
        # Use the color assigned to this object's class
        color = colors[class_ids[i]].tolist()
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, f'{label} {confidence_label}%', (x-25, y + 75), font, 1, color, 2)

cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Full OpenCV object detection python code for image

Now let’s put all together.

# YOLO object detection on a single image with OpenCV
import cv2
import numpy as np

# Loading image
img = cv2.imread("data/car.jpg")

# Load Yolo
yolo_weight = "data/model/yolov3.weights"
yolo_config = "data/model/yolov3.cfg"
coco_labels = "data/model/coco.names"
net = cv2.dnn.readNet(yolo_weight, yolo_config)

classes = []
with open(coco_labels, "r") as f:
    classes = [line.strip() for line in f.readlines()]

# print(classes)

# Defining desired shape
fWidth = 320
fHeight = 320

# Resize image in opencv
img = cv2.resize(img, (fWidth, fHeight))

height, width, channels = img.shape

# Convert image to Blob
blob = cv2.dnn.blobFromImage(img, 1/255, (fWidth, fHeight), (0, 0, 0), True, crop=False)
# Set input for YOLO object detection
net.setInput(blob)

# Find names of all layers
layer_names = net.getLayerNames()
print(layer_names)
# Find names of three output layers (flatten() handles both old and new OpenCV return formats)
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
print(output_layers)

# Send blob data to forward pass
outs = net.forward(output_layers)
print(outs[0].shape)
print(outs[1].shape)
print(outs[2].shape)

# Generating random color for all 80 classes
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Extract information on the screen
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        # Extract score value
        scores = detection[5:]
        # Object id
        class_id = np.argmax(scores)
        # Confidence score for each object ID
        confidence = scores[class_id]
        # if confidence > 0.5 and class_id == 0:
        if confidence > 0.5:
            # Extract values to draw bounding box
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Draw bounding box with text for each object
font = cv2.FONT_HERSHEY_DUPLEX
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        confidence_label = int(confidences[i] * 100)
        # Use the color assigned to this object's class
        color = colors[class_ids[i]].tolist()
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, f'{label} {confidence_label}%', (x-25, y + 75), font, 1, color, 2)

cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Video object detection using python

Now that we have written the object detection code for an image, let's see how to do object detection on video in Python. This way you can detect objects in a webcam feed or any other camera stream.

Converting the OpenCV object detection code from image to video is simple. You just need to put everything inside the loop below:

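# cv2.VideoCapture reads the video frame by frame
cap = cv2.VideoCapture('data/test_video.mp4')

while True:
    read_ok, img = cap.read()
    # Stop when the video ends or a frame cannot be read
    if not read_ok:
        break
    cv2.imshow("Play video in python", img)

    # Close video window by pressing 'x'
    if cv2.waitKey(1) & 0xFF == ord('x'):
        break

# Release the video and close the window
cap.release()
cv2.destroyAllWindows()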

The code above reads a video frame by frame. If you want to know how to read webcam video, please check my previous article.


Full code for video object detection using python

Now let’s see full code for video object detection using OpenCV.

# YOLO object detection on video (webcam) with OpenCV
import cv2
import numpy as np

# Load Yolo
yolo_weight = "data/model/yolov3.weights"
yolo_config = "data/model/yolov3.cfg"
coco_labels = "data/model/coco.names"
net = cv2.dnn.readNet(yolo_weight, yolo_config)
classes = []
with open(coco_labels, "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Defining desired shape
fWidth = 416
fHeight = 416

# cv2.VideoCapture(0) reads frames from the default webcam
cap = cv2.VideoCapture(0)

while True:
    read_ok, img = cap.read()
    # Stop when no frame can be read
    if not read_ok:
        break

    height, width, channels = img.shape

    # Detecting objects
    blob = cv2.dnn.blobFromImage(img, 1/255, (fWidth, fHeight), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # Showing informations on the screen
    class_ids = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                # Object detected
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                # Rectangle coordinates
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    font = cv2.FONT_HERSHEY_DUPLEX
    for i in range(len(boxes)):
        if i in indexes:
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence_label = int(confidences[i] * 100)
            # Use the color assigned to this object's class
            color = colors[class_ids[i]].tolist()
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            cv2.putText(img, f'{label} {confidence_label}%', (x-25, y + 75), font, 2, color, 2)

    cv2.imshow("Image", img)
    # Close video window by pressing 'x'
    if cv2.waitKey(1) & 0xFF == ord('x'):
        break

# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()

Conclusion

In this object detection tutorial you learned YOLO object detection using OpenCV and Python. I have shared the full YOLO object detection Python code. You can use this code for object detection on images and on video with OpenCV, and as a starting point for object tracking.

To recap, here are the important points discussed in this article:

  • Object detection using deep learning (YOLO model)
  • Object detection and tracking OpenCV
  • How to draw bounding box for each detected object

If you have any questions or suggestions regarding this topic, please leave them in the comment section. I will try my best to answer.
