OCR, short for Optical Character Recognition (also called an optical character reader), is an essential part of data mining that deals with typed, handwritten, or printed documents. Every image in the world contains some information, and many contain characters that humans can easily read. Programming a machine to read them is called OCR. In this tutorial, I will walk you through detailed code for the pytesseract (Python wrapper for Tesseract) image-to-string operation.
Table of contents
- Applications of OCR
- Best OCR libraries in Python
- About Tesseract OCR
- Install Tesseract OCR in Python
- OCR with Pytesseract and OpenCV
- Draw bounding box around text
- Extract text inside bounding box using tesseract and OpenCV
- Get ROI coordinates by mouse click
- Extract specific text inside bounding box (ROI)
- Limitations of Tesseract OCR
1. Applications of OCR
Before we go deeper into the OCR implementation, we should know the places where OCR can be applied. Some example use cases are as follows:
- Extract information from Forms and Receipts
- Convert scanned documents to a digital copy
- Number Plate Recognition
- Extract information from important documents such as Passports, PAN cards, Aadhaar cards, etc.
2. Best OCR libraries in Python
There are lots of OCR libraries and tools available. I am listing some popular ones below:
- Tesseract
- Keras-OCR
- Python-docTR
- Easy OCR
- Paddle OCR
- and more
In this post, I am going to explore the pytesseract OCR library in Python.
Note: If you are new to image processing and computer vision, I suggest taking this Udemy course: Master Computer Vision OpenCV4 in Python with Deep Learning.
3. About Tesseract OCR

Tesseract (originally developed at Hewlett-Packard, and sponsored by Google since 2006) is an open-source tool for Optical Character Recognition (OCR). The best part is that it supports a wide range of languages and is accessible from different programming languages through wrappers. In this post, I will use pytesseract, which is a Python library (wrapper) for Tesseract.
We usually use a convolutional neural network (CNN) to recognize an image containing a single character or object. Text of any length, however, is a sequence of characters, and such sequence problems are solved using RNNs and LSTMs. Tesseract uses an LSTM to detect and predict text in an image. LSTM (long short-term memory) is a popular architecture in the RNN family. Read this post to learn more about RNNs. There are mainly 3 stages in Tesseract:
- Find words
- Detect lines
- Classify characters
4. Install Tesseract OCR in Python
Note that pytesseract is only a wrapper: the Tesseract engine itself must also be installed on your system (for example, through your OS package manager). To use Tesseract OCR in Python, install the pytesseract library by running the command below:
> pip install pytesseract
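A quick way to check from Python that the Tesseract binary is actually reachable is to look it up on the system PATH. This is just a sketch; the Windows path in the comment is a typical install location, not a guarantee:

```python
import shutil

# Look for the Tesseract engine on the system PATH
tesseract_path = shutil.which('tesseract')
if tesseract_path is None:
    # If the binary lives elsewhere (common on Windows), point pytesseract at it:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
    print('Tesseract binary not found on PATH')
else:
    print(f'Found Tesseract at {tesseract_path}')
```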
5. OCR with Pytesseract and OpenCV
Let’s first import the required packages and read the input image we want to convert to text.
In this tutorial, I am using the following sample invoice image.

# Import OpenCV
import cv2
# Import tesseract OCR
import pytesseract
# Read image to convert image to string
img = cv2.imread('input/restaurant_bill.jpg')
Pre-processing for Tesseract
Preprocessing is a must in every image analysis task. To get better accuracy from Tesseract, you should do at least some basic image pre-processing, such as grayscale conversion and binarization. Read this post to learn more about image transformation functions in OpenCV.
# Resize the image if required
height, width, channel = img.shape
images = cv2.resize(img, (width//2, height//2))
# Convert to grayscale image
gray_img = cv2.cvtColor(images, cv2.COLOR_BGR2GRAY)
# Converting grey image to binary image by Thresholding
thresh_img = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
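As an aside, cv2.THRESH_OTSU above chooses the binarization threshold automatically by maximizing the variance between the dark and bright pixel classes. Here is a minimal pure-Python sketch of the idea, for intuition only (OpenCV's implementation is the one to use in practice; the toy pixel values are invented):

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    # Build a 256-bin histogram of the grayscale values
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, w_bg, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_bg += hist[t]          # pixels at or below the candidate threshold
        if w_bg == 0:
            continue
        w_fg = total - w_bg      # pixels above the candidate threshold
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance: large when the two groups are well separated
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

pixels = [30] * 60 + [220] * 40       # dark background, bright text
t = otsu_threshold(pixels)
print(t)                              # 30
binary = [255 if p > t else 0 for p in pixels]
```

The threshold lands between the two pixel clusters, so binarization cleanly separates text from background.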

If you want to learn more about image pre-processing and transformations, the articles below may be helpful:
Helpful Articles:
- Most useful OpenCV functions to know for image analytics
- Find and Draw Contours with OpenCV in Python
- Shape detection using OpenCV and Python
Tesseract Configuration
Once you are done with image transformation (pre-processing), you should configure Tesseract. If you do not specify any configuration parameters, Tesseract will use its default configuration.
# Configuring parameters for tesseract
custom_config = r'--oem 3 --psm 6'
- Engine Mode (--oem): Tesseract uses different algorithms, or engine modes, in its back end. Below are the different OCR engine modes with their Tesseract ID numbers. You can select the one that works best for your requirement:
| OEM ID | Description |
| --- | --- |
| 0 | Legacy engine only |
| 1 | Neural net (LSTM) engine only |
| 2 | Legacy + LSTM engines |
| 3 | Default, based on what is currently available |
- Page Segmentation Mode (--psm): By configuring this, you inform Tesseract how to split the input image into blocks of text. Tesseract has 14 modes in total (0-13). From the table below, you can choose the one that best suits your requirements:
| PSM ID | Description |
| --- | --- |
| 0 | Orientation and script detection (OSD) only |
| 1 | Automatic page segmentation with OSD |
| 2 | Automatic page segmentation, but no OSD or OCR |
| 3 | Fully automatic page segmentation, but no OSD (default) |
| 4 | Assume a single column of text of variable sizes |
| 5 | Assume a single uniform block of vertically aligned text |
| 6 | Assume a single uniform block of text |
| 7 | Treat the image as a single text line |
| 8 | Treat the image as a single word |
| 9 | Treat the image as a single word in a circle |
| 10 | Treat the image as a single character |
| 11 | Sparse text. Find as much text as possible, in no particular order |
| 12 | Sparse text with OSD |
| 13 | Raw line. Treat the image as a single text line, bypassing Tesseract-specific hacks |
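Since the config is just a command-line-style string, you can also build it programmatically. The helper below is hypothetical (the function name and validation ranges are mine, not part of pytesseract); it simply guards against out-of-range mode IDs:

```python
def tesseract_config(oem=3, psm=6, extra=''):
    """Build a Tesseract config string from engine and page-segmentation modes."""
    if not 0 <= oem <= 3:
        raise ValueError('oem must be between 0 and 3')
    if not 0 <= psm <= 13:
        raise ValueError('psm must be between 0 and 13')
    return f'--oem {oem} --psm {psm} {extra}'.strip()

print(tesseract_config())        # --oem 3 --psm 6
print(tesseract_config(1, 11))   # --oem 1 --psm 11
```

The result can be passed straight to pytesseract via the config= parameter.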
Convert Image to Text
Now we can finally convert our pre-processed image to text. After conversion, we write the output text to a text file.
# Converting image to text with pytesseract
ocr_output = pytesseract.image_to_string(thresh_img, config=custom_config)
# Print output text from OCR
print(ocr_output)
# Writing OCR output to a text file
with open('python_ocr_output.txt', 'w') as f:
    f.write(ocr_output)
# Display the resized image
cv2.imshow('Resized image', images)
cv2.waitKey(0)
Full Code
# Import OpenCV
import cv2
# Import tesseract OCR
import pytesseract
# Read image to convert image to string
img = cv2.imread('input/restaurant_bill.jpg')
# Resize the image if required
height, width, channel = img.shape
images = cv2.resize(img, (width//2, height//2))
# Convert to grayscale image
gray_img = cv2.cvtColor(images, cv2.COLOR_BGR2GRAY)
# Converting grey image to binary image by Thresholding
thresh_img = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# configuring parameters for tesseract
custom_config = r'--oem 3 --psm 6'
# Converting image to text with pytesseract
ocr_output = pytesseract.image_to_string(thresh_img, config=custom_config)
# Print output text from OCR
print(ocr_output)
# Writing OCR output to a text file
with open('python_ocr_output.txt', 'w') as f:
    f.write(ocr_output)
# Display the resized image
cv2.imshow('Resized image', images)
cv2.waitKey(0)

6. Draw bounding boxes around text
Using pytesseract, you can get bounding-box information for OCR results with the following code. The script below gives bounding-box information for each piece of text detected by Tesseract during OCR.
import pytesseract
from pytesseract import Output
# Import OpenCV library
import cv2
import csv
# Read image to extract text from image
img = cv2.imread('input/restaurant_bill.jpg')
# Resize the image if required
height, width, channel = img.shape
img = cv2.resize(img, (width//2, height//2))
# Convert image to grey scale
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Converting grey image to binary image by Thresholding
thresh_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# configuring parameters for tesseract
custom_config = r'--oem 3 --psm 6'
# Get all OCR output information from pytesseract
ocr_output_details = pytesseract.image_to_data(thresh_img, output_type = Output.DICT, config=custom_config, lang='eng')
# Total bounding boxes
n_boxes = len(ocr_output_details['level'])
# Extract and draw rectangles for all bounding boxes
for i in range(n_boxes):
    (x, y, w, h) = (ocr_output_details['left'][i], ocr_output_details['top'][i], ocr_output_details['width'][i], ocr_output_details['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Print OCR output keys
print(ocr_output_details.keys())
# Show output image with bounding boxes
cv2.imshow('img', img)
cv2.waitKey(0)

Here in this code:
- The custom_config line configures the Tesseract model
- The pytesseract.image_to_data(...) call extracts all OCR output from Tesseract in dictionary format
Now if you print the OCR output details, you will see all the output keys shown below.
dict_keys(['level', 'page_num', 'block_num', 'par_num', 'line_num', 'word_num', 'left', 'top', 'width', 'height', 'conf', 'text'])
In this output dictionary, the key named ‘text‘ contains the text output from the OCR, and the keys named ‘left‘, ‘top‘, ‘width‘, and ‘height‘ contain the bounding-box coordinates for that particular piece of text.
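To illustrate how those parallel lists line up, here is a sketch using a small mock dictionary shaped like the image_to_data output (the values are invented; also note that depending on the pytesseract version, ‘conf‘ entries may be strings or numbers, so the sketch converts them with float):

```python
# Mock result shaped like pytesseract.image_to_data(..., output_type=Output.DICT)
ocr = {
    'text':   ['', 'Total', '', '12.50'],
    'conf':   ['-1', '96', '-1', '91'],
    'left':   [0, 40, 0, 120],
    'top':    [0, 300, 0, 300],
    'width':  [0, 60, 0, 55],
    'height': [0, 20, 0, 20],
}
# Pair each confident, non-empty word with its bounding box
words = [
    (ocr['text'][i],
     (ocr['left'][i], ocr['top'][i], ocr['width'][i], ocr['height'][i]))
    for i in range(len(ocr['text']))
    if ocr['text'][i].strip() and float(ocr['conf'][i]) > 60
]
print(words)  # [('Total', (40, 300, 60, 20)), ('12.50', (120, 300, 55, 20))]
```

Filtering on the confidence value this way is a common trick for dropping the empty structural entries (blocks, paragraphs, lines) that image_to_data returns alongside the words.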
Text format matching
Now that you have an image with bounding boxes, you can move on to the next part: organizing the captured text into a file whose layout matches the input image.
Note: I have written this code based on my input image layout and the output from Tesseract. If your image has a different layout, you must adapt the code accordingly.
The code below formats the resulting text according to the current image and saves the formatted result into a txt file.
# Arrange output text from OCR into the format as per image
output_text = []
word_list = []
last_word = ''
for word in ocr_output_details['text']:
    # If there is text in an element
    if word != '':
        word_list.append(word)
        last_word = word
    # Append the finished line of words to the output
    if (last_word != '' and word == '') or (word == ocr_output_details['text'][-1]):
        output_text.append(word_list)
        word_list = []
        last_word = ''
# Write formatted OCR output to a text file
with open('OCR_output.txt', 'w', newline='') as file:
    csv.writer(file, delimiter=' ').writerows(output_text)
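The grouping logic used above can be isolated into a small function and checked on a sample word list (the sample values here are made up):

```python
def group_lines(words):
    """Group a flat OCR word list into lines, using empty entries as separators."""
    lines, current = [], []
    for w in words:
        if w != '':
            current.append(w)
        elif current:            # a blank entry ends the current line
            lines.append(current)
            current = []
    if current:                  # flush the last line if the list did not end blank
        lines.append(current)
    return lines

sample = ['', 'Total', 'Amount', '', '', 'Rs', '750', '']
print(group_lines(sample))       # [['Total', 'Amount'], ['Rs', '750']]
```

This variant skips runs of consecutive blanks, so it never emits empty lines.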

Full Code
import pytesseract
from pytesseract import Output
# Import OpenCV library
import cv2
import csv
# Read image to extract text from image
img = cv2.imread('input/restaurant_bill.jpg')
# Resize the image if required
height, width, channel = img.shape
img = cv2.resize(img, (width//2, height//2))
# Convert image to grey scale
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Converting grey image to binary image by Thresholding
thresh_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# configuring parameters for tesseract
custom_config = r'--oem 3 --psm 6'
# Get all OCR output information from pytesseract
ocr_output_details = pytesseract.image_to_data(thresh_img, output_type = Output.DICT, config=custom_config, lang='eng')
# Total bounding boxes
n_boxes = len(ocr_output_details['level'])
# Extract and draw rectangles for all bounding boxes
for i in range(n_boxes):
    (x, y, w, h) = (ocr_output_details['left'][i], ocr_output_details['top'][i], ocr_output_details['width'][i], ocr_output_details['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Print OCR output keys
print(ocr_output_details.keys())
# Arrange output text from OCR into the format as per image
output_text = []
word_list = []
last_word = ''
for word in ocr_output_details['text']:
    # If there is text in an element
    if word != '':
        word_list.append(word)
        last_word = word
    # Append the finished line of words to the output
    if (last_word != '' and word == '') or (word == ocr_output_details['text'][-1]):
        output_text.append(word_list)
        word_list = []
        last_word = ''
# Write formatted OCR output to a text file
with open('OCR_output.txt', 'w', newline='') as file:
    csv.writer(file, delimiter=' ').writerows(output_text)
# Display OCR output image
cv2.imshow('img', img)
cv2.waitKey(0)
7. Extract specific text inside bounding box using tesseract and OpenCV
So far we know how to extract all the text available in an image. Now, what if you want to extract only some words or sentences from a specific region of the image?
For example, I want to extract only the “Amount” field from the invoice below.

To do that, we first need the coordinates (top-left X, top-left Y, bottom-right X, bottom-right Y) of that specific portion of the input image.
7.1 Get ROI coordinates by mouse click
To get coordinate values for our area of interest, you can use the code below. You just need to modify the input image path and the image resizing option to match your setup.
import cv2

circles = []
counter = 0
counter2 = 0
Clickpoint1 = []
Clickpoint2 = []
myCoordinates = []

# Function to store left-mouse click coordinates
def mouseClickPoints(event, x, y, flags, params):
    global counter, Clickpoint1, Clickpoint2, counter2, circles
    if event == cv2.EVENT_LBUTTONDOWN:
        # Draw circle in red color
        cv2.circle(img, (x, y), 3, (0, 0, 255), cv2.FILLED)
        if counter == 0:
            Clickpoint1 = int(x), int(y)
            counter += 1
        elif counter == 1:
            Clickpoint2 = int(x), int(y)
            myCoordinates.append([Clickpoint1, Clickpoint2])
            counter = 0
        circles.append([x, y])
        counter2 += 1

# Read image
img = cv2.imread('input/restaurant_bill.jpg')
# Resize image
height, width, channel = img.shape
img = cv2.resize(img, (width//2, height//2))

while True:
    # Display clicked points
    for x, y in circles:
        cv2.circle(img, (x, y), 3, (0, 0, 255), cv2.FILLED)
    # Display original image
    cv2.imshow('Original Image', img)
    # Collect coordinates of mouse click points
    cv2.setMouseCallback('Original Image', mouseClickPoints)
    # Press 'x' on the keyboard to stop the program and print the coordinate values
    if cv2.waitKey(1) & 0xFF == ord('x'):
        print(myCoordinates)
        break
Note: After running the code above, first left-click at the top-left of your area of interest, then left-click at the bottom-right. Once you are done with the two clicks, press ‘X‘ on your keyboard to stop the program and print the output coordinates of your region of interest.
If you are not sure how the above code is working please read this tutorial.

Output in the console
[[(81, 322), (356, 351)]]
7.2. Extract specific text inside bounding box (ROI)
Once we have the ROI information, we can extract anything inside that particular area with the following steps:
- Crop that specific area from the input image
- Pass cropped image to OCR
- Get the output from OCR
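On a toy example, the cropping step amounts to nothing more than two slices. Here nested lists stand in for a NumPy image array, and the ROI values are invented:

```python
# One ROI as produced by the mouse-click script: [[(top-left x, y), (bottom-right x, y)]]
roi = [[(2, 1), (5, 3)]]
(x1, y1), (x2, y2) = roi[0]

# Toy 5x8 "image" where the pixel at (row r, col c) holds the value 10*r + c
img = [[10 * r + c for c in range(8)] for r in range(5)]

# Keep rows y1..y2 and columns x1..x2 (same as img[y1:y2, x1:x2] on a NumPy array)
cropped = [row[x1:x2] for row in img[y1:y2]]
print(cropped)  # [[12, 13, 14], [22, 23, 24]]
```

Note that rows are indexed by Y and columns by X, which is why the slice order is reversed relative to the (x, y) click coordinates.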
The code below simply follows the steps above. It also writes the OCR output onto the input image.
import pytesseract
import cv2
# Read image for text extraction
img = cv2.imread('input/restaurant_bill.jpg')
height, width, channel = img.shape
# Resizing image if required
img = cv2.resize(img, (width//2, height//2))
# Convert image to grey scale
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Converting grey image to binary image by Thresholding
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# Crop image based on ROI coordinates (for specific bounding box)
# Coordinate information for required portion of image for which need to extract the text
roi_coordinate = [[(81, 322), (356, 351)]]
top_left_x = roi_coordinate[0][0][0]
top_left_y = roi_coordinate[0][0][1]
bottom_right_x = roi_coordinate[0][1][0]
bottom_right_y = roi_coordinate[0][1][1]
# Crop the specific required portion of entire image
img_cropped = threshold_img[top_left_y:bottom_right_y, top_left_x:bottom_right_x]
# Draw rectangle for area of interest (ROI)
cv2.rectangle(img, (top_left_x, top_left_y), (bottom_right_x, bottom_right_y), (0, 255, 0), 3)
# OCR section to extract text using pytesseract in python
# configuring parameters for tesseract
custom_config = r'--oem 3 --psm 6'
# Extract text within specific coordinate using pytesseract OCR Python
# Providing cropped image as input to the OCR
ocr_output = pytesseract.image_to_string(img_cropped, config=custom_config, lang='eng')
# Write OCR output on the original image
# OpenCV font
font = cv2.FONT_HERSHEY_DUPLEX
# Red color code
red_color = (0,0,255)
cv2.putText(img, f'{ocr_output}', (top_left_x-25, top_left_y - 10), font, 0.5, red_color, 1)
print(ocr_output)
cv2.imshow('img', img)
cv2.waitKey(0)
Before running the code above, you need to modify the input image path and the image resizing option.
Note: The resizing option must be the same in both scripts (the extraction code and the ROI-coordinate code). Otherwise you may end up extracting a different part of the image.

8. Limitations of Tesseract OCR
Although Tesseract is a good OCR tool, the library has some limitations. Here is what I found:
- It does not perform well on images with complex backgrounds or distorted perspectives.
- Image orientation detection sometimes does not work.
- It cannot reliably detect complex handwritten text.
Conclusion
Finally, it can be concluded that Tesseract is ideal for scanning clean documents: you can easily convert images to text and produce any formatted document such as PDF, Word, etc. It achieves quite high accuracy across a variety of fonts. This is very useful for institutions where a large amount of documentation is involved, such as government offices, hospitals, educational institutes, etc. Since version 4.0, Tesseract supports deep-learning-based (LSTM) OCR, which is significantly more accurate.
In my next post, I will show you how you can extract important information from any document, form, or invoice.

Hi there, I’m Anindya Naskar, Data Science Engineer. I created this website to show you what I believe is the best possible way to get your start in the field of Data Science.