Python Multithreading vs Multiprocessing

If you are dealing with code that takes a huge amount of time to execute, you can often use parallel processing to speed it up.
Parallel processing means executing multiple tasks at the same time. It can be achieved either by running code simultaneously on different cores (of CPUs) or on the same core by interleaving tasks so that idle CPU cycles are used.

There are two types of parallel processing:
1.    Multiprocessing
2.    Multithreading
In this tutorial I will explain:
·       The difference between multiprocessing and multithreading in Python.
·       The advantages and disadvantages of multiprocessing and multithreading in Python.
·       How to write multiprocessing and multithreading code in Python.
Before moving on, let me explain the difference between the multiple cores and the multiple threads of a computer CPU.

What are Multi Core and Multi Thread?


Cores (Multi Core)

Let’s say we have a 4-core CPU. Think of each core as an individual worker. Each core performs the tasks that the operating system tells it to do, so you can think of the operating system as the boss of each core.

Threads (Multi Thread)

A thread is the sequence of commands given to the cores. You can think of threads as conveyor belts by which some products (sequences of code) are given to the workers (cores) to process one by one.
Now you may have heard of physical cores and logical cores. Let me clear that up also.
  • Physical Core: A physical core is an actual hardware core inside the CPU of your computer.
  • Logical Core: A logical core is one hardware thread of the CPU, which the operating system sees as a core.

If you have a processor with 4 cores and 1 thread per core, you will have 4 logical cores. With 2 threads per core you will have 8.

4 Physical Cores X (1 Thread/ Core) = 4 Logical Cores
4 Physical Cores X (2 Threads/ Core) = 8 Logical Cores
Note: In the second case the number of threads is twice the number of cores, so each core has to take on two different threads of commands and switch its work between them. As a simplified picture, let’s say the switching time is 2 seconds for core 1 (real hardware switches far faster than that). At the beginning core 1 works for thread 1, after 2 seconds it works for thread 2, and it continues like this until the entire job is done.
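
To check how many logical cores your own machine has, a minimal sketch using only the standard library could look like the code below. os.cpu_count() reports logical cores (hardware threads), not physical cores. The helper name logical_core_count is made up for this illustration.

```python
import os

def logical_core_count():
    # os.cpu_count() returns the number of logical cores,
    # or None if it cannot be determined; fall back to 1 in that case.
    return os.cpu_count() or 1

if __name__ == '__main__':
    print('Logical cores: ', logical_core_count())
```

On a CPU with 4 physical cores and 2 threads per core this would normally print 8.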

Difference between Multithreading and Multiprocessing

Multiprocessing
·       A process is an instance of a program. For example, if I am running programs like Google Chrome or Python, each of them runs as at least one process
·       Processes can take advantage of multiple CPUs and cores, so you can execute code on multiple CPUs in parallel
·       Processes have separate memory spaces, so memory is not shared between processes
·       A process can start independently from other processes
·       Processes are easily killable or interruptible (for example with Process.terminate())
Multithreading
·       A thread is a unit of execution inside a process, so a process can have multiple threads
·       All threads within a process share the same memory
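
The memory difference can be seen in a small experiment. The sketch below is only an illustration (the function names add_item, run_in_thread and run_in_process are made up for it): the same function appends to a list once from a thread and once from a process.

```python
from multiprocessing import Process
from threading import Thread

# Function that modifies the list it receives
def add_item(container):
    container.append('changed')

def run_in_thread():
    data = []
    t = Thread(target=add_item, args=(data,))
    t.start()
    t.join()
    # The thread worked on the very same list object
    return data

def run_in_process():
    data = []
    p = Process(target=add_item, args=(data,))
    p.start()
    p.join()
    # The child process only changed its own copy of the list
    return data

if __name__ == '__main__':
    print('Thread: ', run_in_thread())    # ['changed']
    print('Process: ', run_in_process())  # []
```

The thread changes the list of the main program, while the change made by the process stays inside the child process. This is why the multiprocessing code later in this tutorial needs a Manager list.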

Advantage and disadvantages of Multithreading and Multiprocessing in python

Advantage of Multiprocessing
If you have a large amount of data and you have to do a lot of expensive computation, then with multiprocessing you can use multiple CPUs in parallel to speed up your execution.
Disadvantage of Multiprocessing
·       Process is heavyweight
·       It takes more memory
·       Starting a Process is slower than starting a Thread
·       Memory sharing is not easy between processes, as each process has its own memory space, so inter-process communication is more complicated
Advantage of Multithreading
·       Thread is lightweight
·       Starting a thread is faster than starting a process
·       Threading is great for input/output tasks, where the program mostly waits (for example: LED control, LCD control etc.)
Disadvantage of Multithreading
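·       In standard CPython only one thread can execute Python code at a time because of the Global Interpreter Lock (GIL), so there is no actual parallel computation for CPU-heavy work
·       A thread cannot be killed or interrupted from outside, so be careful with memory leaks and threads that never finish while working with multithreading
·       You have to be careful with race conditions (when two or more threads want to modify the same variable at the same time), as threads share the same memory within a process. A race condition can easily corrupt your data or crash the program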
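
One common way to avoid a race condition is to protect the shared variable with a lock. The code below is a minimal sketch of that idea, not the only way to do it. The names add_many and count_with_threads are made up for this example.

```python
from threading import Thread, Lock

# Each thread increases the shared counter 'times' times
def add_many(counter, lock, times):
    for _ in range(times):
        # Only one thread at a time can be inside this block
        with lock:
            counter['value'] += 1

def count_with_threads(num_threads=4, times=10000):
    counter = {'value': 0}
    lock = Lock()
    threads = [Thread(target=add_many, args=(counter, lock, times))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter['value']

if __name__ == '__main__':
    print('Final count: ', count_with_threads())  # 40000
```

Because every increment happens while holding the lock, no update is lost and the final count is always num_threads * times.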

Python code for Multiprocessing

In this tutorial I will show you basic and advanced Python code for multiprocessing and for multithreading.
Run the code below as a script from cmd. For example, to run a Python script called basic_multiprocessing.py, go to cmd and type the command below.
python basic_multiprocessing.py

Basic: Python MultiProcessing example code

In this code I will show how to square a number (the index of each process) and save the result in a shared list using the Manager class of multiprocessing (that is, appending to the same list from different processes).

# Basic: Python multiprocessing example code

from multiprocessing import Process, Manager
import os

# Function to call in Processes
def squre_number(lst, num):
    lst.append(num*num)

# Start Multiprocessing. The __main__ guard is required wherever child
# processes are spawned (Windows, and macOS by default) and harmless elsewhere.
if __name__ == '__main__':

    manager = Manager()
    # Create a list which can be shared between processes.
    shared_list = manager.list()
    processes = []
    # Define number of processes as number of logical CPUs of your machine
    # (fall back to 1 if os.cpu_count() cannot determine it)
    num_processes = os.cpu_count() or 1

    # Start Multiprocessing
    for i in range(num_processes):
        p = Process(target=squre_number, args=(shared_list, i)) # Passing the list
        p.start()
        processes.append(p)
        print('Working Processor: ', i)

    # Wait for all processes to finish
    for p in processes:
        p.join()
    # Convert the manager's list proxy to a pure python list
    # (the order depends on which process finishes first)
    python_list = list(shared_list)

    # Print output list
    print('Output List: ', python_list)

Advanced: Python MultiProcessing example code

In the basic multiprocessing code there was no option to pass specific numbers to the square function. In the advanced code I will show you how to do that.

# Advanced: Python multiprocessing example code.
# Split a list and process sublists in different jobs/ Processes
from multiprocessing import Process, Manager
import math
import os

# Split a list into sublists of at most n items each
def split_list_fun(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

# Function to call in Processes
def squre_number(lst, item):
    for num in item:
        lst.append(num*num)

# Start Multiprocessing. The __main__ guard is required wherever child
# processes are spawned (Windows, and macOS by default) and harmless elsewhere.
if __name__ == '__main__':

    # Define number of processes as number of logical CPUs of your machine
    # (fall back to 1 if os.cpu_count() cannot determine it)
    num_processes = os.cpu_count() or 1

    # List of specific numbers to square
    num_list = [2,3,4,5,6,7,12,15,25,29,31,37]

    # Length of provided list
    total = len(num_list)
    # Number of processes we want to use
    job_number = num_processes
    # Size of each sublist, rounded up so it is never 0
    # (total // job_number is 0 when there are more CPUs than numbers)
    chunk_size = max(1, math.ceil(total / job_number))
    # Dividing the list into sublists
    slice_list = split_list_fun(num_list, chunk_size)

    manager = Manager()
    # Create a list which can be shared between processes.
    shared_list = manager.list()

    processes = []
    # Start Multiprocessing, one process per sublist
    for i, s in enumerate(slice_list):
        p = Process(target=squre_number, args=(shared_list, s)) # Passing the list
        p.start()
        processes.append(p)
        print('Working Processor: ', i)

    # Wait for all processes to finish
    for p in processes:
        p.join()
    # Convert the manager's list proxy to a pure python list
    python_list = list(shared_list)

    # Print output list
    print('Output List: ', python_list)
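
The chunk size decides how many sublists (and therefore how many processes) are created. The sketch below shows the splitting step on its own for different job numbers. Note that plain integer division gives a chunk size of 0 when there are more jobs than numbers, so this sketch rounds up instead; chunk_size_for is a helper name invented for this illustration.

```python
import math

# Split a list into sublists of at most n items each
def split_list_fun(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

# Chunk size rounded up, and never smaller than 1
def chunk_size_for(total, job_number):
    return max(1, math.ceil(total / job_number))

if __name__ == '__main__':
    num_list = [2,3,4,5,6,7,12,15,25,29,31,37]
    for jobs in (4, 8, 16):
        size = chunk_size_for(len(num_list), jobs)
        print(jobs, 'jobs, chunk size', size, ':', split_list_fun(num_list, size))
```

With 4 jobs the 12 numbers are split into 4 sublists of 3 numbers, with 8 jobs into 6 sublists of 2 numbers, and with 16 jobs into 12 sublists of 1 number.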

Basic: Python MultiThreading example code

This does the same thing as the basic multiprocessing code, but with threads.

# Basic: Python multithreading example code

from threading import Thread

# Function to call in threads
def squre_number(lst, num):
    lst.append(num*num)

# Start Multithreading (threads do not need the __main__ guard,
# but it keeps the script safe to import)
if __name__ == '__main__':

    # Create a blank list which will be appended
    # (threads share memory, so a normal list is enough)
    shared_list = []
    threads = []
    # Define number of threads
    num_threads = 10

    # Start Multithreading
    for i in range(num_threads):
        t = Thread(target=squre_number, args=(shared_list, i)) # Passing the list
        t.start()
        threads.append(t)
        print('Working threads: ', i)

    # Wait for all threads to finish
    for t in threads:
        t.join()

    # Print output list
    print('Output List: ', shared_list)

Advanced: Python MultiThreading example code

This does the same thing as the advanced multiprocessing code, but with threads.

# Advanced: Python multithreading example code.
# Split a list and process sublists in different jobs/ Threads
from threading import Thread
import math

# Split a list into sublists of at most n items each
def split_list_fun(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

# Function to call in Threads
def squre_number(lst, item):
    for num in item:
        lst.append(num*num)

# Start Multithreading (threads do not need the __main__ guard,
# but it keeps the script safe to import)
if __name__ == '__main__':

    # Define number of threads
    num_threads = 10

    # List of specific numbers to square
    num_list = [2,3,4,5,6,7,12,15,25,29]

    # Length of provided list
    total = len(num_list)
    # Number of threads we want to use
    job_number = num_threads
    # Size of each sublist, rounded up so it is never 0
    chunk_size = max(1, math.ceil(total / job_number))
    # Dividing the list into sublists
    slice_list = split_list_fun(num_list, chunk_size)

    # Create a blank list to append squared numbers
    shared_list = []

    threads = []
    # Start Multithreading, one thread per sublist
    for i, s in enumerate(slice_list):
        t = Thread(target=squre_number, args=(shared_list, s)) # Passing the list
        t.start()
        threads.append(t)
        print('Working Threads: ', i)

    # Wait for all threads to finish
    for t in threads:
        t.join()

    # Print output list
    print('Output List: ', shared_list)
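
As a side note, the standard library also offers concurrent.futures.ThreadPoolExecutor, which starts and joins the threads for you. This is a different technique from the Thread code above, shown only as a possible alternative. The names square and square_all are made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Function to call in threads: returns the result instead of appending
def square(num):
    return num * num

def square_all(numbers, max_workers=10):
    # The executor creates the threads and waits for them on exit
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns the results in the same order as the input
        return list(executor.map(square, numbers))

if __name__ == '__main__':
    print('Output List: ', square_all([2,3,4,5,6,7,12,15,25,29]))
```

Here no shared list is needed, because each result is returned to the main thread in input order.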

Now let me explain some issues I faced while writing the multiprocessing and multithreading code above.

AttributeError: Can’t get attribute Multiprocessing

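While trying to run simple multiprocessing code in a Jupyter notebook, I got the error below:
AttributeError: Can't get attribute 'square_numbers' on <module '__main__' (built-in)>
Note: square_numbers was the name of the function in my notebook (in the listings of this tutorial the function is called squre_number).
This happens because on Windows each child process starts a fresh Python interpreter and has to import the target function, and a function defined in a notebook cell cannot be imported. To resolve this issue with Jupyter notebook on Windows, I saved my function in a separate .py file and imported that file in my notebook like below:
Note: The code below is the basic multiprocessing code.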

all_functions.py
# Function to call in Processes
def squre_number(lst, num):
    lst.append(num*num)

Code in Jupyter notebook:

# Basic: Python multiprocessing example code

from multiprocessing import Process, Manager
import os
# Importing function from python script
from all_functions import squre_number

# Start Multiprocessing. The __main__ guard is required wherever child
# processes are spawned (Windows, and macOS by default) and harmless elsewhere.
if __name__ == '__main__':

    manager = Manager()

    # Create a list which can be shared between processes.
    shared_list = manager.list()
    processes = []

    # Define number of processes as number of logical CPUs of your machine
    # (fall back to 1 if os.cpu_count() cannot determine it)
    num_processes = os.cpu_count() or 1

    # Start Multiprocessing
    for i in range(num_processes):
        p = Process(target=squre_number, args=(shared_list, i)) # Passing the list
        p.start()
        processes.append(p)
        print('Working Processor: ', i)

    # Wait for all processes to finish
    for p in processes:
        p.join()

    # Convert the manager's list proxy to a pure python list
    python_list = list(shared_list)

    # Print output list
    print('Output List: ', python_list)

ImportError: cannot import name ‘Process’

If you name your Python file multiprocessing.py, it will shadow the standard multiprocessing module and show you the error below:
ImportError: cannot import name 'Process' from 'multiprocessing'
To solve this issue you just need to rename your Python file to anything other than multiprocessing.py.
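
If you are not sure whether one of your own files is shadowing the module, one quick check is to print where Python actually loads it from. This is a small sketch; module_location is a helper name invented for the example.

```python
import importlib

def module_location(name):
    # Import the module by name and return the file it was loaded from
    return importlib.import_module(name).__file__

if __name__ == '__main__':
    print(module_location('multiprocessing'))
```

If the printed path points into your project folder instead of the Python installation, your own file is being imported in place of the standard module.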

Runtime error to produce a Windows executable

If you are trying to run multiprocessing code on a Windows system without the if __name__ == '__main__': guard, you may get the error below:
RuntimeError:
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.
            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:
                if __name__ == '__main__':
                    freeze_support()
                    …
            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.
On Windows, each sub process imports (i.e. executes) the main module when it starts. To resolve the above issue you need to put the code that starts processes under an if __name__ == '__main__': guard in the main module, so that sub processes are not created recursively.
Note: In all of my code above, if __name__ == '__main__': is included, and the code works properly on Windows.
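
The idiom from the error message can be sketched like this. It is a minimal example, and the function names greeting and say_hello are made up for it. freeze_support() only matters when the script is frozen into a Windows executable and does nothing otherwise.

```python
from multiprocessing import Process, freeze_support

def greeting(name):
    return 'Hello from ' + name

# Function to call in the child process
def say_hello(name):
    print(greeting(name))

# Everything that starts a process stays under the guard, so a child
# process that imports this module does not start processes again.
if __name__ == '__main__':
    freeze_support()
    p = Process(target=say_hello, args=('child process',))
    p.start()
    p.join()
```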

Difference between Process vs Pool in multiprocessing Python

Before concluding I would like to add one point. While searching for or working with multiprocessing in Python, you may come across something called Pool. So what is the difference between Process and Pool in multiprocessing?

  • Process:  You create one Process object per task and start it yourself. Every process you start runs right away, so all of them are alive at the same time.
  • Pool:  A fixed number of worker processes is created once. Tasks wait in a queue until a worker is free, and a worker takes the next task only after it completes its current one.

That means Process suits a small number of tasks, while with many tasks Pool usually has less overhead, because it reuses its workers instead of starting a new process for every task.
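
As a minimal sketch of the Pool side of this comparison, the code below squares a list of numbers with multiprocessing.Pool. The function name square is made up for this example, and the pool size is simply set to the number of logical CPUs as an assumption.

```python
from multiprocessing import Pool
import os

# Function to call in the worker processes
def square(num):
    return num * num

if __name__ == '__main__':
    num_list = [2,3,4,5,6,7,12,15,25,29,31,37]
    # Create a fixed number of worker processes
    with Pool(processes=os.cpu_count()) as pool:
        # map() hands the numbers to the workers and
        # returns the results in the same order as the input
        results = pool.map(square, num_list)
    print('Output List: ', results)
```

Compared with the Process examples above, there is no Manager list and no manual start() and join(), because the pool collects the return values for you.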

Conclusion

In this tutorial I discussed the points below:
  • What are Multi Core and Multi Thread
  • Physical core and Logical core
  • Difference between Multiprocessing and Multithreading
  • Advantages and disadvantages of Multithreading and Multiprocessing in Python
  • Python code for Multithreading
  • Python code for Multiprocessing
  • How to run Multiprocessing code in Jupyter Notebook
  • How to solve some other issues while working with multiprocessing and multithreading in Python


If you have any questions or suggestions regarding this topic, let me know in the comment section. I will try my best to answer.
