The major advantage of Python parallelization is that it runs the code faster and makes better use of CPU resources. This is because parallel computing simultaneously uses multiple computer resources to address a computational problem.
How does parallelization in Python work?
Step 1: First, a problem is broken into discrete parts that can be solved concurrently
Step 2: Each part is further broken down into a series of instructions
Step 3: Instructions from each part are executed simultaneously on different processors
Python Global Interpreter Lock (GIL)
The GIL is a per-process lock in Python. As the name suggests, it 'locks' something from happening; in this context, that something is simultaneous multi-threaded execution. Every thread must hold the GIL to execute Python bytecode, so only one thread runs at a time.
The GIL is infamous, but its disadvantages only appear when executing processor-intensive work in pure Python. Moreover, not all Python implementations use a Global Interpreter Lock (GIL): CPython does, while implementations such as Jython and IronPython do not. Scenarios where the GIL does not slow things down include all single-threaded cases, multi-threaded cases for I/O-bound programs, and multi-threaded cases for CPU-bound programs that execute their compute-intensive work in C libraries, which typically release the GIL.
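A quick way to see the GIL at work is to time a pure-Python CPU-bound function run serially versus on two threads; the countdown function and the iteration count below are arbitrary illustrations:

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop; the GIL serializes this work across threads
    while n > 0:
        n -= 1

N = 5_000_000

# Run the work twice on a single thread
start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

# Run the same work on two threads: the GIL lets only one thread execute
# Python bytecode at a time, so this is typically no faster than the serial run
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"serial:   {serial:.2f}s")
print(f"threaded: {threaded:.2f}s")
```

On CPython, the threaded run usually takes about as long as the serial one (sometimes longer, due to lock contention), which is exactly the behavior described above.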
Applications of parallel programming in Python and its pros
Why parallelize your Python code? Without a second thought, we will reply that it reduces total time and increases efficiency. For instance, say there are multiple independent tasks to perform and you handle them with a "for loop." Parallelizing lets each independent task be performed by a different processor core, reducing total time and increasing efficiency. In the same way, you can launch several instances of an application or a script so they run simultaneously.
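Launching several instances of a script can be sketched with the standard subprocess module; the inline print command below is a hypothetical stand-in for a real worker script:

```python
import subprocess
import sys

# Launch several instances simultaneously, one per independent task,
# then wait for all of them to finish
tasks = ["task_a", "task_b", "task_c"]
procs = [
    subprocess.Popen([sys.executable, "-c",
                      f"print('processing {t}')"])  # stands in for a real worker script
    for t in tasks
]
codes = [p.wait() for p in procs]
print(codes)  # [0, 0, 0] when every instance exits cleanly
```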
Pros of parallelization
- Reduces total time
- Increases efficiency
- Minimal wastage of available resources
Types of Parallelization
There are two choices you can have with parallel programming in Python: multi-threading and multi-processing.
Multithreading
- Uses multiple threads of a processor (I/O bound)
- The standard library module concurrent.futures can create a pool of threads on which tasks are executed
- The running threads are scheduled for execution by the operating system's scheduler
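A minimal thread-pool sketch for I/O-bound work using concurrent.futures; the fake_download function and the example URLs are illustrative stand-ins for real network calls:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_download(url):
    # Simulated I/O wait; the GIL is released while a thread blocks or sleeps,
    # which is why threading helps I/O-bound work
    time.sleep(0.5)
    return f"fetched {url}"

urls = [f"https://example.com/page{i}" for i in range(4)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, urls))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s")  # close to 0.5s rather than the serial 4 * 0.5s
```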
Multiprocessing
- Uses multiple processors (CPU bound)
- The standard library module multiprocessing lets you create multiple processes that run on multiple cores of the system at the same time
Time comparison on reading dataset
In the example below, a standard library took about 4 seconds of CPU time to read a dataset of 1,223,009 records with 10 attributes, while a parallel library read the same dataset in just 17 milliseconds.
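The libraries timed above are not named, but a timing like this can be measured with a harness along the following lines; the sketch builds a small synthetic CSV in memory (the real 1,223,009-record dataset is not included here), so only the measurement technique carries over:

```python
import csv
import io
import time

# Hypothetical stand-in dataset: 10,000 rows x 10 attributes
buf = io.StringIO()
writer = csv.writer(buf)
for i in range(10_000):
    writer.writerow([i] * 10)
buf.seek(0)

# Time a full read of the dataset
start = time.perf_counter()
rows = list(csv.reader(buf))
elapsed = time.perf_counter() - start

print(f"read {len(rows)} rows in {elapsed * 1000:.1f} ms")
```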
Time comparison on defined or customized function
This function calls another function for computation.
With the parallelized program, it took about 14.013 seconds to complete on 8 cores, while without parallelization the same work took about 45.0238 seconds.
We hope the above examples help you get a better understanding of time comparison with and without parallel programming in Python. Let us know your thoughts on this blog through the comments section below.