Python 101 - Creating Multiple Processes

Most CPU manufacturers make multi-core CPUs now. Even cell phones come with multiple cores! In CPython, threads can't run Python code on more than one core at a time because of the Global Interpreter Lock (GIL). The multiprocessing module, added in Python 2.6, lets you take full advantage of all the cores on your machine.

In this article, you will learn about the following topics:

  • Pros of Using Processes
  • Cons of Using Processes
  • Creating Processes with multiprocessing
  • Subclassing Process
  • Creating a Process Pool

This article is not a comprehensive overview of multiprocessing. The topic of multiprocessing and concurrency in general would be better suited to a book of its own. You can always check out the official Python documentation for the multiprocessing module if you need more detail.

Now, let's get started!

Pros of Using Processes

There are several pros to using processes:

  • Processes use separate memory space
  • Code can be more straightforward compared to threads
  • Uses multiple CPUs / cores
  • Avoids the Global Interpreter Lock (GIL)
  • Child processes can be killed (unlike threads)
  • The multiprocessing module has an interface similar to threading.Thread
  • Good for CPU-bound processing (encryption, binary search, matrix multiplication)
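To make the first point concrete, here is a minimal sketch of separate memory spaces. The names counter, increment() and run_demo() are made up for illustration, and the Process class it relies on is covered later in this article. The child process changes its own copy of a module-level variable, and the parent's copy stays untouched:

```python
import multiprocessing

counter = 0  # hypothetical module-level state, for illustration only


def increment() -> None:
    # Runs in the child process and modifies the child's own copy
    global counter
    counter += 1
    print(f'Child sees counter = {counter}')


def run_demo() -> int:
    process = multiprocessing.Process(target=increment)
    process.start()
    process.join()
    # The parent's copy was never touched by the child
    return counter


if __name__ == '__main__':
    print(f'Parent sees counter = {run_demo()}')
```

With threads, the same code would change the one shared variable, which is why threads usually need locks around shared state.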

Now let's look at some of the cons of processes!

Cons of Using Processes

There are also a couple of cons to using processes:

  • Interprocess communication is more complicated
  • Memory footprint is larger than threads
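As a rough illustration of the first con, the sketch below uses multiprocessing.Queue to send results from child processes back to the parent. The function names square() and collect_squares() are assumptions made up for this example, and the Process class is introduced in the next section. Because each process has its own memory, a plain return value or a shared list would not work here:

```python
import multiprocessing


def square(number: int, results) -> None:
    # Send the result back to the parent through the queue
    results.put((number, number * number))


def collect_squares(numbers: list) -> dict:
    results = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=square, args=(n, results))
        for n in numbers
    ]
    for process in processes:
        process.start()
    # Read one result per process before joining them
    squares = dict(results.get() for _ in processes)
    for process in processes:
        process.join()
    return squares


if __name__ == '__main__':
    print(collect_squares([1, 2, 3]))
```

The results arrive in whatever order the processes finish, so this sketch pairs each result with its input number instead of relying on order.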

Now let's learn how to create a process with Python!

Creating Processes with multiprocessing

The multiprocessing module was designed to mimic how the threading.Thread class worked.

Here is an example of using the multiprocessing module:

import multiprocessing
import random
import time


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')

if __name__ == '__main__':
    processes = []
    for i in range(5):
        process = multiprocessing.Process(target=worker, 
                                          args=(f'computer_{i}',))
        processes.append(process)
        process.start()
        
    for proc in processes:
        proc.join()

The first step is to import the multiprocessing module. The other two imports are for the random and time modules respectively.

Then you have the silly worker() function that pretends to do some work. It takes in a name and returns nothing. Inside, it prints the name of the worker, then uses time.sleep() to simulate a long-running task. Finally, it prints that it has finished.

The last part of the code snippet is where you create 5 worker processes. You use multiprocessing.Process(), which works pretty much the same way as threading.Thread() does. You tell Process what target function to use and what arguments to pass to it. You also keep each process in a list so that you can refer to it later. For each process, you call its start() method to start the process.

Then at the end, you loop over the list of processes and call join() on each one, which tells Python to wait for that process to terminate.

When you run this code, you will see output that is similar to the following:

Started worker computer_2
computer_2 worker finished in 2 seconds
Started worker computer_1
computer_1 worker finished in 3 seconds
Started worker computer_3
computer_3 worker finished in 3 seconds
Started worker computer_0
computer_0 worker finished in 4 seconds
Started worker computer_4
computer_4 worker finished in 4 seconds

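Each time you run your script, the output will be a little different because of the random module. The order of the lines also depends on how the operating system schedules the processes and on how their output is buffered, so you may see all of the "Started" lines before any worker finishes. Give it a try and see for yourself!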
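The join() method also accepts an optional timeout, and a child process can be stopped with terminate(), which is what the "can be killed" point in the pros list refers to. Here is a minimal sketch of that idea. The names slow_worker() and stop_if_slow() are hypothetical, and the 60-second sleep simply stands in for a task that takes too long:

```python
import multiprocessing
import time
from typing import Optional


def slow_worker() -> None:
    # Pretend to be stuck in a long-running task
    time.sleep(60)


def stop_if_slow(timeout: float) -> Optional[int]:
    process = multiprocessing.Process(target=slow_worker)
    process.start()
    # Wait up to `timeout` seconds for the process to finish
    process.join(timeout)
    if process.is_alive():
        # Still running, so ask the operating system to stop it
        process.terminate()
        process.join()
    return process.exitcode


if __name__ == '__main__':
    print(f'Exit code: {stop_if_slow(0.5)}')
```

A process that was terminated this way reports a nonzero exit code. On Unix-like systems it is the negative number of the signal that stopped it.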

Subclassing Process

The Process class from the multiprocessing module can also be subclassed. It works in much the same way as the threading.Thread class does.

Let's take a look:

# worker_thread_subclass.py

import random
import multiprocessing
import time

class WorkerProcess(multiprocessing.Process):

    def __init__(self, name):
        super().__init__()
        self.name = name

    def run(self):
        """
        Run the process
        """
        worker(self.name)

def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')

if __name__ == '__main__':
    processes = []
    for i in range(5):
        process = WorkerProcess(name=f'computer_{i}')
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

Here you subclass multiprocessing.Process and override its run() method.

Next, you create the processes in a loop at the end of the code, add each one to a list of processes and start it. Then you loop over the list of processes and call join() on each of them so that the script waits for every process to finish. This works exactly as it did in the process example from the last section.

The output from this class should also be quite similar to the output from the previous section.
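As a small variation, the Process constructor accepts a name keyword argument, so a subclass can hand the name to the base class instead of assigning it afterwards. The sketch below assumes a made-up class called NamedProcess and is only meant to show the idea:

```python
import multiprocessing


class NamedProcess(multiprocessing.Process):

    def __init__(self, name: str) -> None:
        # Let the base class store the name
        super().__init__(name=name)

    def run(self) -> None:
        # This method is what executes in the child process
        print(f'Running in {self.name}')


if __name__ == '__main__':
    process = NamedProcess(name='computer_0')
    process.start()
    process.join()
    print(f'{process.name} exited with code {process.exitcode}')
```

Either style works. Passing the name to the base class just keeps all of the setup in one call.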

Creating a Process Pool

If you have a lot of processes to run, sometimes you will want to limit the number of processes that can run at once. For example, let's say you need to run 20 processes but you have a processor with only 4 cores. You can use the multiprocessing module to create a process pool that will limit the number of processes running to only 4 at a time.

Here's how you can do it:

import random
import time

from multiprocessing import Pool


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')

if __name__ == '__main__':
    process_names = [f'computer_{i}' for i in range(15)]
    pool = Pool(processes=5)
    pool.map(worker, process_names)
    pool.terminate()

In this example, you have the same worker() function. The real meat of the code is at the end where you create 15 process names using a list comprehension. Then you create a Pool and set the total number of processes to run at once to 5. To use the pool, you call the map() method and pass it the function you wish to call along with an iterable of arguments. The pool calls the function once for each item in the iterable.
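The worker() function returns nothing, but map() also collects whatever the function returns. The following sketch uses two made-up functions, square() and square_all(), to show that the results come back as a list in the same order as the inputs:

```python
from multiprocessing import Pool


def square(number: int) -> int:
    return number * number


def square_all(numbers: list) -> list:
    pool = Pool(processes=2)
    try:
        # map() blocks until every result is ready and keeps input order
        return pool.map(square, numbers)
    finally:
        # Always shut the pool down, even if map() raises an error
        pool.terminate()


if __name__ == '__main__':
    print(square_all([1, 2, 3, 4]))
```

The pool size of 2 is arbitrary here. Any size would give the same list of results.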

When you run the worker example, Python runs at most 5 worker processes at a time until all 15 tasks have finished. You need to call terminate() on the pool at the end or you may see a message like this:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: 
UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
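One way to make sure the cleanup always happens is to use the pool as a context manager, which is supported in Python 3.3 and later. Leaving the with block calls terminate() for you. This is a minimal sketch, with shout() and run_pool() as hypothetical names:

```python
from multiprocessing import Pool


def shout(name: str) -> str:
    return name.upper()


def run_pool(names: list) -> list:
    # Leaving the with block calls terminate() on the pool
    with Pool(processes=2) as pool:
        return pool.map(shout, names)


if __name__ == '__main__':
    print(run_pool(['computer_0', 'computer_1']))
```

Because map() waits for all of the results before returning, it is safe for the pool to be terminated as soon as the with block ends.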

Now you know how to create a process Pool with Python!

Wrapping Up

You have now learned the basics of using the multiprocessing module. You have learned the following:

  • Pros of Using Processes
  • Cons of Using Processes
  • Creating Processes with multiprocessing
  • Subclassing Process
  • Creating a Process Pool

There is much more to multiprocessing than what is covered here. You could learn how to use multiprocessing.Queue to get output from processes. There is the topic of interprocess communication. And there's much more too. However, the objective was to learn how to create processes, not learn every nuance of the multiprocessing module. Concurrency is a large topic that would need much more in-depth coverage than what can be covered in this article.

Copyright © 2024 Mouse Vs Python | Powered by Pythonlibrary