Most CPU manufacturers are creating multi-core CPUs now. Even cell phones come with multiple cores! Python threads can't use those cores because of the Global Interpreter Lock. Starting in Python 2.6, the multiprocessing module was added, which lets you take full advantage of all the cores on your machine.
In this article, you will learn about the following topics:

- The pros and cons of using processes
- Creating processes with multiprocessing
- Subclassing Process
- Creating a process pool
This article is not a comprehensive overview of multiprocessing. The topic of multiprocessing, and concurrency in general, would be better suited to a book of its own. You can always check out the documentation for the multiprocessing module if you need to: https://docs.python.org/3/library/multiprocessing.html
Now, let's get started!
There are several pros to using processes:

- Processes can take full advantage of multiple CPUs and cores, since they are not limited by the Global Interpreter Lock
- Processes are a good fit for CPU-bound work
- The multiprocessing module has an interface similar to threading.Thread (see the short sketch after this list)
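To see how similar the two interfaces are, here is a minimal sketch, not taken from the article, that runs the same made-up task() function once as a thread and once as a process:

import multiprocessing
import threading


def task(label: str) -> None:
    # A made-up function that simply reports where it is running
    print(f'Hello from {label}')


if __name__ == '__main__':
    # The constructor arguments, start(), and join() look the same for both
    thread = threading.Thread(target=task, args=('a thread',))
    process = multiprocessing.Process(target=task, args=('a process',))
    thread.start()
    process.start()
    thread.join()
    process.join()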
Now let's look at some of the cons of processes!
There are also a couple of cons to using processes:

- Processes have a larger memory footprint than threads
- Each process has its own memory space, so sharing data between processes takes more work than sharing data between threads (illustrated briefly below)
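As a quick illustration of that second point, here is a small sketch, added for this article rather than taken from the original examples, showing that a change made in a child process does not show up in the parent, because each process works with its own copy of the data:

import multiprocessing

counter = 0


def increment() -> None:
    # This modifies the child process's copy of the module-level variable only
    global counter
    counter += 1
    print(f'Inside the child process, counter is {counter}')


if __name__ == '__main__':
    process = multiprocessing.Process(target=increment)
    process.start()
    process.join()
    # The parent's copy is untouched, so this still prints 0
    print(f'In the parent process, counter is still {counter}')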
Now let's learn how to create a process with Python!
Creating Processes with multiprocessing

The multiprocessing module was designed to mimic how the threading.Thread class worked.
Here is an example of using the multiprocessing module:
import multiprocessing
import random
import time


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    processes = []
    for i in range(5):
        process = multiprocessing.Process(target=worker, args=(f'computer_{i}',))
        processes.append(process)
        process.start()

    for proc in processes:
        proc.join()
The first step is to import the multiprocessing module. The other two imports are for the random and time modules, respectively.
Then you have the silly worker() function that pretends to do some work. It takes in a name and returns nothing. Inside the worker() function, it will print out the name of the worker, then it will use time.sleep() to simulate doing some long-running process. Finally, it will print out that it has finished.
The last part of the code snippet is where you create five worker processes. You use multiprocessing.Process(), which works pretty much the same way as threading.Thread() did. You tell Process which target function to use and what arguments to pass to it. The main difference is that this time you are creating a list of processes. For each process, you call its start() method to start the process.

Then at the end, you loop over the list of processes and call join() on each one, which tells Python to wait for the process to terminate.
When you run this code, you will see output that is similar to the following:
Started worker computer_2
computer_2 worker finished in 2 seconds
Started worker computer_1
computer_1 worker finished in 3 seconds
Started worker computer_3
computer_3 worker finished in 3 seconds
Started worker computer_0
computer_0 worker finished in 4 seconds
Started worker computer_4
computer_4 worker finished in 4 seconds
Each time you run your script, the output will be a little different because of the random module. Give it a try and see for yourself!
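A related detail that is easy to miss is that join() also accepts an optional timeout in seconds. The following sketch is an extra illustration, not part of the original example; it waits a couple of seconds for a deliberately slow worker and then stops it if it is still running:

import multiprocessing
import time


def slow_worker() -> None:
    # Pretend to do five seconds of work
    time.sleep(5)


if __name__ == '__main__':
    process = multiprocessing.Process(target=slow_worker)
    process.start()
    # Wait at most two seconds instead of blocking until the worker finishes
    process.join(timeout=2)
    if process.is_alive():
        print('Worker is still running; stopping it early')
        process.terminate()
        process.join()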
Subclassing Process

The Process class from the multiprocessing module can also be subclassed. It works in much the same way as the threading.Thread class does.
Let's take a look:
# worker_thread_subclass.py
import random
import multiprocessing
import time


class WorkerProcess(multiprocessing.Process):

    def __init__(self, name):
        multiprocessing.Process.__init__(self)
        self.name = name

    def run(self):
        """
        Run the process
        """
        worker(self.name)


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    processes = []
    for i in range(5):
        process = WorkerProcess(name=f'computer_{i}')
        processes.append(process)
        process.start()

    for process in processes:
        process.join()
Here you subclass multiprocessing.Process and override its run() method.
Next, you create the processes in a loop at the end of the code and add each one to a process list. Then, to get the processes to work properly, you need to loop over the list of processes and call join() on each of them. This works exactly as it did in the previous process example from the last section.
The output from this class should also be quite similar to the output from the previous section.
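If you prefer, the work can also live directly inside run() instead of in a separate module-level function. Here is a minimal sketch of that variation, an illustrative rewrite rather than code from the article:

import multiprocessing
import random
import time


class InlineWorkerProcess(multiprocessing.Process):
    """A made-up subclass that does its work inside run() itself."""

    def run(self):
        print(f'Started worker {self.name}')
        worker_time = random.choice(range(1, 5))
        time.sleep(worker_time)
        print(f'{self.name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    # Process already accepts a name keyword, so no custom __init__ is needed
    processes = [InlineWorkerProcess(name=f'computer_{i}') for i in range(5)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()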
Creating a Process Pool

If you have a lot of processes to run, sometimes you will want to limit the number of processes that can run at once. For example, let's say you need to run 20 processes, but you have a processor with only 4 cores. You can use the multiprocessing module to create a process pool that will limit the number of processes running to only 4 at a time.
Here's how you can do it:
import random
import time
from multiprocessing import Pool


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')


if __name__ == '__main__':
    process_names = [f'computer_{i}' for i in range(15)]
    pool = Pool(processes=5)
    pool.map(worker, process_names)
    pool.terminate()
In this example, you have the same worker() function. The real meat of the code is at the end, where you create 15 process names using a list comprehension. Then you create a Pool and set the total number of processes to run at once to 5. To use the pool, you need to call its map() method and pass it the function you wish to call along with the arguments to pass to that function.

Python will now run 5 processes (or fewer) at a time until all the processes have finished. You need to call terminate() on the pool at the end, or you will see a message like this:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
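Pool can also be used as a context manager, which handles that cleanup for you, and map() returns a list with one result per input. Here is a small supplementary sketch of both ideas, using a made-up square() function instead of the article's worker():

from multiprocessing import Pool


def square(number: int) -> int:
    # A made-up function that returns a value so map() has results to collect
    return number * number


if __name__ == '__main__':
    # The with-block terminates the pool automatically on exit
    with Pool(processes=5) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]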
Now you know how to create a process Pool with Python!
You have now learned the basics of using the multiprocessing module. You have learned the following:

- How to create processes with multiprocessing
- How to subclass Process
- How to create a process pool
There is much more to multiprocessing than what is covered here. You could learn how to use multiprocessing's Queue class to get output from processes. There is also the topic of interprocess communication. And there's much more, too. However, the objective was to learn how to create processes, not to learn every nuance of the multiprocessing module. Concurrency is a large topic that would need much more in-depth coverage than what can be covered in this article.