
Get the Number of Running Processes in a Python Pool

When writing Python programs, we often use the multiprocessing module to fully utilize CPU resources. If there are many tasks, we may create a process pool and submit work asynchronously.

In many cases, the total number of tasks is known in advance, so we can queue everything at once. But if multiple servers are processing a dynamic workload, it can be useful to periodically check how many worker processes are still busy so that new tasks can be scheduled more intelligently.

The multiprocessing.Pool API does not expose a public method for counting the tasks currently in flight. However, after inspecting the pool object, I found that the private _cache attribute maps job IDs to their pending results, so its length can serve as a rough count of submitted-but-unfinished tasks. Note that _cache is an undocumented implementation detail: it also includes tasks that are queued but not yet running, and it may change between Python versions.

Here is a simple example:

#!/usr/bin/env python
# -*- coding:utf8 -*-

import time
from multiprocessing import Pool


def test_sleep(sleep_time):
    time.sleep(sleep_time)
    return 0


pool = Pool(processes=3)
pool.apply_async(test_sleep, (3,))
pool.apply_async(test_sleep, (1,))
pool.apply_async(test_sleep, (5,))

while True:
    time.sleep(1)
    process_count = len(pool._cache)
    if process_count != 0:
        print("%s processes running" % process_count)
    else:
        print("all processes are finished")
        break

pool.close()
pool.join()

In this example, test_sleep simply sleeps for the given number of seconds. After creating a pool with capacity 3, three asynchronous tasks are submitted. The script then checks once per second how many tasks are still present in the pool cache, and exits when everything has finished.

The output looks like this:

knktc-rbmp:tmp knktc$ python test_pool.py
3 processes running
2 processes running
1 processes running
1 processes running
all processes are finished

This is enough to monitor the current workload and decide when to add more tasks dynamically.
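As a sketch of that idea, the loop below keeps the pool topped up from a dynamic backlog, using len(pool._cache) as the pending-task count. The task function work, the backlog counter tasks_left, and the pool size are all made up for illustration; this relies on the same undocumented _cache attribute as above, so treat it as a rough pattern rather than a supported API.

```python
#!/usr/bin/env python
import time
from multiprocessing import Pool


def work(duration):
    # Hypothetical task: just sleep and echo the duration back
    time.sleep(duration)
    return duration


pool = Pool(processes=3)
tasks_left = 10  # pretend these arrive over time from some external source
pending = []     # keep the AsyncResult handles so we can collect results later

while tasks_left > 0 or len(pool._cache) > 0:
    # Top up the pool whenever fewer tasks are outstanding than workers
    while tasks_left > 0 and len(pool._cache) < 3:
        pending.append(pool.apply_async(work, (0.1,)))
        tasks_left -= 1
    time.sleep(0.05)

pool.close()
pool.join()

results = [r.get() for r in pending]
print("completed %d tasks" % len(results))
```

Keeping the AsyncResult handles in a list also gives you a supported way to check progress (AsyncResult.ready()), which is worth preferring over _cache when you control all the submissions yourself.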

If my writing helped you, would you consider buying me a can of cola?