I'm trying to learn how to use Python's multiprocessing package, but I don't understand the difference between map_async and imap.
I noticed that both map_async and imap are executed asynchronously. So when should I use one over the other? And how should I retrieve the result returned by map_async?
Should I use something like this?
    def test():
        result = pool.map_async()
        pool.close()
        pool.join()
        return result.get()

    result = test()
    for i in result:
        print(i)
Best Answer
There are two key differences between imap/imap_unordered and map/map_async: the way they consume the iterable you pass to them, and the way they return the results to you.

map consumes your iterable by converting the iterable to a list (assuming it isn't a list already), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing each item in the iterable between processes one item at a time, particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list will need to be kept in memory.

imap doesn't turn the iterable you give it into a list, nor does it break it into chunks (by default). It will iterate over the iterable one element at a time, and send each one to a worker process. This means you don't take the memory hit of converting the whole iterable to a list, but it also means the performance is slower for large iterables, because of the lack of chunking. This can be mitigated by passing a chunksize argument larger than the default of 1, however.

The other major difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered, you can start receiving results from workers as soon as they're ready, rather than having to wait for all of them to be finished. With map_async, an AsyncResult is returned right away, but you can't actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does (map is actually implemented internally as map_async(...).get()). There's no way to get partial results; you either have the entire result, or nothing.

imap and imap_unordered both return iterables right away. With imap, the results will be yielded from the iterable as soon as they're ready, while still preserving the ordering of the input iterable. With imap_unordered, results will be yielded as soon as they're ready, regardless of the order of the input iterable. So, say you have this:

This will output:
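To try this kind of comparison yourself, here is a minimal sketch. The worker slow_add, the helpers timed_results and compare, and the delays [1, 5, 3] are all hypothetical choices made for illustration; the sketch only assumes the standard Pool methods imap, imap_unordered and map_async.

```python
import time
from multiprocessing import Pool


def slow_add(x):
    """Hypothetical worker: sleep for x tenths of a second, then return x + 2."""
    time.sleep(x / 10)
    return x + 2


def timed_results(results, start):
    """Drain an iterable of results, noting when each value became available."""
    return [(value, round(time.time() - start, 1)) for value in results]


def compare(pool, delays):
    """Run the same inputs through imap, imap_unordered and map_async."""
    report = {}

    # imap: results arrive one by one, in the order of the input.
    start = time.time()
    report["imap"] = timed_results(pool.imap(slow_add, delays), start)

    # imap_unordered: results arrive one by one, in completion order.
    start = time.time()
    report["imap_unordered"] = timed_results(
        pool.imap_unordered(slow_add, delays), start
    )

    # map_async: returns an AsyncResult immediately, but get() blocks
    # until every task is done and then returns the whole list at once.
    start = time.time()
    async_result = pool.map_async(slow_add, delays)
    values = async_result.get()
    elapsed = round(time.time() - start, 1)
    report["map_async"] = [(value, elapsed) for value in values]

    return report


if __name__ == "__main__":
    with Pool(processes=3) as pool:
        for name, rows in compare(pool, [1, 5, 3]).items():
            print(name, rows)
```

With three free workers, imap should report the values in input order, with the third value held back until the slower second one is done. imap_unordered should report them in completion order, and map_async should hand back the whole list only once the slowest task has finished. Exact timings will vary from machine to machine.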
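If you use p.imap_unordered instead of p.imap, you'll see:

If you use p.map or p.map_async().get(), you'll see:

So, the primary reasons to use imap/imap_unordered over map_async are that your iterable is large enough that converting it to a list would use too much memory, or that you want to start processing results before all of them have been completed.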