
The Python threading documentation states that "...threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously", apparently because I/O-bound tasks can avoid the GIL that prevents threads from executing concurrently in CPU-bound tasks.

But what I don't understand is that an I/O task still uses the CPU. So how does it not run into the same issue? Is it because the I/O-bound task will not require memory management?

Terrence Brannon
  • "I/O task still uses the CPU" ... in general, this is not the case. Instead of programmed I/O (PIO), many/most modern peripherals use direct memory access, which does not require CPU cycles to move data. – Brian Cain Mar 26 '15 at 04:01

2 Answers


All of Python's blocking I/O primitives release the GIL while waiting for the I/O block to resolve -- it's as simple as that! They will of course need to acquire the GIL again before going on to execute further Python code, but for the long-in-terms-of-machine-cycles intervals in which they're just waiting for some I/O syscall, they don't need the GIL, so they don't hold on to it!
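This can be seen directly with a small timing sketch. It uses `time.sleep` as a stand-in for a blocking I/O wait; like Python's real blocking I/O primitives, `sleep` releases the GIL while waiting:

```python
import threading
import time

def fake_io():
    # Stand-in for a blocking I/O call; time.sleep releases the GIL
    # for the duration of the wait, just like socket/file I/O does.
    time.sleep(0.5)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four 0.5 s waits overlap: total wall time is ~0.5 s, not ~2 s,
# because no thread holds the GIL while it is blocked.
print(f"elapsed: {elapsed:.2f}s")
```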

Alex Martelli
  • @Shashank, nah -- I've got benchmarks of "naturally coded" Python vs C++ I/O-bound code with Python absolutely running circles around C++ -- two, three times faster. Dropping the GIL is always super-fast, and acquiring it again is pretty fast unless you have more threads active than is good for you! – Alex Martelli Mar 26 '15 at 04:30
  • (With IO the biggest bottlenecks I've run into are usually unsuitable buffers and repeated per-byte system IO calls. Pretty much universal across languages; some hide it better / by default.) – user2864740 Mar 26 '15 at 04:35
  • @Shashank I'm not sure that's related to the GIL so much as to output buffering. There is an implicit 'flush' at the end of each print (thanks to the newline) when in 'interactive mode' - at the very least a fairer comparison would write directly to a non-interactive stream. (Writing many lines to the console is relatively slow, even in "plain C".) See http://stackoverflow.com/questions/1716296 – user2864740 Mar 26 '15 at 04:47
  • @user2864740 You are right, there is a huge boost in performance in the first method when writing to a non-interactive stream. However, the second way is still consistently faster by a noticeable margin. – Shashank Mar 26 '15 at 04:52
  • @Shashank That I can believe for a simple case, as 1 "context switch" is faster than N - and *assuming* no benefit from concurrency. I haven't been bit by such though for my work (read: if there is a difference I've never had reason to worry about it), when watching the buffering. – user2864740 Mar 26 '15 at 04:54
  • @user2864740 To further complicate things, there's actually also an optimal buffer size. Because if you send a string that is *too* large to an output stream, it's actually slower than breaking it up into smaller strings and sending those separately. I believe it's because of the way Python handles strings in memory, but I'm not 100 percent sure. – Shashank Mar 26 '15 at 05:26
  • @Shashank I've not played around with such, but it does not surprise me if there are all sorts of fun combinations one way or the other. I generally only chase fallout of the program "being too slow" .. usually I can blame it on something (or someone) else : – user2864740 Mar 26 '15 at 05:44
  • So it's regardless of whether the IO is sync/blocking or async/non-blocking. It simply releases the GIL prior to the IO operation, then gets it back. In the meantime, the thread calling the operation might get blocked or simply return, depending on the IO operation type. – stdout Mar 06 '17 at 22:43
  • Is waiting for threads to complete considered an IO operation where the GIL will be released? When a thread calls `concurrent.futures.wait` on a set of futures it launched, does it release the GIL so that other threads can run? – nishant Jan 31 '20 at 08:26
  • @nishant If the GIL were not released when waiting, wouldn't that almost always result in a deadlock? – cosmicFluke Oct 23 '20 at 06:14

The GIL in CPython¹ is only concerned with Python code being executed. A thread-safe C extension that uses a lot of CPU might release the GIL as long as it doesn't need to interact with the Python runtime.

As soon as the C code needs to 'talk' to Python (read: call back into the Python runtime) then it needs to acquire the GIL again - that is, the GIL is to establish protection/atomic behavior for the "interpreter" (and I use the term loosely) and is not to prevent native/non-Python code from running concurrently.

Releasing the GIL around I/O (blocking or not, using CPU or not) is the same thing - until the data is moved into Python there is no reason to acquire the GIL.
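A minimal sketch of that contrast from the Python side: a pure-Python busy loop must hold the GIL to execute bytecode, so two such threads serialize, while threads blocked in a GIL-releasing wait (here `time.sleep`, standing in for blocking I/O) overlap almost perfectly:

```python
import threading
import time

def cpu_bound():
    # Pure-Python bytecode: the GIL is held while this runs, so two
    # such threads cannot execute bytecode in parallel in CPython.
    total = 0
    for _ in range(1_000_000):
        total += 1

def io_style_wait():
    # The wait happens with the GIL released, so concurrent waits
    # overlap in wall-clock time.
    time.sleep(0.3)

def timed(target, n=2):
    # Run n threads of `target` and return total wall-clock time.
    threads = [threading.Thread(target=target) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"cpu-bound x2: {timed(cpu_bound):.2f}s")     # roughly 2x a single loop
print(f"io-style  x2: {timed(io_style_wait):.2f}s")  # close to 0.3s, not 0.6s
```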


¹ The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

user2864740