
If I run the following function "run" with, for example, "ls -Rlah /", I get output immediately via the print statement, as expected:

import subprocess32 as subprocess
def run(command):
    process = subprocess.Popen(command,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    try:
        while process.poll() == None:
            print process.stdout.readline()
    finally:
        # Handle the scenario if the parent
        # process has terminated before this subprocess
        if process.poll():
            process.kill()

However, if I use the Python example program below, it seems to be stuck on either process.poll() or process.stdout.readline() until the program has finished. I think it is stdout.readline(), since if I increase the number of strings to output from 10 to 10000 (in the example program), or add a sys.stdout.flush() just after every print, the print in the run function does get executed.

How can I make the output from a subprocess more real-timeish?

Note: I have just discovered that the Python example program does not perform a sys.stdout.flush() when it outputs. Is there a way for the caller of subprocess to enforce this somehow?

Example program which outputs 10 strings every 5 seconds.

#!/usr/bin/env python
import time

if __name__ == "__main__":

    i = 0
    start = time.time()
    while True:
        if time.time() - start >= 5:
            for _ in range(10):
                print "hello world" + str(i)
            start = time.time()
            i += 1
        if i >= 3:
            break
Har
  • I still get the same result even with using that. Note I am using subprocess32 with python 2.7.10 – Har Dec 26 '15 at 19:40
  • process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) with the following calls to read from the process: while process.poll() == None: print process.stdout.readline() like in the example above. – Har Dec 26 '15 at 19:43
  • ahhh :) instead of performing print process.stdout.readline() I performed for line in iter( process.stdout.readline, b"") but still the readline did not return until the buffer was flushed. – Har Dec 26 '15 at 19:45
  • Processes tend to buffer differently depending on whether stdout is a terminal or a pipe. You are using pipes, so the child will block buffer. Try using a pseudo terminal from the `pty` module or use `pexpect` which is built for this type of thing. – tdelaney Dec 26 '15 at 19:49
  • What OS are you using? – Padraic Cunningham Dec 26 '15 at 19:56
  • related: [Python C program subprocess hangs at “for line in iter”](http://stackoverflow.com/q/20503671/4279) – jfs Dec 27 '15 at 06:17
  • related: [Python: read streaming input from subprocess.communicate()](http://stackoverflow.com/a/17698359/4279) – jfs Dec 27 '15 at 06:18
  • Could you mention the source where you get the code? I see 4 issues in the code (unrelated to your problem). – jfs Dec 27 '15 at 07:17
  • It's code which I've written myself :) could you please share? – Har Dec 27 '15 at 14:42
  • 1. `command` should be a list on POSIX (e.g., `"ls -Rlah /".split()`) 2. Use `is None` instead of `== None` 3. `print line` doubles newlines; use `print line,` (note the comma) instead, to suppress the second, unnecessary newline 4. `if p.poll(): p.kill()` is wrong. See [links in the comment](http://stackoverflow.com/questions/34474554/python-subprocess32-process-stdout-readline-waiting-time/34474946#comment56697812_34474946) – jfs Dec 28 '15 at 08:46
  • thanks, why would you recommend using is None as opposed to == None? – Har Dec 28 '15 at 16:03
  • okay found the information here: http://stackoverflow.com/questions/3257919/is-none-vs-none thank you for pointing it out. – Har Dec 28 '15 at 19:44

3 Answers


On most systems, command line programs line-buffer or block-buffer depending on whether stdout is a terminal or a pipe. On unixy systems, the parent process can create a pseudo-terminal to get terminal-like behavior even though the child isn't really run from a terminal. You can use the pty module to create a pseudo-terminal, or use the pexpect module, which eases access to interactive programs.

As mentioned in the comments, using poll to read lines can result in lost data; one example is data left in the stdout pipe when the process terminates. Reading a pty is a bit different from reading a pipe, and you'll find you need to catch an IOError when the child closes its end to get it all to work properly, as in the example below.

try:
    import subprocess32 as subprocess
except ImportError:
    import subprocess
import pty
import sys
import os
import time
import errno

print("running %s" % sys.argv[1])

m,s = (os.fdopen(pipe) for pipe in pty.openpty())
process = subprocess.Popen([sys.argv[1]],
                           stdin=s,
                           stdout=s,
                           stderr=subprocess.STDOUT)
s.close()

try:
    graceful = False
    while True:
        line = m.readline()
        print line.rstrip()
except IOError, e:
    if e.errno != errno.EIO:
        raise
    graceful = True
finally:
    # Handle the scenario if the parent
    # process has terminated before this subprocess
    m.close()
    if not graceful:
        process.kill()
    process.wait()
tdelaney
  • don't use `.poll()` ([it is unnecessary](http://stackoverflow.com/a/17698359/4279) and you may lose data at the end). See [Python subprocess readlines() hangs](http://stackoverflow.com/q/12419198/4279). Unrelated: if `process.poll()` is not `None` then the process is dead already i.e., `process.kill()` should fail. Don't use `.poll() == None`, use `.poll() is None` instead if you need to compare with `None`. – jfs Dec 27 '15 at 06:22
  • @J.F.Sebastian You are right! I kept with the original code as much as possible but after your comment I realized I'd better put up a more complete example. Thanks. – tdelaney Dec 27 '15 at 06:59
  • use `stdin=s` too, otherwise some programs may not enable interactive mode (and line-buffering). I don't like `time.sleep(.1)`, use `p.wait(0.1)` if you must (I don't see the point) instead. Magic number instead of `errno.EIO` is also not good. Also, your `os.fdopen()`-based code is not completely equivalent to [`os.read()`-based code](http://stackoverflow.com/a/12471855/4279) (there are some issues). – jfs Dec 27 '15 at 07:13
  • @J.F.Sebastian I'm not sure why the code to specifically kill the child is in the original code.... I got the sense that this is some mid tier process that potentially terminates on a signal from a parent process. If we are interrupted while reading the `pty` we need to kill the child before we wait. But if the pipe closed normally, we want to give the child time to exit. I'll slip in a fix. – tdelaney Dec 27 '15 at 07:14
  • @J.F.Sebastian yeah, `readline` and simple iteration have a fair amount of code backing them. I've had the most luck with `readline`. – tdelaney Dec 27 '15 at 07:19
  • I've meant that `os.fdopen()` might not work correctly with file descriptors produced by `pty` i.e., in addition to EIO on EOF there could be other differences that might lead to loss of data or a deadlock on `.readline()`. Python 2 raises OSError, not IOError here. As long as you use `os.fdopen()`, you could use a `with`-statement here (to close the files). – jfs Dec 27 '15 at 07:32

You should flush standard output in your script (this requires an import sys in the example program):

print "hello world" + str(i)
sys.stdout.flush()

When standard output is a terminal, stdout is line-buffered. But when it is not, stdout is block buffered and you need to flush it explicitly.
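As a sketch of the effect (the child code below is a stand-in for the example program), a child that flushes after each print delivers lines to a piped parent immediately:

```python
import subprocess
import sys

# Stand-in for the example program, with an explicit flush after each
# print so each line reaches the pipe as soon as it is written.
child_code = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    print('hello world' + str(i))\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.2)\n"
)

process = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
)

# readline() returns as soon as the child flushes a line, instead of
# blocking until the child's block buffer fills or the child exits.
lines = []
for line in iter(process.stdout.readline, ""):
    lines.append(line.rstrip())
    print(line.rstrip())
process.wait()
```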

If you can't change the source of your script, you can use the -u option of Python (in the subprocess):

-u     Force stdin, stdout and stderr to be totally unbuffered. 

Your command should be: ['python', '-u', 'script.py']

In general, this kind of buffering happens in userspace. There are no generic ways to force an application to flush its buffers: some applications support command line options (like Python), others support signals, others do not support anything.

One solution might be to emulate a pseudo terminal, giving "hints" to the programs that they should operate in line-buffered mode. Still, this is not a solution that works in every case.
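To make the -u approach concrete, here is a minimal sketch: the child below never flushes (like the example program), but because it is started with -u, the parent still sees each line as soon as it is printed:

```python
import subprocess
import sys

# Hypothetical child that never flushes, like the example program.
child_code = (
    "import time\n"
    "for i in range(3):\n"
    "    print('hello world' + str(i))\n"
    "    time.sleep(0.2)\n"
)

# -u forces the child's stdout to be unbuffered, so each line is
# available to the parent immediately, without changing the child's source.
process = subprocess.Popen(
    [sys.executable, "-u", "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
)

lines = []
for line in iter(process.stdout.readline, ""):
    lines.append(line.rstrip())
    print(line.rstrip())
process.wait()
```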

Andrea Corbellini
  • 15,400
  • 2
  • 45
  • 63
  • I didn't know that without a terminal it is block buffered, useful info, thanks! – Har Dec 26 '15 at 19:46
  • However, what I am looking for is: is there a way to invoke a subprocess in non-block-buffered mode? – Har Dec 26 '15 at 19:47
  • Thanks, yes that works! But Andrea, is there a more generic way of doing this? i.e. if the process were not Python then I would be stuck. – Har Dec 26 '15 at 19:51
  • @Har: sorry, but unfortunately there are no ways, due to the nature of buffering – Andrea Corbellini Dec 26 '15 at 19:54
  • There was some documentation saying this: bufsize will be supplied as the corresponding argument to the open() function when creating the stdin/stdout/stderr pipe file objects: 0 means unbuffered (read and write are one system call and can return short) but even with 0 it does not work. – Har Dec 26 '15 at 19:54
  • @Har: that applies to the buffering done by Python in your parent process. It does not (and cannot) affect subprocesses. The kind of buffering we are talking about is entirely done by the program itself or by the libraries it uses; the kernel has no role here (and therefore we cannot "transfer" a buffer size from one process to another) – Andrea Corbellini Dec 26 '15 at 19:58
  • Thanks I was not aware of that. – Har Dec 26 '15 at 19:59
  • Littering flushes throughout your code is rarely the right answer, and it's not even possible in code that you don't control. Most programs should just write stdout and let the parent control whether it's line- or block-buffered. Except on Windows, where Microsoft has never provided a good pty solution. – tdelaney Dec 26 '15 at 20:29

For things other than Python you could try using unbuffer:

unbuffer disables the output buffering that occurs when program output is redirected from non-interactive programs. For example, suppose you are watching the output from a fifo by running it through od and then more:

od -c /tmp/fifo | more

You will not see anything until a full page of output has been produced. You can disable this automatic buffering as follows:

unbuffer od -c /tmp/fifo | more

Normally, unbuffer does not read from stdin. This simplifies use of unbuffer in some situations. To use unbuffer in a pipeline, use the -p flag. Example:

process1 | unbuffer -p process2 | process3

So in your case:

run(["unbuffer"] + cmd)  # cmd as an argument list, e.g. ["python", "script.py"]

There are some caveats listed in the docs but it is another option.
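A minimal sketch of wrapping a command this way from Python (assuming unbuffer, which ships with the expect package, is on PATH; the function name is hypothetical):

```python
import subprocess

def run_unbuffered(cmd):
    # Prefix the argument list with unbuffer so the child line-buffers
    # even though its stdout is a pipe. cmd must be a list of arguments,
    # e.g. ["python", "script.py"].
    process = subprocess.Popen(
        ["unbuffer"] + cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # Print each line as soon as unbuffer's pseudo-terminal delivers it.
    for line in iter(process.stdout.readline, ""):
        print(line.rstrip())
    return process.wait()
```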

Padraic Cunningham