
I need to read output from a child process as it's produced -- perhaps not on every write, but well before the process completes. I've tried solutions from the Python 3 docs and from SO questions here and here, but I still get nothing until the child terminates.

The application is for monitoring training of a deep learning model. I need to grab the test output (about 250 bytes for each iteration, at roughly 1-minute intervals) and watch for statistical failures.

  • I cannot change the training engine; for instance, I cannot insert stdout.flush() in the child process code.
  • I can reasonably wait for a dozen lines of output to accumulate; I was hoping that filling the buffer would solve my problem.

Code: variations are commented out.

Parent

import subprocess
import time

cmd = ["/usr/bin/python3", "zzz.py"]
# test_proc = subprocess.Popen(
test_proc = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
    )

out_data = ""
print(time.time(), "START")
while not "QUIT" in str(out_data):
    out_data = test_proc.stdout
    # out_data, err_data = test_proc.communicate()
    print(time.time(), "MAIN received", out_data)

Child (zzz.py)

from time import sleep
import sys

for _ in range(5):
    print(_, "sleeping", "."*1000)
    # sys.stdout.flush()
    sleep(1)

print("QUIT this exercise")

Despite sending lines of 1000+ bytes, filling the buffer (tested elsewhere at 2 KB; here I've gone as high as 50 KB) doesn't cause the parent to "see" the new text.

What am I missing to get this to work?


Update with regard to links, comments, and iBug's posted answer:

  • Popen instead of run fixed the blocking issue. Somehow I missed this in the documentation and my experiments with both.
  • universal_newlines=True neatly changed the bytes return to string: easier to handle on the receiving end, although with interleaved empty lines (easy to detect and discard).
  • Setting bufsize to something tiny (e.g. 1) didn't affect anything; the parent still has to wait for the child to fill the stdout buffer, 8k in my case.
  • export PYTHONUNBUFFERED=1 before execution did fix the buffering problem. Thanks to wim for the link.
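For completeness, the environment variable can also be set from the parent process instead of exporting it in the shell before execution. A minimal sketch of that idea; the inline child code here is a stand-in for the real zzz.py, so the example is self-contained:

```python
import os
import subprocess
import sys

# Inline stand-in for the real training script (zzz.py); it prints without flushing.
child_code = (
    "import time\n"
    "for i in range(3):\n"
    "    print(i, 'sleeping')\n"
    "    time.sleep(0.2)\n"
    "print('QUIT this exercise')\n"
)

# Force the child's stdout to be unbuffered, without touching the child's code.
env = dict(os.environ, PYTHONUNBUFFERED="1")

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env=env,
)

received = []
for line in proc.stdout:  # yields each line as soon as the child writes it
    received.append(line)
    if b"QUIT" in line:
        break
proc.wait()
```

PYTHONUNBUFFERED only affects Python children, of course; a child written in another language needs one of the other approaches.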

Unless someone comes up with a canonical, nifty solution that makes these obsolete, I'll accept iBug's answer tomorrow.

  • Does it have to work with reading from a stream? What framework are you using? E.g. in TF/Keras you can pass callbacks that can handle all the reporting you want to see; maybe this could be a good alternative. – Kostas Mouratidis Jan 02 '19 at 17:22
  • I needed to do this once - did you try jfs solution from [here](https://stackoverflow.com/a/17698359/674039)? (not the accepted answer) – wim Jan 02 '19 at 17:26
  • Related: [Getting realtime output using subprocess](https://stackoverflow.com/q/803265/674039) – wim Jan 02 '19 at 17:29
  • @KostasMouratidis: I'm well aware of Keras; it's not part of the test environment. This is TF only. I believe I can insert code into the iteration loop, but this doesn't scale for our purposes. – Prune Jan 02 '19 at 18:26
  • Thanks for the pointers and solutions; will try them over the next few hours. The Real Day interferes right now ... – Prune Jan 02 '19 at 18:26

1 Answer


subprocess.run spawns the child process and blocks the calling thread until the child exits, so the parent cannot read anything before then.

The only option for you is to use p = subprocess.Popen(...) and read lines with s = p.stdout.readline(), or by iterating over p.stdout (see below).

This code works for me, if the child process flushes stdout after printing a line (see below for extended note).

import subprocess
import time

cmd = ["/usr/bin/python3", "zzz.py"]
test_proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)

out_data = ""
print(time.time(), "START")
while "QUIT" not in str(out_data):
    out_data = test_proc.stdout.readline()
    print(time.time(), "MAIN received", out_data)
test_proc.communicate()  # shut it down

See my terminal log (dots removed from zzz.py):

ibug@ubuntu:~/t $ python3 p.py
1546450821.9174328 START
1546450821.9793346 MAIN received b'0 sleeping \n'
1546450822.987753 MAIN received b'1 sleeping \n'
1546450823.993136 MAIN received b'2 sleeping \n'
1546450824.997726 MAIN received b'3 sleeping \n'
1546450825.9975247 MAIN received b'4 sleeping \n'
1546450827.0094354 MAIN received b'QUIT this exercise\n'

You can also do it with a for loop:

for out_data in test_proc.stdout:
    if "QUIT" in str(out_data):
        break
    print(time.time(), "MAIN received", out_data)
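The loop reads more cleanly in text mode (text=True on Python 3.7+, universal_newlines=True on older versions), which yields str lines with no b'...' prefixes. A sketch assuming an unbuffered child; the inline child and its -u flag are stand-ins that only keep the example self-contained:

```python
import subprocess
import sys

# Inline stand-in for the real child; -u makes this child unbuffered
# so the sketch runs on its own.
child_code = "print('alpha'); print('beta'); print('QUIT this exercise')"

proc = subprocess.Popen(
    [sys.executable, "-u", "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,  # decode bytes to str: no b'...' prefixes to strip
)

lines = []
for line in proc.stdout:  # one str per line, as soon as the child writes it
    lines.append(line.rstrip("\n"))
    if "QUIT" in line:
        break
proc.wait()
```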

If you cannot modify the child process, unbuffer (from the expect package; install with APT or YUM) may help. This is my working parent code, without changing the child code.

test_proc = subprocess.Popen(
    ["unbuffer"] + cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)
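What unbuffer does under the hood is run the child on a pseudo-terminal, so the child believes it is writing to a screen and line-buffers its output. A POSIX-only sketch of the same idea using only the standard-library pty module, again with an inline stand-in child:

```python
import os
import pty
import subprocess
import sys

# Inline stand-in for the real child process.
child_code = "print('tick'); print('QUIT this exercise')"

master, slave = pty.openpty()  # child sees a terminal, so it line-buffers
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=slave,
    stderr=slave,
)
os.close(slave)  # parent keeps only the master end

chunks = []
try:
    while True:
        data = os.read(master, 1024)  # returns as soon as the child writes a line
        if not data:
            break
        chunks.append(data)
except OSError:  # Linux signals EOF on a pty master with EIO
    pass
os.close(master)
proc.wait()
output = b"".join(chunks)
```

Note that a pty delivers \r\n line endings, and that Linux reports end-of-file on the master side as an OSError rather than an empty read.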
  • Not quite; I cannot change the child process, even to inserting a `flush`. I'll try this by waiting for a buffer fill ("natural" flush), as I've done with other tests. On the other hand, that `Popen` looks hopeful, too -- my attempts got me to the same dead end. – Prune Jan 02 '19 at 22:08
  • Your last paragraph doesn't match the functionality I see. I do get the string return, but it comes in bursts of roughly 8k, rather than one read per second. I did try bufsize=1 earlier, and it seems to have no effect on the "burst size" that the parent process receives. – Prune Jan 02 '19 at 22:34
  • @Prune Sorry it was a mistake. I have updated my answer – iBug Jan 03 '19 at 04:44
  • Thanks; whew! My main worry was having the functionality out of synch. – Prune Jan 03 '19 at 17:10
  • `unbuffer` would be nice, but I'll have to do this with the PYTHONUNBUFFERED environment variable. Application characteristics make it difficult to blithely add another module. – Prune Jan 03 '19 at 17:33