
I'm running a script via Python's subprocess module. Currently I use:

import subprocess

p = subprocess.Popen('/path/to/script', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
result = p.communicate()

I then print the result to stdout. This is all fine, but as the script takes a long time to complete, I want real-time output from the script to stdout as well. The reason I pipe the output is that I want to parse it.

AsadSMalik
  • related: [Python: read streaming input from `subprocess.communicate()`](http://stackoverflow.com/q/2715847/4279) – jfs Sep 09 '14 at 23:03
  • related: [Python subprocess get children's output to file and terminal?](http://stackoverflow.com/q/4984428/4279) – jfs Sep 09 '14 at 23:06
  • You might try using subprocess.call( ['/path/to/script'] ) if you don't need access to all the lower-level options of Popen. Output should stream to stdout by default. – Lukeclh Sep 10 '14 at 01:46
  • @Lukeclh: `call('/path/to/script')` will show the output but you won't be able to capture it at the same time (to parse it later as OP asks). – jfs Sep 17 '14 at 01:14
  • related: [Subprocess.Popen: cloning stdout and stderr both to terminal and variables](http://stackoverflow.com/q/17190221/4279) – jfs Sep 21 '14 at 15:50
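For reference, the call() approach mentioned in the comments looks like the minimal sketch below: it streams the script's output straight to the terminal but, as noted, captures nothing for parsing.

import subprocess

# streams output directly to the terminal; call() only returns the exit code
rc = subprocess.call(['/path/to/script'])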

4 Answers


To save the subprocess' stdout to a variable for further processing, and to display it as it arrives while the child process is running:

#!/usr/bin/env python3
from io import StringIO
from subprocess import Popen, PIPE

with Popen('/path/to/script', stdout=PIPE, bufsize=1,
           universal_newlines=True) as p, StringIO() as buf:
    for line in p.stdout:
        print(line, end='')  # display each line as it arrives
        buf.write(line)      # and save it for later parsing
    output = buf.getvalue()
rc = p.returncode

Saving both the subprocess's stdout and stderr is more complex, because you have to consume both streams concurrently to avoid a deadlock:

stdout_buf, stderr_buf = StringIO(), StringIO()
rc = teed_call('/path/to/script', stdout=stdout_buf, stderr=stderr_buf,
               universal_newlines=True)
output = stdout_buf.getvalue()
...

where teed_call() is defined here.
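The linked definition isn't reproduced in this thread; as a rough illustration, a thread-based helper along these lines could tee both streams. This is a sketch under the assumption of text-mode pipes (universal_newlines=True, as in the call above), not the actual linked code:

import sys
import threading
from subprocess import Popen, PIPE

def teed_call(cmd, stdout=None, stderr=None, **kwargs):
    """Tee the child's stdout/stderr to the terminal and to the given buffers."""
    p = Popen(cmd, stdout=PIPE, stderr=PIPE, **kwargs)

    def tee(pipe, buf, display):
        # the '' sentinel assumes text-mode pipes (universal_newlines=True)
        for line in iter(pipe.readline, ''):
            display.write(line)      # show in the terminal as it arrives
            if buf is not None:
                buf.write(line)      # save for later parsing
        pipe.close()

    threads = [threading.Thread(target=tee, args=(p.stdout, stdout, sys.stdout)),
               threading.Thread(target=tee, args=(p.stderr, stderr, sys.stderr))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return p.wait()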


Update: here's a simpler asyncio version.
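That linked version isn't shown here either, but on modern Python (3.7+, an assumption on my part) a simpler equivalent could use asyncio.gather() in place of the manual task bookkeeping in the old version below:

import asyncio
import sys

async def tee_stream(stream, display):
    """Echo one pipe line by line as it arrives, keeping a copy."""
    lines = []
    while True:
        line = await stream.readline()
        if not line:  # EOF
            break
        display.write(line)
        display.flush()
        lines.append(line)
    return b''.join(lines)

async def read_and_display(*cmd):
    process = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    # consume both pipes concurrently to avoid the deadlock discussed above
    stdout, stderr = await asyncio.gather(
        tee_stream(process.stdout, sys.stdout.buffer),
        tee_stream(process.stderr, sys.stderr.buffer))
    return await process.wait(), stdout, stderr

rc, output, errors = asyncio.run(read_and_display('/path/to/script'))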


Old version:

Here's a single-threaded solution based on the child_process.py example from tulip:

import asyncio
import sys
from asyncio.subprocess import PIPE

@asyncio.coroutine
def read_and_display(*cmd):
    """Read cmd's stdout, stderr while displaying them as they arrive."""
    # start process
    process = yield from asyncio.create_subprocess_exec(*cmd,
            stdout=PIPE, stderr=PIPE)

    # read child's stdout/stderr concurrently
    stdout, stderr = [], [] # stdout, stderr buffers
    tasks = {
        asyncio.Task(process.stdout.readline()): (
            stdout, process.stdout, sys.stdout.buffer),
        asyncio.Task(process.stderr.readline()): (
            stderr, process.stderr, sys.stderr.buffer)}
    while tasks:
        done, pending = yield from asyncio.wait(tasks,
                return_when=asyncio.FIRST_COMPLETED)
        assert done
        for future in done:
            buf, stream, display = tasks.pop(future)
            line = future.result()
            if line: # not EOF
                buf.append(line)    # save for later
                display.write(line) # display in terminal
                # schedule to read the next line
                tasks[asyncio.Task(stream.readline())] = buf, stream, display

    # wait for the process to exit
    rc = yield from process.wait()
    return rc, b''.join(stdout), b''.join(stderr)

The script runs the '/path/to/script' command and reads both its stdout and stderr concurrently, line by line. The lines are printed to the parent's stdout/stderr respectively and saved as bytestrings for future processing. To run the read_and_display() coroutine, we need an event loop:

import os

if os.name == 'nt':
    loop = asyncio.ProactorEventLoop() # for subprocess' pipes on Windows
    asyncio.set_event_loop(loop)
else:
    loop = asyncio.get_event_loop()
try:
    rc, *output = loop.run_until_complete(read_and_display("/path/to/script"))
    if rc:
        sys.exit("child failed with '{}' exit code".format(rc))
finally:
    loop.close()
jfs

p.communicate() waits for the subprocess to complete and then returns its entire output at once.

Have you tried something like this instead, where you read the subprocess output line-by-line?

import subprocess

p = subprocess.Popen('/path/to/script', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in p.stdout:
    # do something with this individual line
    print line
Dan Lenski
  • 67,458
  • 11
  • 63
  • 115
  • if the child process generates enough output to fill OS stderr pipe buffer (65K on my machine) then it hangs. You should consume `p.stderr` too -- concurrently. Due to the read-ahead bug, `for line in p.stdout` will print in bursts. You could use `for line in iter(p.stdout.readline, b'')` instead. `print line` will print double newlines. You could use `print line,` (note: comma), to avoid it. – jfs Sep 09 '14 at 23:04
  • Great point about consuming `stderr` too. I was assuming that a few lines of buffering wouldn't be an issue in a lengthy data stream, but that's something to consider as well. – Dan Lenski Sep 09 '14 at 23:25
  • *"the script takes a long time to complete"* -- it means that if the script writes progress to stderr then it *can* stall. – jfs Sep 09 '14 at 23:27

The Popen.communicate doc clearly states:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate

So if you need real-time output, you need to use something like this:

stream_p = subprocess.Popen('/path/to/script', stdout=subprocess.PIPE, stderr=subprocess.PIPE)

for stream_line in stream_p.stdout:
    # parse it the way you want
    print stream_line
doubleo

This prints both stdout and stderr to the terminal as well as saving both stdout and stderr into a variable:

from subprocess import Popen, PIPE, STDOUT

with Popen(args, stdout=PIPE, stderr=STDOUT, text=True, bufsize=1) as p:
    output = "".join([print(buf, end="") or buf for buf in p.stdout])

However, depending on what exactly you're doing, this might be important to note: by using stderr=STDOUT, we can no longer differentiate between stdout and stderr, and because of the call to print, your output will always be printed to stdout, no matter whether it came from stdout or stderr. (The comprehension works because print() returns None, so print(buf, end="") or buf prints each line and still evaluates to buf for the join.)

For Python < 3.7 you will need to use universal_newlines instead of text.

New in version 3.7: text was added as a more readable alias for universal_newlines.

Source: https://docs.python.org/3/library/subprocess.html#subprocess.Popen
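If a script has to run on both sides of that boundary, a small version check can pick the right keyword. A minimal sketch, with '/path/to/script' standing in for your own command:

import sys
from subprocess import Popen, PIPE, STDOUT

# 'text' is only accepted on Python 3.7+; older versions need the alias
kwargs = {'text': True} if sys.version_info >= (3, 7) else {'universal_newlines': True}

with Popen('/path/to/script', stdout=PIPE, stderr=STDOUT, bufsize=1, **kwargs) as p:
    output = "".join(print(line, end="") or line for line in p.stdout)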

finefoot