
My Python script uses subprocess to call a Linux utility that is very noisy. I want to store all of the output in a log file and show some of it to the user. I thought the following would work, but the output doesn't show up in my application until the utility has produced a significant amount of output.

#fake_utility.py, just generates lots of output over time
import time
i = 0
while True:
   print hex(i)*512
   i += 1
   time.sleep(0.5)

#filters output
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
for line in proc.stdout:
   #the real code does filtering here
   print "test:", line.rstrip()

The behavior I really want is for the filter script to print each line as it is received from the subprocess. Sort of like what tee does, but with Python code.

What am I missing? Is this even possible?


Update:

If a sys.stdout.flush() is added to fake_utility.py, the code has the desired behavior in Python 3.1. I'm using Python 2.6. You would think that using proc.stdout.xreadlines() would work the same as in py3k, but it doesn't.


Update 2:

Here is the minimal working code.

#fake_utility.py, just generates lots of output over time
import sys, time
for i in range(10):
   print i
   sys.stdout.flush()
   time.sleep(0.5)

#display output line by line
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
#works in python 3.0+
#for line in proc.stdout:
for line in iter(proc.stdout.readline,''):
   print line.rstrip()
deft_code
    you could use `print line,` instead of `print line.rstrip()` (note: comma at the end). – jfs Jan 23 '12 at 11:14
  • related: [Python: read streaming input from `subprocess.communicate()`](http://stackoverflow.com/q/2715847/4279) – jfs Sep 09 '14 at 23:03
    Update 2 states that it works with python 3.0+ but uses the old print statement, so it does not work with python 3.0+. – Rooky Dec 19 '16 at 21:02
  • None of the answers listed here worked for me, but https://stackoverflow.com/questions/5411780/python-run-a-daemon-sub-process-read-stdout/5413588#5413588 did! – boxed Nov 11 '18 at 08:48
  • interesting the code that only works in python3.0+ uses 2.7 syntax for print. – thang Sep 16 '20 at 22:31
  • the update does not work. you're only printing line by line, not receiving them one by one. – Vaidøtas I. Feb 27 '21 at 22:02

9 Answers


It's been a long time since I last worked with Python, but I think the problem is with the statement for line in proc.stdout, which reads the entire input before iterating over it. The solution is to use readline() instead:

#filters output
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
while True:
  line = proc.stdout.readline()
  if not line:
    break
  #the real code does filtering here
  print "test:", line.rstrip()

Of course you still have to deal with the subprocess' buffering.
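For a noisy utility you cannot modify, one common workaround (a sketch, assuming GNU coreutils' stdbuf is available) is to force line buffering in the child process:

```python
import subprocess

# stdbuf -oL forces the child's stdout into line-buffered mode.
# 'seq' stands in here for the real noisy utility.
proc = subprocess.Popen(['stdbuf', '-oL', 'seq', '1', '3'],
                        stdout=subprocess.PIPE)
lines = []
for line in iter(proc.stdout.readline, b''):
    lines.append(line.rstrip().decode())
proc.wait()
```

Note that stdbuf only affects programs that buffer through C stdio; it has no effect on, for example, Python children, which do their own buffering inside the interpreter.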

Note: according to the documentation the solution with an iterator should be equivalent to using readline(), except for the read-ahead buffer, but (or exactly because of this) the proposed change did produce different results for me (Python 2.5 on Windows XP).

Rômulo Ceccon
    for `file.readline()` vs. `for line in file` see http://bugs.python.org/issue3907 (in short: it works on Python3; use `io.open()` on Python 2.6+) – jfs Jan 23 '12 at 11:16
    The more pythonic test for an EOF, per the "Programming Recommendations" in PEP 8 (http://www.python.org/dev/peps/pep-0008/), would be 'if not line:'. – Jason Mock Nov 13 '12 at 15:20
  • there is no `open()` used in this script; where do you put `io.open()`? is there a workaround for 2.5? – n611x007 Nov 14 '12 at 14:06
    @naxa: for pipes: `for line in iter(proc.stdout.readline, ''):`. – jfs Nov 14 '12 at 18:22
  • @J.F.Sebastian: did you try this solution on Python3? I have code that previously ran on Python 2(.7) using the `iter(proc.stdout.readline, '')` approach, and now that I switched to Python 3.4 that code went pear-shaped, the loop does not return and RAM usage oscillates between ~0 and 3 GB. – Dr. Jan-Philip Gehrcke Feb 22 '15 at 19:37
    @Jan-PhilipGehrcke: yes. 1. you could use `for line in proc.stdout` on Python 3 (there is no the read-ahead bug) 2. `'' != b''` on Python 3 -- don't copy-paste the code blindly -- think what it does and how it works. – jfs Feb 23 '15 at 02:25
    @J.F.Sebastian: sure, the `iter(f.readline, b'')` solution is rather obvious (and also works on Python 2, if anyone is interested). The point of my comment was not to blame your solution (sorry if it appeared like that, I read that now, too!), but to describe the extent of the symptoms, which are quite severe in this case (most of the Py2/3 issues result in exceptions, whereas here a well-behaved loop changed to be endless, and garbage collection struggles fighting the flood of newly created objects, yielding memory usage oscillations with long period and large amplitude). – Dr. Jan-Philip Gehrcke Feb 23 '15 at 13:04
  • @Jan-PhilipGehrcke: whether to use `''` or `b''` depends on `universal_newlines` parameter that enables text mode. It is not obvious. There are parameters that are different on Python 2 and 3. You should be careful if you write single source Python 2/3 compatible code that uses `subprocess` module. – jfs Feb 23 '15 at 13:21
  • @J.F.Sebastian: I agree that there is a lot to consider when using `subprocess`, but usage of `b''` fits *most* application scenarios, because the well-chosen default in both, Python 2 and 3 is to treat `subprocess.PIPE` as a byte stream, and to not implicitly perform de/encoding operations. I'd say `b''` is recommendable even on Python 2, because it is semantically better (explicit). Indeed, `b''` would be wrong with `universal_newlines=True` on Python 3 (which renders `stdout/err` attributes to be `TextIOWrapper` objects). On Python 2, `b''` works independent of `universal_newlines`. – Dr. Jan-Philip Gehrcke Feb 23 '15 at 14:26
  • How can you see if `proc` has terminated before trying to read another line from its stdout? – HelloGoodbye Jul 07 '16 at 11:16
  • Does this care how frequently or infrequently the called process sends output? Could it run indefinitely for months only printing a line every 30 seconds? I don't understand how `readline()` can determine when the program output is actually finished... – Will Jul 09 '16 at 01:07
    I recommmend to add `sys.stdout.flush()` before breaking, otherwise things mix up. – Dawid Gosławski Mar 15 '18 at 09:52

Bit late to the party, but was surprised not to see what I think is the simplest solution here:

import io
import subprocess

proc = subprocess.Popen(["prog", "arg"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    print(line, end="")  # do something with line

(This requires Python 3.)
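A self-contained version of the same idea, substituting a small inline Python child for the placeholder ["prog", "arg"]:

```python
import io
import subprocess
import sys

# Inline child that prints a few lines (stands in for the real program).
child = "for i in range(3): print('line', i)"
proc = subprocess.Popen([sys.executable, '-c', child],
                        stdout=subprocess.PIPE)

collected = []
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):
    collected.append(line.rstrip())
proc.wait()
```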

jbg
    I'd like to use this answer but I am getting: `AttributeError: 'file' object has no attribute 'readable'` py2.7 – Dan Garthwaite Feb 26 '16 at 14:55
    Works with python 3 – matanster Jan 10 '18 at 21:41
  • Clearly this code is not valid for multiple reasons py3/py3 compatibility and real risk of getting ValueError: I/O operation on closed file – sorin Nov 13 '18 at 15:01
    @sorin neither of those things make it "not valid". If you're writing a library that still needs to support Python 2, then don't use this code. But many people have the luxury of being able to use software released more recently than a decade ago. If you try to read on a closed file you'll get that exception regardless of whether you use `TextIOWrapper` or not. You can simply handle the exception. – jbg Dec 26 '19 at 17:10
    you are maybe late to the party but you answer is up to date with current version of Python, ty – Dusan Gligoric Jan 16 '20 at 12:59
  • This logic works fine but i am getting extra '\n' at every line. Is there a way to suppress that? – Ammad Aug 11 '20 at 23:02
    @Ammad `\n` is the newline character. it's conventional in Python for the newline to not be removed when splitting by lines - you'll see the same behaviour if you iterate over a file's lines or use a `readlines()` method. You can get the line without it with just `line[:-1]` (TextIOWrapper operates in "universal newlines" mode by default, so even if you're on Windows and the line ends with `\r\n`, you'll only have `\n` at the end, so `-1` works). You can also use `line.rstrip()` if you don't mind any other whitespace-like characters at the end of the line also being removed. – jbg Aug 13 '20 at 03:43
    I got `AttributeError: 'file' object has no attribute 'readable'` on python 3.7, but it was because I was using `subprocess.run` instead of `subprocess.Popen`. – cowlinator Mar 17 '21 at 07:30

Indeed, once you have sorted out the iterator, buffering could become your problem. You could tell the Python in the sub-process not to buffer its output.

proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)

becomes

proc = subprocess.Popen(['python','-u', 'fake_utility.py'],stdout=subprocess.PIPE)

I have needed this when calling Python from within Python.
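An equivalent that leaves the argument list alone is to set PYTHONUNBUFFERED in the child's environment (a sketch; like -u, this only helps when the child is itself Python):

```python
import os
import subprocess
import sys

# PYTHONUNBUFFERED=1 has the same effect as passing -u to the interpreter.
env = dict(os.environ, PYTHONUNBUFFERED='1')

# Child that would otherwise block-buffer its stdout when piped.
child = "for i in range(3): print(i)"
proc = subprocess.Popen([sys.executable, '-c', child],
                        stdout=subprocess.PIPE, env=env)
lines = [line.rstrip() for line in iter(proc.stdout.readline, b'')]
proc.wait()
```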

Steve Carter

You want to pass these extra parameters to subprocess.Popen:

bufsize=1, universal_newlines=True

Then you can iterate as in your example. (Tested with Python 3.5)
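As a sketch, the complete call looks like this (with a small inline child standing in for fake_utility.py, which is assumed to flush its output):

```python
import subprocess
import sys

# Inline stand-in for fake_utility.py: prints three lines, flushing each.
child = ("import sys\n"
         "for i in range(3):\n"
         "    print(i)\n"
         "    sys.stdout.flush()\n")

proc = subprocess.Popen(
    [sys.executable, '-c', child],
    stdout=subprocess.PIPE,
    bufsize=1,                 # line-buffered; only meaningful in text mode
    universal_newlines=True,   # decode bytes to str
)
received = []
for line in proc.stdout:
    received.append(line.rstrip())
proc.wait()
```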

nikoliazekter
user1747134

A function that allows iterating over both stdout and stderr concurrently, in realtime, line by line

In case you need to get the output stream for both stdout and stderr at the same time, you can use the following function.

The function uses Queues to merge both Popen pipes into a single iterator.

Here we create the function read_popen_pipes():

from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):

    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:

            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
            except Empty:
                pass
            try:
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

read_popen_pipes() in use:

import subprocess as sp


with sp.Popen(my_cmd, stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

    for out_line, err_line in read_popen_pipes(p):

        # Do stuff with each line, e.g.:
        print(out_line, end='')
        print(err_line, end='')

    rc = p.poll()  # exit status
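The same queue-merging idea can be exercised end to end with a small inline child that writes to both streams (a compact, self-contained sketch, not the exact function above):

```python
import subprocess
import sys
import threading
from queue import Queue

def pump(stream, tag, q):
    # Forward each line from the stream into the shared queue, tagged by origin.
    for line in iter(stream.readline, ''):
        q.put((tag, line))
    q.put((tag, None))  # sentinel: this stream is finished

child = ("import sys\n"
         "print('to stdout')\n"
         "print('to stderr', file=sys.stderr)\n")
proc = subprocess.Popen([sys.executable, '-c', child],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)

q = Queue()
for stream, tag in ((proc.stdout, 'out'), (proc.stderr, 'err')):
    threading.Thread(target=pump, args=(stream, tag, q)).start()

seen, finished = [], 0
while finished < 2:
    tag, line = q.get()
    if line is None:
        finished += 1
    else:
        seen.append((tag, line.rstrip()))
proc.wait()
```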
Rotareti

You can also read lines without a loop; works in Python 3.6. Note that readlines() waits for the process to close its stdout, so the output is not streamed line by line.

import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
list_of_byte_strings = process.stdout.readlines()
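A runnable sketch of the same call, with text=True so the list holds str rather than bytes (an inline child stands in for `command`):

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, '-c', "print('a'); print('b')"],
    stdout=subprocess.PIPE,
    text=True,  # Python 3.7+; use universal_newlines=True on 3.6
)
list_of_strings = proc.stdout.readlines()
proc.wait()
```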
aiven
    Or to convert into strings: `list_of_strings = [x.decode('utf-8').rstrip('\n') for x in iter(process.stdout.readlines())]` – ndtreviv Nov 28 '19 at 11:10
    @ndtreviv, you can pass text=True to Popen or use its "encoding" kwarg if you want the output as strings, no need to convert it yourself – Bobby Impollonia Jan 28 '21 at 17:51

The following modification of Rômulo's answer works for me on Python 2 and 3 (2.7.12 and 3.6.1):

import os
import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
while True:
  line = process.stdout.readline()
  if not line:
    break
  os.write(1, line)
binariedMe
mdh

I tried this with Python 3 and it worked (source):

import subprocess
import threading
import time


def output_reader(proc):
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python', 'fake_utility.py'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        # Give the reader thread a moment, then generate output of our own.
        time.sleep(0.2)
        i = 0
        while True:
            print(hex(i) * 512)
            i += 1
            time.sleep(0.5)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()
shakram02

Python 3.5 added the run() function to the subprocess module, returning a CompletedProcess object. With this you are fine using proc.stdout.splitlines() (note that the capture_output parameter shown below requires Python 3.7+):

proc = subprocess.run(command, shell=True, capture_output=True, text=True, check=True)
for line in proc.stdout.splitlines():
    print("stdout:", line)

See also How to Execute Shell Commands in Python Using the Subprocess Run Method
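A runnable sketch of the same approach, with an inline child in place of the real command:

```python
import subprocess
import sys

proc = subprocess.run(
    [sys.executable, '-c', "print('one'); print('two')"],
    capture_output=True,  # Python 3.7+
    text=True,
    check=True,
)
out_lines = proc.stdout.splitlines()
```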

StefanQ