93

I want to subprocess.Popen() rsync.exe in Windows, and print the stdout in Python.

My code works, but it doesn't catch the progress until a file transfer is done! I want to print the progress for each file in real time.

Using Python 3.1 now since I heard it should be better at handling IO.

import subprocess, time, os, sys

cmd = "rsync.exe -vaz -P source/ dest/"
p, line = True, 'start'


p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=64,
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)

for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()
Mad Physicist
John A
  • 2
    Duplicate: http://stackoverflow.com/questions/1085071/real-time-intercepting-of-stdout-from-another-process-in-python, http://stackoverflow.com/questions/874815/how-do-i-get-real-time-information-back-from-a-subprocess-popen-in-python-2-5, http://stackoverflow.com/questions/527197/intercepting-stdout-of-a-subprocess-while-it-is-running – S.Lott Oct 22 '09 at 13:02
  • 1
    (Coming from google?) All PIPEs will deadlock when one of the PIPEs' buffers fills up and is not read, e.g. stdout deadlocks when stderr is filled. Never pass a PIPE you don't intend to read. – Nasser Al-Wohaibi May 07 '14 at 11:08
  • Could someone explain why you couldn't just set stdout to sys.stdout instead of subprocess.PIPE? – Mike Jun 09 '15 at 10:14

13 Answers

107

Some rules of thumb for subprocess.

  • Never use shell=True. It needlessly invokes an extra shell process to call your program.
  • When calling processes, arguments are passed as lists. sys.argv in Python is a list, and so is argv in C, so you pass a list to Popen to call subprocesses, not a string.
  • Don't redirect stderr to a PIPE when you're not reading it.
  • Don't redirect stdin when you're not writing to it.

Example:

import subprocess, time, os, sys
cmd = ["rsync.exe", "-vaz", "-P", "source/" ,"dest/"]

p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)

for line in iter(p.stdout.readline, b''):
    print(">>> " + line.rstrip().decode())  # decode the bytes before printing (Python 3)

That said, it is probable that rsync buffers its output when it detects that it is connected to a pipe instead of a terminal. This is the default behavior: when connected to a pipe, a program must explicitly flush stdout to get real-time results, otherwise the standard C library buffers it.

To test for that, try running this instead:

cmd = [sys.executable, 'test_out.py']

and create a test_out.py file with the contents:

import sys
import time

print("Hello")
sys.stdout.flush()
time.sleep(10)
print("World")

Executing that subprocess should give you "Hello" and wait 10 seconds before giving "World". If that happens with the python code above and not with rsync, that means rsync itself is buffering output, so you are out of luck.

A solution would be to connect directly to a pty, using something like pexpect.
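
For reference, here is a minimal, Unix-only sketch of the pty approach using the standard library's pty module (the rsync arguments are illustrative; ptys are not available on Windows). Given a pseudo-terminal, the child believes it is writing to a console and line-buffers accordingly:

import os
import pty
import subprocess

# Unix-only sketch: hand the child a pseudo-terminal so it line-buffers
# as if attached to a console. The rsync arguments are illustrative.
master, slave = pty.openpty()
p = subprocess.Popen(["rsync", "-vaz", "-P", "source/", "dest/"],
                     stdout=slave, stderr=slave, close_fds=True)
os.close(slave)  # the parent only needs the master end

try:
    while True:
        data = os.read(master, 1024)
        if not data:  # EOF: the child closed its side of the pty
            break
        print(data.decode(errors="replace"), end="")
except OSError:
    pass  # on Linux, reading the master raises OSError once the child exits
os.close(master)
p.wait()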

nosklo
  • 12
    `shell=False` is the right thing when you construct the command line, especially from user-entered data. But nevertheless `shell=True` is useful too when you get the whole command line from a trusted source (e.g. hardcoded in the script). – Denis Otkidach Oct 22 '09 at 16:52
  • 11
    @Denis Otkidach: I don't think that warrants usage of `shell=True`. Think about it - you're invoking another process on your OS, involving memory allocation, disk usage, processor scheduling, just to **split a string**! And one you joined yourself!! You could split in python, but it is easier writing each parameter separately anyway. Also, using a list means you don't have to escape special shell chars: spaces, `;`, `>` – nosklo Oct 22 '09 at 20:02
  • nosklo,that should be: p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) – Senthil Kumaran Oct 23 '09 at 07:28
  • shell=False is bad but it allows convenient ways to pipe. Is there a nice way to run a string of pipes? i.e. run the command 'cat longfile.tab | cut -f1 | head -100'.split() – mathtick Nov 04 '10 at 20:30
  • 1
    @mathtick: I'm not sure why you would do those operations as separate processes... you can cut file contents and extract first field easily in python by using the `csv` module. But as an example, your pipeline in python would be: `p = Popen(['cut', '-f1'], stdin=open('longfile.tab'), stdout=PIPE) ; p2 = Popen(['head', '-100'], stdin=p.stdout, stdout=PIPE) ; result, stderr = p2.communicate() ; print result` Note that you can work with long filenames and shell special characters without having to escape, now that the shell is not involved. Also it's a lot faster since there's one less process. – nosklo Nov 04 '10 at 22:11
  • @nosklo I'm thinking of situation where you want to grep a regex in a file 13+ million lines long. I tried pythoning the grep part but it was painfully slow compared to grep. Maybe something else was causing the slowness? I gave up playing the python search code pretty quickly. – mathtick Nov 05 '10 at 14:08
  • @mathtick: yeah, grep is a wonderful piece of software that really goes out of its way to do what it does more efficiently. You can't beat it with python. However it shouldn't be "painfully slow", just slower. Do you have the code you're using posted somewhere? Maybe you want to ask another question here so we can debate if there's something wrong and what the best solution is... – nosklo Nov 05 '10 at 14:54
  • @nosklo: I'll try to post a clean test of python grep ... it would be nice to understand the speed differences a little better. – mathtick Nov 05 '10 at 15:40
  • 11
    use `for line in iter(p.stdout.readline, b'')` instead of `for line in p.stdout` in Python 2 otherwise lines are not read in real time even if the source process doesn't buffer its output. – jfs Mar 10 '13 at 22:03
  • The case where I routinely use `shell=True` is big pipelines. Yeah, just piping through grep is quite arguably not worth it, but chains of 4 or 5 commands aren't rare when doing stuff like bioinformatics. It is sometimes tempting to use python to replace a bash script... but that is a different thread. – travc Feb 10 '14 at 04:58
  • this bombs with an error for me `TypeError: Can't convert 'bytes' object to str implicitly` in python3 – Tommy Aug 30 '16 at 15:51
  • I do have a use case for `shell=True`: The ability to copy&paste the command to a shell later. So it should be "almost never", not "never". – toolforger Jul 30 '19 at 05:02
  • @toolforger Sure, you can have infinite use cases if you consider the invalid ones :D I also have a use case: "Being able to say I have a use case" XD XD More seriously though, if you think it is worth invoking an extra useless process and dealing with quoting hell yourself just to be able to copy and paste the command later, go for it, but that doesn't make it a good use case. – nosklo Jul 31 '19 at 19:25
  • this solution does not work – Vaidøtas I. Feb 27 '21 at 22:20
45

I know this is an old topic, but there is a solution now. Call rsync with the option --outbuf=L. Example:

import subprocess

cmd = ['rsync', '-arzv', '--backup', '--outbuf=L', 'source/', 'dest']
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE)
for line in iter(p.stdout.readline, b''):
    print('>>> {}'.format(line.rstrip().decode()))  # decode bytes; print() for Python 3
Elvin
  • 3
    This works and should be upvoted to save future readers from scrolling through all of the dialog above. – VectorVictor Nov 24 '16 at 19:05
  • 1
    @VectorVictor It doesn't explain what is going on, and why it's going on. It might be that your program works, until: 1. you add `preexec_fn=os.setpgrp` to make the program survive its parent script 2. you skip reading from the process's pipe 3. the process outputs lots of data, filling the pipe 4. you are stuck for hours, trying to figure out why the program you're running quits *after some random amount of time*. The answer from @nosklo helped me a lot. – danuker Dec 09 '17 at 20:01
  • didn't work for me with the option :/ – Vaidøtas I. Feb 27 '21 at 21:49
19

On Linux, I had the same problem with buffering. I finally used "stdbuf -o0" (or unbuffer from expect) to get rid of the pipe buffering.

from subprocess import Popen, PIPE

# cmd is the argument list of the program to run
proc = Popen(['stdbuf', '-o0'] + cmd, stdout=PIPE, stderr=PIPE)
stdout = proc.stdout

I could then use select.select on stdout.

See also https://unix.stackexchange.com/questions/25372/
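
A rough sketch of such a select loop (the command is illustrative; stderr is merged into stdout here to avoid the pipe-deadlock issue mentioned in the comments above):

import os
import select
from subprocess import Popen, PIPE, STDOUT

# Sketch: poll the pipe with select and read whatever is available.
proc = Popen(['stdbuf', '-o0', 'rsync', '-vaz', '-P', 'source/', 'dest/'],
             stdout=PIPE, stderr=STDOUT)
while True:
    ready, _, _ = select.select([proc.stdout], [], [], 1.0)  # 1-second timeout
    if proc.stdout in ready:
        data = os.read(proc.stdout.fileno(), 1024)  # raw read, no extra buffering
        if not data:  # EOF: the child closed its stdout
            break
        print(data.decode(errors="replace"), end="")
proc.wait()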

Ling
  • 2
    For anyone trying to grab the C code stdout from Python, I can confirm that this solution was the only one that worked for me. To be clear, I'm talking about adding 'stdbuf', '-o0' to my existing command list in Popen. – Reckless Aug 31 '17 at 03:42
  • Thank you! `stdbuf -o0` proved to be *really* useful with a bunch of pytest/pytest-bdd tests I wrote that spawn a C++ app and verify that it emits certain log statements. Without `stdbuf -o0`, these tests needed 7 seconds to get the (buffered) output from the C++ program. Now they run almost instantaneously! – evadeflow Sep 08 '19 at 16:18
  • This answer saved me today! Running an application as a subprocess as part of `pytest`, it was impossible for me to get its output. `stdbuf` does it. – Janos Nov 16 '20 at 16:38
19

Depending on the use case, you might also want to disable the buffering in the subprocess itself.

If the subprocess will be a Python process, you could do this before the call:

os.environ["PYTHONUNBUFFERED"] = "1"

Or alternatively pass this in the env argument to Popen.

Otherwise, if you are on Linux/Unix, you can use the stdbuf tool, e.g.:

cmd = ["stdbuf", "-oL"] + cmd

See also here about stdbuf or other options.
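
As a small sketch of the env variant (the child script name is illustrative):

import os
import subprocess
import sys

# Sketch: disable buffering in a Python child via its environment,
# without mutating os.environ globally. The script name is illustrative.
env = dict(os.environ, PYTHONUNBUFFERED="1")
p = subprocess.Popen([sys.executable, "child_script.py"],
                     stdout=subprocess.PIPE, env=env)
for line in iter(p.stdout.readline, b""):
    print(line.rstrip().decode())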

Albert
9

for line in p.stdout:
    ...

always blocks until the next line-feed.

For "real-time" behaviour you have to do something like this:

while True:
    inchar = p.stdout.read(1)
    if inchar:  # b'' signals EOF; read() never returns None
        print(inchar.decode(), end='')  # decode the byte so Python 3 prints it cleanly
    else:
        print('')  # flush the implicit line buffering at EOF
        break

The while loop is left when the child process closes its stdout or exits; a plain read()/read(-1) would block until the child process closed its stdout or exited.

IBue
  • 1
    `inchar` is never `None`; use `if not inchar:` instead (`read()` returns an empty string on EOF). Btw, what's worse: `for line in p.stdout` doesn't print even full lines in real time in Python 2 (`for line in iter(p.stdout.readline, '')` could be used instead). – jfs Mar 10 '13 at 22:08
  • 1
    I have tested this with python 3.4 on osx, and it does not work. – qed Nov 16 '14 at 21:09
  • 1
    @qed: `for line in p.stdout:` works on Python 3. Be sure to understand the difference between `''` (Unicode string) and `b''` (bytes). See [Python: read streaming input from subprocess.communicate()](http://stackoverflow.com/a/17698359/4279) – jfs Mar 04 '16 at 13:13
7

Your problem is:

for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()

the iterator itself has extra internal buffering (notably in Python 2), so a line can sit in that buffer well after the child has written it.

Try doing it like this:

while True:
    line = p.stdout.readline()
    if not line:
        break
    print(line.rstrip().decode())  # decode bytes; print() for Python 3
nhahtdh
zviadm
5

You cannot get stdout to print unbuffered to a pipe (unless you can rewrite the program that prints to stdout), so here is my solution:

Redirect stdout to stderr, which is not buffered; '<cmd> 1>&2' should do it. Open the process as follows (the shell redirection requires shell=True):

myproc = subprocess.Popen('<cmd> 1>&2', shell=True, stderr=subprocess.PIPE)

You cannot distinguish stdout from stderr, but you get all output immediately.

Hope this helps anyone tackling this problem.

oers
Erik
  • 4
    Have you tried it? Because it doesn't work. If stdout is buffered in that process, it won't reach stderr unbuffered any more than it reaches a PIPE or a file unbuffered. – Filipe Pina Sep 04 '15 at 18:38
  • 5
    This is plain wrong. stdout buffering occurs within the program itself. The shell syntax `1>&2` just changes which files the file descriptors point to before launching the program. The program itself can't distinguish between redirecting stdout to stderr (`1>&2`) or vice versa (`2>&1`), so this will have no effect on the buffering behaviour of the program. And either way, the `1>&2` syntax is interpreted by the shell; `subprocess.Popen('<cmd> 1>&2', stderr=subprocess.PIPE)` would fail because you haven't specified `shell=True`. – Will Manley Jul 11 '16 at 13:25
  • In case people would be reading this: I tried using stderr instead of stdout, it shows the exact same behavior. – martinthenext Nov 04 '16 at 16:31
3

Change the stdout from the rsync process to be unbuffered.

p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=0,  # 0=unbuffered, 1=line-buffered, else buffer-size
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)
Will
  • 3
    Buffering happens on the rsync side, changing bufsize attribute on python side won't help. – nosklo Oct 22 '09 at 12:37
  • 15
    For anyone else searching, nosklo's answer is completely wrong: rsync's progress display is not buffered; the real problem is that subprocess returns a file object and the file iterator interface has a poorly documented internal buffer even with bufsize=0, requiring you to call readline() repeatedly if you need results before the buffer fills. – Chris Adams Nov 03 '12 at 02:21
3

To avoid buffering of the output, you might want to try pexpect:

import pexpect

# launchcmd is the program to run; args is its argument list
child = pexpect.spawn(launchcmd, args, timeout=None)
while True:
    try:
        child.expect('\n')
        print(child.before)
    except pexpect.EOF:
        break

PS: I know this question is pretty old, but I'm still providing the solution that worked for me.

PPS: I got this answer from another question.

Nithin
3

p = subprocess.Popen(command,
                     bufsize=0,
                     universal_newlines=True)

I am writing a GUI for rsync in Python, and had the same problems. This problem troubled me for several days until I found this in pydoc:

If universal_newlines is True, the file objects stdout and stderr are opened as text files in universal newlines mode. Lines may be terminated by any of '\n', the Unix end-of-line convention, '\r', the old Macintosh convention or '\r\n', the Windows convention. All of these external representations are seen as '\n' by the Python program.

It seems that rsync outputs '\r' while a transfer is in progress.
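
A minimal sketch combining this with an unbuffered pipe (the rsync arguments are illustrative); in universal-newlines mode the '\r' progress updates are translated to '\n', so each update arrives as its own line:

import subprocess

# Sketch: text mode translates rsync's '\r' progress updates into '\n'.
# The rsync arguments are illustrative.
p = subprocess.Popen(["rsync", "-vaz", "-P", "source/", "dest/"],
                     bufsize=0,
                     universal_newlines=True,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
for line in p.stdout:
    print(">>> " + line.rstrip())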

xmc
1

I've noticed that there is no mention of using a temporary file as an intermediary. The following gets around the buffering issues by outputting to a temporary file and allows you to parse the data coming from rsync without connecting to a pty. I tested the following on a Linux box, and the output of rsync tends to differ across platforms, so the regular expressions to parse the output may vary:

import subprocess, time, tempfile, re

# mkstemp (not TemporaryFile) returns both an OS-level handle that Popen
# can write to and a path we can reopen for reading
fd, file_name = tempfile.mkstemp()
cmd = ["rsync", "-vaz", "-P", "/src/", "/dest"]

p = subprocess.Popen(cmd, stdout=fd,
                     stderr=subprocess.STDOUT)
while p.poll() is None:
    # p.poll() returns None while the program is still running
    # sleep for 1 second
    time.sleep(1)
    last_line = open(file_name).readlines()
    # it's possible that it hasn't output yet, so continue
    if len(last_line) == 0: continue
    last_line = last_line[-1]
    # Matching to "[bytes downloaded]  number%  [speed] number:number:number"
    match_it = re.match(r".* ([0-9]*)%.* ([0-9]*:[0-9]*:[0-9]*).*", last_line)
    if not match_it: continue
    # in this case, the percentage is stored in match_it.group(1),
    # time in match_it.group(2).  We could do something with it here...
MikeGM
  • it is not in real time. A file doesn't solve the buffering issue on rsync's side. – jfs Jul 29 '12 at 20:26
  • tempfile.TemporaryFile can delete itself for easier clean up in case of exceptions – jfs Jul 29 '12 at 20:27
  • 3
    `while not p.poll()` leads to infinite loop if subprocess exits successfully with 0, use `p.poll() is None` instead – jfs Jul 29 '12 at 20:30
  • Windows might forbid opening an already-open file, so `open(file_name)` might fail – jfs Jul 29 '12 at 20:32
  • 1
    I just found this answer, unfortunately only for linux, but works like a charm [link](http://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe/25378#25378) So I just extended my command as follows: `command_argv = ["stdbuf","-i0","-o0","-e0"] + command_argv` and call: `popen = subprocess.Popen(cmd, stdout=subprocess.PIPE)` and now I can read from it without any buffering – Arvid Terzibaschian Jul 14 '16 at 09:53
0

If you run something like this in a thread and save the ffmpeg_time value in an attribute you can access from outside, it works very nicely; I get output like this when using threading in tkinter:

import re
import subprocess

input = 'path/input_file.mp4'
output = 'path/output_file.mp4'
command = "ffmpeg -y -v quiet -stats -i \"" + str(input) + "\" -metadata title=\"@alaa_sanatisharif\" -preset ultrafast -vcodec copy -r 50 -vsync 1 -async 1 \"" + output + "\""
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True, shell=True)
for line in process.stdout:
    reg = re.search(r'\d\d:\d\d:\d\d', line)  # extract the hh:mm:ss timestamp from ffmpeg's stats line
    ffmpeg_time = reg.group(0) if reg else ''
    print(ffmpeg_time)
erfan
-1

In Python 3, here's a solution that takes a command off the command line and delivers nicely decoded strings in real time as they are received.

Receiver (receiver.py):

import subprocess
import sys

cmd = sys.argv[1:]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for line in p.stdout:
    print("received: {}".format(line.rstrip().decode("utf-8")))

A simple example program that generates real-time output (dummy_out.py):

import time
import sys

for i in range(5):
    print("hello {}".format(i))
    sys.stdout.flush()  
    time.sleep(1)

Output:

$ python receiver.py python dummy_out.py
received: hello 0
received: hello 1
received: hello 2
received: hello 3
received: hello 4
watsonic