
I'm trying to parse, in real time, the output of a program whose stdout is block-buffered, which means its output does not reach me until the buffer fills or the process ends. All I need is to read the output line by line and filter and process the data as it arrives, since the program may run for hours.

I've tried to capture the output with subprocess.Popen(), but, as you may guess, Popen doesn't handle this kind of behavior: the output stays buffered until the process ends.

from subprocess import Popen, PIPE

p = Popen("my noisy stuff ", shell=True, stdout=PIPE, stderr=PIPE)
for line in p.stdout.readlines():
    # parse the text and extract data
    pass

So I found pexpect, which prints the output in real time because it treats stdout as a file. I could even use a dirty trick: write the output to a file and parse it outside the function. But OK, that is too dirty, even for me ;)

import pexpect
import sys

pexpect.run("my noisy stuff", logfile=sys.stdout)

But I guess there should be a better, more Pythonic way to do this, one that just handles stdout the way subprocess.Popen does. How can I do this?

EDIT:

Running J.F.'s proposal:

This is a deliberately wrong audit; it takes about 25 seconds to stop.

from subprocess import Popen, PIPE

command = "bully mon0 -e ESSID -c 8 -b aa:bb:cc:dd:ee:00 -v 2"

p = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)

for line in iter(p.stdout.readline, b''):
    print "inside loop"
    print line

print "outside loop"
p.stdout.close()
p.wait()


#$ sudo python SCRIPT.py
                                ### <= 25 secs later......
# inside loop
#[!] Bully v1.0-21 - WPS vulnerability assessment utility

#inside loop
#[!] Using 'ee:cc:bb:aa:bb:ee' for the source MAC address

#inside loop
#[X] Unable to get a beacon from the AP, possible causes are

#inside loop
#[.]    an invalid --bssid or -essid was provided,

#inside loop
#[.]    the access point isn't on channel '8',

#inside loop
#[.]    you aren't close enough to the access point.

#outside loop

Using this method instead (EDIT: because of long delays and timeouts in the output, I had to adjust the child and add some hacks, so the final code looks like this):

import pexpect

child = pexpect.spawn(command)
child.maxsize = 1  #Turns off buffering
child.timeout = 50 # default is 30, insufficient for me. Crashes were due to this param.
for line in child:
    print line,

child.close()

This gives back the same output, but it prints the lines in real time. So... SOLVED. Thanks, @J.F. Sebastian.

peluzza
  • related: [Python subprocess readlines() hangs](http://stackoverflow.com/q/12419198/4279) – jfs Nov 25 '13 at 00:39
  • related: [Python: read streaming input from subprocess.communicate()](http://stackoverflow.com/q/2715847/4279) – jfs Nov 25 '13 at 00:40
  • Do you need to send replies to the command, or are you just reading the output? Do you need line-buffered output, or is block-buffered output (e.g., using a 4096-byte buffer) sufficient for a program that may run for hours? – jfs Nov 25 '13 at 00:53
  • Hi J.F. I just need to parse the output. The program itself audits data streams, so I want to manage OTHER programs based on the output behavior of this program. So my code will be continuously reading output. – peluzza Nov 25 '13 at 08:12
  • if *both* `stdout`/`stderr` are `PIPE` then you should read them concurrently otherwise the subprocess might deadlock due to full pipe buffers. – jfs Nov 25 '13 at 12:34
  • use `print line,` (note: comma at the end -- `sys.stdout.softspace` hack) to avoid doubling newlines. – jfs Nov 25 '13 at 12:36
  • Thanks for the hack! Also I had to adjust maxsize and timeout, there are long sleeps in the output, which made it crash. – peluzza Nov 25 '13 at 14:50
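
The comment above about reading `stdout` and `stderr` concurrently can be sketched roughly as follows. This is only an illustration in Python 3 syntax (the code in the question is Python 2); `CHILD_CODE`, `_pump` and `read_both` are hypothetical names, and the child is a small stand-in script, not the real command.

```python
import subprocess
import sys
import threading
from queue import Queue

# Hypothetical stand-in child: writes one line to each stream.
CHILD_CODE = (
    "import sys\n"
    "print('to stdout')\n"
    "sys.stdout.flush()\n"
    "print('to stderr', file=sys.stderr)\n"
)


def _pump(stream, tag, queue):
    """Forward every line from one pipe to the queue, then send a sentinel."""
    for raw in iter(stream.readline, b""):
        queue.put((tag, raw.decode(errors="replace").rstrip("\r\n")))
    stream.close()
    queue.put((tag, None))  # None marks end of this stream


def read_both(argv):
    """Yield (tag, line) pairs from stdout and stderr as they arrive."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    queue = Queue()
    for stream, tag in ((proc.stdout, "out"), (proc.stderr, "err")):
        # One reader thread per pipe, so neither pipe can fill up unread.
        threading.Thread(target=_pump, args=(stream, tag, queue), daemon=True).start()
    open_streams = 2
    while open_streams:
        tag, line = queue.get()
        if line is None:
            open_streams -= 1
        else:
            yield tag, line
    proc.wait()
```

If the two streams do not need to be told apart, passing `stderr=subprocess.STDOUT` to `Popen` and reading a single pipe is a simpler way to avoid the same deadlock.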

1 Answer


.readlines() reads all lines. No wonder you don't see any output until the subprocess ends. You could use .readline() instead to read line by line as soon as the subprocess flushes its stdout buffer:

from subprocess import Popen, PIPE

p = Popen("my noisy stuff", stdout=PIPE, bufsize=1)
for line in iter(p.stdout.readline, b''):
    # process line
    pass
p.stdout.close()
p.wait()
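
As a self-contained illustration of the `readline()` loop above, here is a minimal sketch in Python 3 syntax (the snippets in this thread are Python 2). `CHILD_CODE` and `stream_lines` are hypothetical names; the child is a stand-in that flushes after every line, which is exactly the condition under which this approach delivers lines promptly.

```python
import subprocess
import sys

# Hypothetical stand-in for "my noisy stuff": flushes stdout after each line.
CHILD_CODE = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    print('line', i)\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.1)\n"
)


def stream_lines(argv):
    """Yield each output line as soon as the child has flushed it."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        # readline() returns once a full line is available; b"" means EOF.
        for raw in iter(proc.stdout.readline, b""):
            yield raw.decode(errors="replace").rstrip("\r\n")
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    for line in stream_lines([sys.executable, "-c", CHILD_CODE]):
        print("got:", line)
```

If the child never flushes and block-buffers its output when writing to a pipe, this loop still only sees data when the child's buffer fills or the child exits; the parent cannot change that through the pipe alone.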

If you already have pexpect, then you could use it to work around the block-buffering issue:

import pexpect

child = pexpect.spawn("my noisy stuff", timeout=None)
for line in child:
    # process line
    pass
child.close()

See also the stdbuf- and pty-based solutions from the questions I've linked in the comments.
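
A rough sketch of the `stdbuf` route, in Python 3 syntax. It assumes GNU coreutils `stdbuf` is installed; `with_line_buffering` and `stream_lines_stdbuf` are hypothetical helper names. `stdbuf -oL` asks the child to line-buffer its stdout, which only helps for programs that use the default C stdio buffering and do not set their own.

```python
import shutil
import subprocess


def with_line_buffering(argv):
    """Prefix argv with `stdbuf -oL` to request line-buffered stdout."""
    return ["stdbuf", "-oL"] + list(argv)


def stream_lines_stdbuf(argv):
    """Run argv under stdbuf and yield its output lines as they arrive."""
    if shutil.which("stdbuf") is None:
        # Assumption: stdbuf comes from GNU coreutils and may be missing.
        raise RuntimeError("stdbuf not found on PATH")
    proc = subprocess.Popen(with_line_buffering(argv), stdout=subprocess.PIPE)
    try:
        for raw in iter(proc.stdout.readline, b""):
            yield raw.decode(errors="replace").rstrip("\r\n")
    finally:
        proc.stdout.close()
        proc.wait()


# Usage (the command is a placeholder taken from the answer above):
#     for line in stream_lines_stdbuf(["my", "noisy", "stuff"]):
#         ...
```

If the program adjusts its own buffering, `stdbuf` has no effect, and a pseudo-terminal (as pexpect uses) is the more reliable option.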

jfs
  • This code still doesn't work, because some applications have block-buffered output, so the only way to stream out the buffer is to run the command from a pseudo-terminal (pty). I've seen other comments by you on Stack Overflow and I tried all of them before asking ;) . Pexpect was the one method which worked... as expected. – peluzza Nov 25 '13 at 08:14
  • @peluzza: I've explicitly added a `pexpect` solution. Could you update your question to provide a complete minimal example that demonstrates what "doesn't work" means in your particular case with the `subprocess` code from my answer? (e.g., `"my noisy stuff"` is `'{ echo a; sleep 2; echo b;}', shell=True` and it is critical to get `'a'` without waiting `2` seconds) – jfs Nov 25 '13 at 08:56
  • Edited; I pasted the terminal output. Anyway, your pexpect solution works like a charm. It prints the output in real time. Thank you so much! – peluzza Nov 25 '13 at 12:24
  • @Yep, but still same output. I guess it would be a good idea to clean the buffer on every iteration of the loop. – peluzza Nov 25 '13 at 13:01
  • @peluzza: you could set `timeout=None` to disable timeout (30 seconds by default). – jfs Nov 25 '13 at 13:13
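
The pseudo-terminal idea discussed in these comments can also be sketched with the standard library alone, without pexpect. This is a minimal Unix-only illustration in Python 3 syntax; `CHILD_CODE` and `stream_lines_pty` are hypothetical names. The child's stdout is attached to a pty, so programs that check for a terminal typically switch to line buffering.

```python
import os
import pty
import subprocess
import sys

# Hypothetical stand-in child: reports whether its stdout is a terminal.
CHILD_CODE = "import sys; print(sys.stdout.isatty())"


def stream_lines_pty(argv):
    """Run argv with stdout on a pseudo-terminal and yield its lines."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdout=slave_fd, stderr=subprocess.STDOUT)
    os.close(slave_fd)  # the parent only reads from the master side
    buffer = b""
    try:
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once the child side is closed
            if not chunk:
                break  # other platforms signal EOF with an empty read
            buffer += chunk
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                # The terminal layer turns "\n" into "\r\n"; drop the "\r".
                yield line.decode(errors="replace").rstrip("\r")
        if buffer:
            yield buffer.decode(errors="replace")
    finally:
        os.close(master_fd)
        proc.wait()
```

Unlike pexpect, this sketch has no timeout handling, so it simply blocks until the child produces output or exits.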