0

I'd like to have the Python (2.6, sorry!) equivalent of this shell pipeline:

$ longrunningprocess | sometextfilter | gzip -c

That is, I have to call a binary `longrunningprocess`, filter its output through `sometextfilter`, and end up with gzipped output.

I know how to use subprocess pipes, but I need the output of the pipe chunkwise (probably using yield) rather than all at once. E.g. this https://security.openstack.org/guidelines/dg_avoid-shell-true.html only shows how to get all the output at once.

Note that both `longrunningprocess` and `sometextfilter` are external programs that cannot be replaced with Python functions.

Thanks in advance for any hint or example!

  • What is Python doing in this situation? Is it the text filter? Is it gzipping? Is it just collecting the final output at the end? – jwodder Jan 19 '16 at 16:41
  • piping to system gzip is not necessary... python has a built-in interface to gzip (via the `gzip` module). – Corey Goldberg Jan 19 '16 at 17:00
  • rather than a complex pipeline, it sounds like you just need one python program that reads from stdin and writes the compressed data to stdout (or file). – Corey Goldberg Jan 19 '16 at 17:00
  • Of course, the gzipping can be done by Python, but both `longrunningprocess` and `sometextfilter` are C programs that cannot be reimplemented in Python. That is, Python needs to read the stdout of the first process, pipe it into the stdin of the second one, and finally gzip the stdout of that process. I'm a little lost as to how to do this in Python without waiting for process termination, which is not an option here. –  Jan 21 '16 at 11:34
  • I see two independent questions: [How do I use subprocess.Popen to connect multiple processes by pipes?](http://stackoverflow.com/q/295459/4279) and how to read subprocess' output in chunks (if you set `stdout=PIPE` then `process.stdout` is an ordinary (non-seekable) file object that has the corresponding method e.g., `chunk = process.stdout.read(chunk_size)`). What have you tried? What specific issues do you have with your code? – jfs Jan 21 '16 at 16:23
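
To illustrate the chunked read mentioned in the last comment, here is a minimal sketch; the 64 KiB chunk size is an arbitrary example, the pipeline string is the one from the question, and the shell is only given a constant command string:

import subprocess

# start the whole pipeline and read its output in fixed-size chunks
p = subprocess.Popen('longrunningprocess | sometextfilter',
                     shell=True, stdout=subprocess.PIPE)
while True:
    chunk = p.stdout.read(64 * 1024)  # blocks until data arrives or EOF
    if not chunk:
        break
    # ... process the chunk, e.g. hand it to a gzip writer
p.stdout.close()
p.wait()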

2 Answers

0

Again, I thought it would be difficult, while Python is (supposed to be) easy. Simply chaining the subprocesses just works, it seems:

import subprocess


def get_lines():
    # start the long-running producer
    lrp = subprocess.Popen(["longrunningprocess"],
                           stdout=subprocess.PIPE,
                           close_fds=True)
    # wire its output straight into the filter's stdin
    stf = subprocess.Popen(["sometextfilter"],
                           stdin=lrp.stdout,
                           stdout=subprocess.PIPE,
                           bufsize=1,
                           close_fds=True)

    # yield the filter's output line by line as it is produced
    for line in iter(stf.stdout.readline, ''):
        yield line

    # close the pipes to avoid leaking file descriptors
    # (stf.stdin is None here because stdin was wired to lrp.stdout,
    # so there is nothing else to close)
    lrp.stdout.close()
    stf.stdout.close()

    # reap the children to avoid zombies
    stf.wait()
    lrp.wait()

[Changes by J.F. Sebastian applied. Thanks!]

Then I can use Python's gzip module for compression.
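
For instance, a minimal sketch of that last step, assuming the get_lines() generator above; the output filename "out.gz" is just an example:

import gzip

out = gzip.open("out.gz", "wb")  # example output path
try:
    for line in get_lines():
        out.write(line)  # each line is compressed as it arrives
finally:
    out.close()  # GzipFile is not a context manager on Python 2.6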

  • (1) you need `lrp.stdout.close()` otherwise `longrunningprocess` may hang if `sometextfilter` dies prematurely (2) `close_fds=True` may be used only on POSIX systems here (3) `for l in stf.stdout` may delay the output significantly due to the read-ahead bug (use `iter(p.stdout.readline, '')` instead) (4) `bufsize=1` is useless in the first call (5) close the pipes to avoid leaking file descriptors (6) call `.wait()` to avoid zombies – jfs Jan 22 '16 at 08:15
  • 1- `lrp.stdout.close()` can be called immediately after `Popen(["sometextfilter"], stdin=lrp.stdout,..)`. 2- I don't see why you won't use `shell=True` here (the `longrunningprocess` name suggests that the overhead of starting the shell is not the issue here). It is safe to use the shell if you don't pass it untrusted input, i.e., it is totally acceptable to use `shell=True` with a string literal (a constant) in your Python source code. 3- If you insist on using two `Popen()` calls then switch the order in which the processes are called, like in [this answer that I've linked earlier](http://goo.gl/1uJhhB) – jfs Feb 02 '16 at 17:36
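
To make the first point in the comment above concrete, a minimal sketch with the same Popen calls as the answer's code, only with lrp.stdout.close() moved up:

import subprocess

lrp = subprocess.Popen(["longrunningprocess"],
                       stdout=subprocess.PIPE,
                       close_fds=True)
stf = subprocess.Popen(["sometextfilter"],
                       stdin=lrp.stdout,
                       stdout=subprocess.PIPE,
                       bufsize=1,
                       close_fds=True)
# the parent no longer needs this end of the pipe; closing it lets
# longrunningprocess get SIGPIPE as soon as sometextfilter exits
lrp.stdout.close()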
0

The shell syntax is optimized for one-liners; use it:

#!/usr/bin/env python2
import sys
from subprocess import Popen, PIPE

LINE_BUFFERED = 1
ON_POSIX = 'posix' in sys.builtin_module_names

p = Popen('longrunningprocess | sometextfilter', shell=True,
          stdout=PIPE, bufsize=LINE_BUFFERED, close_fds=ON_POSIX)
with p.stdout:
    for line in iter(p.stdout.readline, ''):
        print line,  # do something with the line
p.wait()

If you want to emulate the pipeline manually:

#!/usr/bin/env python2
import sys
from subprocess import Popen, PIPE

LINE_BUFFERED = 1
ON_POSIX = 'posix' in sys.builtin_module_names

sometextfilter = Popen('sometextfilter', stdin=PIPE, stdout=PIPE,
                       bufsize=LINE_BUFFERED, close_fds=ON_POSIX)
longrunningprocess = Popen('longrunningprocess', stdout=sometextfilter.stdin,
                           close_fds=ON_POSIX)
sometextfilter.stdin.close()  # close the parent's copy so the filter sees EOF
with sometextfilter.stdout as pipe:
    for line in iter(pipe.readline, ''):
        print line,  # do something with the line
sometextfilter.wait()
longrunningprocess.wait()
jfs