
I would like to start a Python process and log subprocess error messages to the logging object of the parent script. I would ideally like to unify the log streams into one file. Can I somehow access the output stream of the logging class? One solution I know of is to send the subprocess output to a separate log file (procLog below). As described in the answer below, I could read from proc.stdout and proc.stderr, but then I'd have duplicate logging headers. I wonder if there is a way to pass the file descriptor underlying the logging class directly to the subprocess?

import logging
import os
import subprocess

logging.basicConfig(filename="test.log", level=logging.DEBUG)
logging.info("Started")

# cmdStr is the shell command to run; its output goes to a separate log file
procLog = open(os.path.expanduser("subproc.log"), 'w')
proc = subprocess.Popen(cmdStr, shell=True, stderr=procLog, stdout=procLog)
proc.wait()
procLog.flush()
fodon

1 Answer


Based on Adam Rosenfield's code, you could

  1. use select.select to block until there is output to be read from proc.stdout or proc.stderr,
  2. read and log that output, then
  3. repeat until the process is done.

Note that the following writes to /tmp/test.log and runs the command ls -laR /tmp. Change to suit your needs.

(PS: Typically /tmp contains directories which cannot be read by normal users, so running ls -laR /tmp produces output on both stdout and stderr. The code below correctly interleaves those two streams as they are produced.)

import logging
import subprocess
import shlex
import select
import fcntl
import os
import errno
import contextlib

logger = logging.getLogger(__name__)

def make_async(fd):
    '''add the O_NONBLOCK flag to a file descriptor'''
    fcntl.fcntl(fd, fcntl.F_SETFL, fcntl.fcntl(fd, fcntl.F_GETFL) | os.O_NONBLOCK)

def read_async(fd):
    '''read some data from a file descriptor, ignoring EAGAIN errors'''
    try:
        return fd.read()
    except IOError as e:
        # EAGAIN just means no data is available right now on a non-blocking fd
        if e.errno != errno.EAGAIN:
            raise
        else:
            return ''

def log_fds(fds):
    for fd in fds:
        out = read_async(fd)
        if out:
            logger.info(out)

@contextlib.contextmanager
def plain_logger():
    root = logging.getLogger()    
    hdlr = root.handlers[0]
    formatter_orig = hdlr.formatter
    hdlr.setFormatter(logging.Formatter('%(message)s'))
    try:
        yield
    finally:
        # restore the original format even if the with-block raises
        hdlr.setFormatter(formatter_orig)

def main():
    # fmt = '%(name)-12s: %(levelname)-8s %(message)s'
    logging.basicConfig(filename = '/tmp/test.log', filemode = 'w',
                        level = logging.DEBUG)

    logger.info("Started")
    cmdStr = 'ls -laR /tmp'

    with plain_logger():
        proc = subprocess.Popen(shlex.split(cmdStr),
                                stdout = subprocess.PIPE, stderr = subprocess.PIPE)
        # without `make_async`, `fd.read` in `read_async` blocks.
        make_async(proc.stdout)
        make_async(proc.stderr)
        while True:
            # Wait for data to become available 
            rlist, wlist, xlist = select.select([proc.stdout, proc.stderr], [], [])
            log_fds(rlist)
            if proc.poll() is not None:
                # Corner case: check if more output was created
                # between the last call to read_async and now                
                log_fds([proc.stdout, proc.stderr])                
                break

    logger.info("Done")

if __name__ == '__main__':
    main()

Edit:

You can redirect stdout and stderr to logfile = open('/tmp/test.log', 'a'). A small difficulty with doing so, however, is that any logger handler that is also writing to /tmp/test.log will not be aware of what the subprocess is writing, and so the log file may get garbled.

If you do not make logging calls while the subprocess is doing its business, then the only problem is that the logger handler has the wrong position in the file after the subprocess has finished. That can be fixed by calling

handler.stream.seek(0, 2)

so the handler will resume writing at the end of the file.


import logging
import subprocess
import contextlib
import shlex

logger = logging.getLogger(__name__)

@contextlib.contextmanager
def suspended_logger():
    root = logging.getLogger()
    handler = root.handlers[0]
    handler.flush()                    # write out anything the handler has buffered
    try:
        yield
    finally:
        handler.stream.seek(0, 2)      # resume writing at the end of the file

def main():
    logging.basicConfig(filename = '/tmp/test.log', filemode = 'w',
                        level = logging.DEBUG)

    logger.info("Started")
    with suspended_logger():
        # redirection is handled by the stdout/stderr arguments below,
        # so the command itself needs no shell redirection
        cmdStr = 'test2.py'
        logfile = open('/tmp/test.log', 'a')
        proc = subprocess.Popen(shlex.split(cmdStr),
                                stdout = logfile,
                                stderr = logfile)
        proc.communicate()
    logger.info("Done")

if __name__ == '__main__':
    main()
unutbu
  • This would work, but the output that I'm trying to append to the logging file is a logger stream from another program. So, if I did this I'd get two time stamps per line (let me know if this is not clear). So, is there a way to write directly to the underlying logging file descriptor ... bypassing the logging prefixes? – fodon Feb 02 '12 at 16:13
  • You could temporarily change the logging format to `'%(message)s'`. I've added some code to do this, above. – unutbu Feb 02 '12 at 16:36
  • Well, yes, that works, but wouldn't it be so much more elegant if I could just pass a file descriptor to the process? Can that really not be done? – fodon Feb 03 '12 at 14:47 (a sketch of this idea appears after these comments)
  • How about reopening the logging file with a different ID in append mode? What could go wrong? – fodon Feb 03 '12 at 15:34
  • I think your original solution was the easiest. I now see the advantage of using the logger primarily ... better formatting and control over the stream ... could even "tee" the stream to console. Thanks. – fodon Feb 03 '12 at 22:32
  • Okay, I'll add my original solution back into the post. – unutbu Feb 03 '12 at 22:50
  • Is this solution susceptible to deadlock? The [subprocess documentation](https://docs.python.org/2/library/subprocess.html#popen-objects) suggests it is unsafe to directly use `proc.stdout` and `proc.stdin` and suggests using communicate() instead – Bryan P Jan 31 '15 at 04:15
  • @BryanP: The warning you linked to applies specifically to those `subprocess` functions (such as `call`) that under the hood use `wait()` instead of `communicate()`. The problem there is that wait() blocks until the process is complete and does not read from the stdout or stderr pipes. The pipe has a fixed size, and if it gets filled there is a deadlock -- the child process waits for the pipe to be read before writing more, and the parent process waits for the child to end. – unutbu Jan 31 '15 at 11:51
  • @BryanP: In the solution above, select.select is used to continually monitor `stdout` and `stderr`. When the child process writes to these pipes, `select.select` returns the pipes that are ready to be read from, and `log_fds` performs the read. Since the pipes are continually being read from, there will be no deadlock. – unutbu Jan 31 '15 at 12:49
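
A minimal sketch of the "just pass the logger's file descriptor" idea raised in the comments above. It assumes the root logger's only handler is a FileHandler (which logging.basicConfig(filename=...) sets up) and that handing its open stream to Popen is acceptable for your platform and buffering setup; treat it as an illustration, not a guaranteed-robust solution.

import logging
import shlex
import subprocess

logging.basicConfig(filename='/tmp/test.log', level=logging.DEBUG)
handler = logging.getLogger().handlers[0]

logging.info("Started")
handler.flush()                          # push any buffered log lines to disk first

proc = subprocess.Popen(shlex.split('ls -laR /tmp'),
                        stdout=handler.stream,   # child writes straight into the log file
                        stderr=handler.stream)
proc.wait()

handler.stream.seek(0, 2)                # let the handler resume at end-of-file
logging.info("Done")

This carries the same caveat as the Edit above: the handler and the child process write to the same file independently, so making logging calls while the child is still running can garble the output.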