
Python 3.6

I want to capture all output from a subprocess which I run with the subprocess module. I can easily pipe this output to a log file, and that works great.
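
For reference, the straightforward piping that works looks roughly like this (a sketch; command and log_file are as used in the attempts below):

import subprocess

# Everything the child writes goes straight to the log file, unfiltered.
subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)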

But I want to filter out many of the lines (there is a lot of noisy output from modules I do not control).

Attempt 1

import subprocess

def run_command(command, log_file):
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT, bufsize=1,
                               universal_newlines=True)
    while True:
        output = process.stdout.readline()
        # readline() returns '' at EOF; only stop once the process has exited
        if output == '' and process.poll() is not None:
            break
        if output and not_noisy_line(output):
            log_file.write(output)
            log_file.flush()
    return process.poll()

But this introduced a race condition between my subprocess and the output.

Attempt 2

I created a new method and a class to wrap the logging.

import io
import subprocess
import sys

def run_command(command, log_file):
    process = subprocess.run(command, stdout=QuietLogger(log_file),
                             stderr=QuietLogger(log_file), timeout=120)
    return process.returncode

class QuietLogger(io.TextIOWrapper):
    def write(self, data, encoding=sys.getdefaultencoding()):
        data = filter(data)  # filter is my own line-filtering function
        super().write(data)

This does, however, completely skip my filter function: my write method is not called at all by the subprocess. (If I call QuietLogger().write('asdasdsa') directly, it does go through the filter.)

Any clues?

davidwessman

2 Answers


[Edit: My brain got derailed along the way, and I ended up answering a different question than the one actually asked. The following solution is useful for concurrently writing to a file, not for using the logging module in any way. However, since it is at least useful for that, I'll leave the answer in place for now.]

If you were just using threads, not separate processes, you could simply use a standard lock. So you could try something similar here.

There's always the option of locking the output file. I don't know if your operating system supports anything like that, but the usual Unix way of doing it is to create a lock file. Basically, if the file exists, then wait; otherwise create the file before writing to your log file, and after you're done, remove the lock file again. You could use a context manager like this:

import os
from time import sleep


class LockedFile:
    """Guard a file with a simple .lock file while it is open."""
    def __init__(self, filename, mode):
        self.filename = filename
        self.lockfile = filename + '.lock'
        self.mode = mode

    def __enter__(self):
        # Create the lock file atomically: mode 'x' fails if it already
        # exists, which avoids a check-then-create race between writers.
        while True:
            try:
                self.lock = open(self.lockfile, 'x')
                break
            except FileExistsError:
                sleep(0.1)
        self.f = open(self.filename, self.mode)
        return self.f

    def __exit__(self, *args):
        self.f.close()
        self.lock.close()
        os.remove(self.lockfile)

# And here's how to use it:
with LockedFile('blorg', 'a') as f:
    f.write('foo\n')
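
On Unix, the OS-level file locking alluded to above is available through the fcntl module; a minimal sketch, assuming a POSIX system:

import fcntl

with open('blorg', 'a') as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold an exclusive lock
    f.write('foo\n')
    fcntl.flock(f, fcntl.LOCK_UN)   # release early (also released on close)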

This is an interesting situation in which the file object abstraction partially breaks down. The reason your solution does not work is that subprocess is not actually using your QuietLogger: it only extracts the raw file descriptor from it (then repackages it as an io.TextIOWrapper object).
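
A small experiment makes this visible (Spy is a hypothetical class, just for illustration): subprocess asks the object for its file descriptor and never calls write().

import subprocess
import sys

class Spy:
    def fileno(self):
        # This is the only thing subprocess wants from the object.
        print('fileno() was called', file=sys.stderr)
        return sys.stdout.fileno()

    def write(self, data):
        # Never reached: the child process writes to the descriptor directly.
        print('write() was called', file=sys.stderr)

subprocess.run(['echo', 'hello'], stdout=Spy())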

I don't know if this is an intrinsic limitation in how subprocess is handled (relying on OS support), or simply a flaw in the Python design, but in order to achieve what you want, you need to use the standard subprocess.PIPE and then roll your own file writer.

If you can wait for the subprocess to finish, then it can be done trivially with subprocess.run, picking the stream out of the CompletedProcess object (p); here the filtering uses the same filter_passed predicate defined in the next snippet:

p = subprocess.run(command, stdout=subprocess.PIPE, universal_newlines=True)

# p.stdout is one big string; keep only the lines that pass the filter
data = ''.join(line for line in p.stdout.splitlines(keepends=True)
               if filter_passed(line))

with open(logfile, 'w') as f:
    f.write(data)

If you must work with the output while it is being generated (so you cannot wait for the subprocess to end), the simplest way is to resort to subprocess.Popen and threads:

import subprocess
import threading

logfile = 'tmp.txt'
filter_passed = lambda line: line[:3] != 'Bad'  # example predicate
command = ['my_cmd', 'arg']

def writer(p, logfile):
    # Runs in its own thread: drains the pipe as the child produces output.
    with open(logfile, 'w') as f:
        for line in p.stdout:
            if filter_passed(line):
                f.write(line)

p = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True)

t = threading.Thread(target=writer, args=(p, logfile))
t.start()

t.join()
JohanL
  • Threading seems very nice. Is there any way for me to introduce a timeout in your second example? If my process times out, I want to kill it and then allow my process to restart. – davidwessman Mar 13 '18 at 08:06
  • There is no need to introduce a timeout to handle the printing. If you kill the subprocess, stdout will be closed and the thread will finish and join with the main thread. If your question is rather how to measure and enforce a timeout, that is another, separate question that should be another SO entry. (Though, not too hard to achieve.) – JohanL Mar 13 '18 at 12:20
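
For reference, a minimal sketch of the timeout idea discussed in these comments (not part of the original answer; it reuses the writer function from above): wait on the process with a timeout, kill it on expiry, and let the writer thread drain the pipe and exit.

import subprocess
import threading

def run_with_timeout(command, logfile, timeout=120):
    p = subprocess.Popen(command, stdout=subprocess.PIPE,
                         universal_newlines=True)
    t = threading.Thread(target=writer, args=(p, logfile))
    t.start()
    try:
        p.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        p.kill()  # killing closes the pipe, so the writer thread sees EOF
    t.join()
    return p.returncode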