27

For some reason, the C++ shared library I use prints text to standard output. In Python, I want to capture that output and save it to a variable. There are many similar questions about redirecting stdout, but the answers do not work in my code.

Example: Suppressing output of module calling outside library

1 import sys
2 import cStringIO
3 save_stdout = sys.stdout
4 sys.stdout = cStringIO.StringIO()
5 func()
6 sys.stdout = save_stdout

In line 5, func() calls the shared library, but the text generated by the shared library still goes to the console! If I change func() to print "hello", it works!

My questions are:

  1. How can I capture the stdout of the C++ shared library in a variable?
  2. Why can't StringIO capture the output from the shared library?
Community
UserCool
  • The C++ code would have to call into Python so that it also uses `sys.stdout` or `print()`. As it is, it probably uses `std::cout` or `printf()`, both of which write to the process's stdout file descriptor. If you want to capture that output, you will have to replace that descriptor with a pipe. The easiest way is probably to change the C++ code to return a corresponding string, which is also the most straightforward way to pass data around. – Ulrich Eckhardt Jun 18 '14 at 05:18
  • To be clear: Changing `sys.stdout` only affects *the Python interpreter*'s standard output; it does not change the *actual* standard output file descriptor (`1`). The question you've linked to *is* relevant and would work for your problem. – Jonathon Reinhart Jun 18 '14 at 05:20

7 Answers

20

Python's sys.stdout object is simply a Python wrapper on top of the usual stdout file descriptor. Rebinding it only affects code that writes through sys.stdout inside the Python interpreter, not the underlying file descriptor. Any non-Python code, whether it is another executable that was exec'ed or a C shared library that was loaded, knows nothing about that wrapper and will continue using the ordinary file descriptors for I/O.
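
To make the distinction concrete, here is a minimal Python 3 sketch. It is not part of the original example: the function name demo is made up, and os.write(1, ...) is only a stand-in for what printf() in a shared library does. File descriptor 1 is pointed at a temporary file so that both destinations can be inspected afterwards.

```python
import io
import os
import sys
import tempfile


def demo():
    """Return (text seen by sys.stdout, bytes seen by file descriptor 1)."""
    sys.stdout.flush()
    saved_fd = os.dup(1)               # keep a handle on the real stdout
    saved_obj = sys.stdout
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 1)       # fd 1 now points at the temp file
        sys.stdout = io.StringIO()     # Python-level replacement only
        try:
            print("from print")              # goes through sys.stdout
            os.write(1, b"from fd 1\n")      # bypasses sys.stdout (stand-in for printf)
            python_level = sys.stdout.getvalue()
        finally:
            sys.stdout = saved_obj
            os.dup2(saved_fd, 1)       # put the real stdout back on fd 1
            os.close(saved_fd)
        tmp.seek(0)
        fd_level = tmp.read()
    return python_level, fd_level


python_level, fd_level = demo()
```

The StringIO object only ever sees the text written by print(); the write to descriptor 1 never passes through it, which is the behaviour described in the question.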

So, in order for the shared library to output to a different location, you need to change the underlying file descriptor by opening a new file descriptor and then replacing stdout using os.dup2(). You could use a temporary file for the output, but it's a better idea to use a pipe created with os.pipe(). However, a pipe carries a risk of deadlock: if nothing is reading it, the writer blocks once the pipe's buffer fills up. To prevent that, we can use another thread to drain the pipe.
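
For comparison, the temporary-file variant mentioned above could look roughly like this in Python 3. This is a sketch under assumptions, not the code of this answer: the helper name capture_fd_to_tempfile is hypothetical, and os.write(1, ...) stands in for the call into the shared library.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_fd_to_tempfile(fd=1):
    """Redirect descriptor `fd` to a temp file; yield a dict that is filled on exit."""
    result = {"data": b""}
    sys.stdout.flush()                 # keep earlier Python output out of the capture
    saved_fd = os.dup(fd)              # remember where fd pointed
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), fd)      # fd now writes into the temp file
        try:
            yield result
        finally:
            sys.stdout.flush()         # push pending Python output into the capture
            os.dup2(saved_fd, fd)      # restore the original target
            os.close(saved_fd)
            tmp.seek(0)
            result["data"] = tmp.read()


with capture_fd_to_tempfile() as captured:
    os.write(1, b"written straight to fd 1\n")   # stand-in for the library call
```

A file never blocks the writer, so no reader thread is needed; the cost is that the output goes through the filesystem.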

Below is a full working example which does not use temporary files and which is not susceptible to deadlock (tested on Mac OS X).

C shared library code:

// test.c
#include <stdio.h>

void hello(void)
{
  printf("Hello, world!\n");
}

Compiled as:

$ clang test.c -shared -fPIC -o libtest.dylib

Python driver:

import ctypes
import os
import sys
import threading

print 'Start'

liba = ctypes.cdll.LoadLibrary('libtest.dylib')

# Create pipe and dup2() the write end of it on top of stdout, saving a copy
# of the old stdout
stdout_fileno = sys.stdout.fileno()
stdout_save = os.dup(stdout_fileno)
stdout_pipe = os.pipe()
os.dup2(stdout_pipe[1], stdout_fileno)
os.close(stdout_pipe[1])

captured_stdout = ''
def drain_pipe():
    global captured_stdout
    while True:
        data = os.read(stdout_pipe[0], 1024)
        if not data:
            break
        captured_stdout += data

t = threading.Thread(target=drain_pipe)
t.start()

liba.hello()  # Call into the shared library

# Close the write end of the pipe to unblock the reader thread and trigger it
# to exit
os.close(stdout_fileno)
t.join()

# Clean up the pipe and restore the original stdout
os.close(stdout_pipe[0])
os.dup2(stdout_save, stdout_fileno)
os.close(stdout_save)

print 'Captured stdout:\n%s' % captured_stdout
Adam Rosenfield
  • Does this actually work if the c code writes more than 1024 bytes? I'd expect that the GIL would be acquired when you call into the C code and it won't be released until you call back into the python -- so I'd think that you could still fill your pipe even with the pipe-cleaner as it won't get a chance to execute while something else holds the GIL. – mgilson Sep 24 '19 at 13:55
  • @mgilson: That's a valid concern, but in a test I did, this scales up to MBs+ of data without any deadlock issues on both CPython 2.7.16 and 3.7.3 (with a few slight modifications to make it Py3k-compatible), so it appears that at least those versions of the runtime do *not* hold the GIL while calling into the C code. – Adam Rosenfield Oct 17 '19 at 15:15
  • Yeah, so apparently this is up-to-spec with the `ctypes` docs (e.g. https://docs.python.org/3.3/library/ctypes.html#ctypes.PyDLL _does_ grab the GIL). I guess the devil is in the details here and it really depends on _how_ that C/C++ code was exposed to the python. e.g. it probably _doesn't_ work with something like `boost::python` or cython wrapped stuff unless you do additional work to release the GIL in those cases. – mgilson Oct 26 '19 at 06:46
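
The comments above mention a Python 3 adaptation without showing it. The following is one possible sketch, not the exact code that was tested there: the function name capture_stdout_via_pipe is made up, and the demonstration writes to descriptor 1 with os.write() instead of calling a real shared library. The payload is deliberately larger than a typical pipe buffer to exercise the drain thread.

```python
import os
import sys
import threading


def capture_stdout_via_pipe(func):
    """Run func() with fd 1 redirected into a pipe; return the captured bytes."""
    chunks = []
    sys.stdout.flush()
    saved_fd = os.dup(1)
    read_end, write_end = os.pipe()
    os.dup2(write_end, 1)
    os.close(write_end)                # fd 1 is now the pipe's only write end

    def drain():
        while True:
            data = os.read(read_end, 4096)
            if not data:               # EOF: every write end has been closed
                break
            chunks.append(data)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        func()
    finally:
        sys.stdout.flush()
        os.dup2(saved_fd, 1)           # restoring fd 1 closes the last write end
        os.close(saved_fd)
        reader.join()
        os.close(read_end)
    return b"".join(chunks)


def noisy():
    # Stand-in for the library call: 100 writes of 1000 bytes each.
    for _ in range(100):
        os.write(1, b"y" * 1000)


captured = capture_stdout_via_pipe(noisy)
```

One caveat worth checking for a real library: C stdio typically switches to full buffering when stdout is not a terminal, so text written with printf() may not reach the pipe until the library flushes its buffer. Calling the C function fflush through ctypes before restoring the descriptor is a common workaround, but whether it is needed depends on the platform and on how the library writes.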
19

Thanks to the nice answer by Adam, I was able to get this working. His solution didn't quite work for my case, since I needed to capture text, restore, and capture text again many times, so I had to make some pretty big changes. Also, I wanted to get this to work for sys.stderr as well (with the potential for other streams).

So, here is the solution I ended up using (with or without threading):

Code

import os
import sys
import threading
import time


class OutputGrabber(object):
    """
    Class used to grab standard output or another stream.
    """
    escape_char = "\b"

    def __init__(self, stream=None, threaded=False):
        self.origstream = stream
        self.threaded = threaded
        if self.origstream is None:
            self.origstream = sys.stdout
        self.origstreamfd = self.origstream.fileno()
        self.capturedtext = ""
        # Create a pipe so the stream can be captured:
        self.pipe_out, self.pipe_in = os.pipe()

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, type, value, traceback):
        self.stop()

    def start(self):
        """
        Start capturing the stream data.
        """
        self.capturedtext = ""
        # Save a copy of the stream:
        self.streamfd = os.dup(self.origstreamfd)
        # Replace the original stream with our write pipe:
        os.dup2(self.pipe_in, self.origstreamfd)
        if self.threaded:
            # Start thread that will read the stream:
            self.workerThread = threading.Thread(target=self.readOutput)
            self.workerThread.start()
            # Make sure that the thread is running and os.read() has executed:
            time.sleep(0.01)

    def stop(self):
        """
        Stop capturing the stream data and save the text in `capturedtext`.
        """
        # Print the escape character to make the readOutput method stop:
        self.origstream.write(self.escape_char)
        # Flush the stream to make sure all our data goes in before
        # the escape character:
        self.origstream.flush()
        if self.threaded:
            # wait until the thread finishes so we are sure that
            # we have until the last character:
            self.workerThread.join()
        else:
            self.readOutput()
        # Close the pipe:
        os.close(self.pipe_in)
        os.close(self.pipe_out)
        # Restore the original stream:
        os.dup2(self.streamfd, self.origstreamfd)
        # Close the duplicate stream:
        os.close(self.streamfd)

    def readOutput(self):
        """
        Read the stream data (one byte at a time)
        and save the text in `capturedtext`.
        """
        while True:
            char = os.read(self.pipe_out, 1)
            if not char or self.escape_char in char:
                break
            self.capturedtext += char

Usage

with sys.stdout, the default:

out = OutputGrabber()
out.start()
library.method(*args) # Call your code here
out.stop()
# Compare the output to the expected value:
# comparisonMethod(out.capturedtext, expectedtext)

with sys.stderr:

out = OutputGrabber(sys.stderr)
out.start()
library.method(*args) # Call your code here
out.stop()
# Compare the output to the expected value:
# comparisonMethod(out.capturedtext, expectedtext)

in a with block:

out = OutputGrabber()
with out:
    library.method(*args) # Call your code here
# Compare the output to the expected value:
# comparisonMethod(out.capturedtext, expectedtext)

Tested on Windows 7 with Python 2.7.6 and Ubuntu 12.04 with Python 2.7.6.

To work in Python 3, change char = os.read(self.pipe_out,1)
to char = os.read(self.pipe_out,1).decode(self.origstream.encoding).
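
One thing to watch with that change: decoding a single byte at a time raises UnicodeDecodeError as soon as the output contains a character that takes more than one byte in the stream's encoding (for example, any non-ASCII character in UTF-8). A possible way around it, sketched here with a hypothetical helper name and an assumed UTF-8 encoding, is to feed the bytes to an incremental decoder:

```python
import codecs
import os


def read_text_until(fd, sentinel="\b", encoding="utf-8"):
    """Read fd one byte at a time, decoding incrementally, until sentinel or EOF."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    pieces = []
    while True:
        raw = os.read(fd, 1)
        if not raw:                    # EOF
            break
        text = decoder.decode(raw)     # '' until a full character has arrived
        if sentinel in text:
            break
        pieces.append(text)
    return "".join(pieces)


# Demonstration on a plain pipe, independent of OutputGrabber.
read_end, write_end = os.pipe()
os.write(write_end, "h\u00e9llo \u2713\b".encode("utf-8"))
os.close(write_end)
decoded = read_text_until(read_end)
os.close(read_end)
```

The decoder returns an empty string while it is still waiting for the remaining bytes of a character, so the escape-character check keeps working unchanged.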

craymichael
Devan Williams
2

Thank you Devan!

Your code helped me a lot, but I ran into some problems using it that I want to share here:

For some reason, the line that is meant to force the capture to stop

self.origstream.write(self.escape_char)

does not work. I commented it out and made sure that my captured stdout string contains the escape character; otherwise the line

data = os.read(self.pipe_out, 1)  # Read One Byte Only

in the while loop waits forever.

One other thing is the usage. Make sure the object of the OutputGrabber class is a local variable. If you use a global object or class attribute (such as self.out = OutputGrabber()) you will run into trouble when recreating it.

That's all. Again thank you!

yvesined
1

Use a pipe, i.e. os.pipe(). You need to os.dup2() its write end onto the stdout file descriptor before calling your library.

Basile Starynkevitch
1

For anyone who, like me, came here from Google looking for how to suppress stderr/stdout output from a shared library (DLL), here is a simple context manager based on Adam's answer:

import os
import sys


class SuppressStream(object):

    def __init__(self, stream=sys.stderr):
        self.orig_stream_fileno = stream.fileno()

    def __enter__(self):
        self.orig_stream_dup = os.dup(self.orig_stream_fileno)
        self.devnull = open(os.devnull, 'w')
        os.dup2(self.devnull.fileno(), self.orig_stream_fileno)

    def __exit__(self, type, value, traceback):
        os.close(self.orig_stream_fileno)
        os.dup2(self.orig_stream_dup, self.orig_stream_fileno)
        os.close(self.orig_stream_dup)
        self.devnull.close()

Usage (adapted Adam's example):

import ctypes
import sys
print('Start')

liba = ctypes.cdll.LoadLibrary('libtest.so')

with SuppressStream(sys.stdout):
    liba.hello()  # Call into the shared library

print('End')
Sergei
  • btw `as guard` looks unnecessary since it is just `None` – Azat Ibrakov Jan 11 '20 at 19:58
  • Works great, but is only loosely related to this question. IMHO you should delete and repost this answer here: https://stackoverflow.com/q/5081657/837710 That would have helped me find it much earlier. Comment if you do and I'll upvote there. The other answers there seem to be for Python 2, so this would be a valuable contribution. – Casey Jones Jun 15 '20 at 18:44
1

More simply, the py library has a StdCaptureFD class that captures streams at the file descriptor level, which makes it possible to catch output from C/C++ extension modules (using a mechanism similar to the other answers). Note that the library is said to be in maintenance mode only.

>>> import py, sys
>>> capture = py.io.StdCaptureFD(out=False, in_=False)
>>> sys.stderr.write("world")
>>> out,err = capture.reset()
>>> err
'world'

Another option worth noting: if you're in a pytest test, you can use the capfd fixture directly; see these docs.

While the other answers may also work well, I ran into an error when using their code within PyCharm IDE (io.UnsupportedOperation: fileno), while StdCaptureFD worked fine.

CharlesB
  • This worked splendidly for a simple mute of Clib with unwieldy logging to terminal. py package is a bit larger than I would like - so advise to import py.io – Elysiumplain May 04 '21 at 19:47
0

It's basically untenable to capture the stdout from library code because that depends on your code running in an environment where a.) you're on a shell and b.) there's no other content going to your stdout. While you can probably make something work under these constraints, if you intend to deploy this code in any sense at all there is just no way to reasonably guarantee consistent good behavior. In fact, it's pretty questionable that this library code prints to stdout in a way that can't be controlled anyways.

So that's what you can't do. What you can do is to wrap any printing calls to this library inside something you can execute in a subprocess. Using Python's subprocess.check_output you can then get the stdout from that subprocess back in your program. Slow, messy, kinda grody all around, but on the other hand the library you're using prints useful information to stdout and doesn't return it so...
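
A rough sketch of that subprocess approach in Python 3 follows. The child program here is a placeholder: in practice it would import your wrapper module and call the noisy library function, while this demonstration just writes to descriptor 1. The names CHILD_CODE and run_and_capture are made up for illustration.

```python
import subprocess
import sys

# Placeholder child program; a real one would call into the shared library.
CHILD_CODE = "import os; os.write(1, b'output from the child process\\n')"


def run_and_capture(code):
    """Run `code` in a fresh Python interpreter and return its stdout as bytes."""
    return subprocess.check_output([sys.executable, "-c", code])


captured = run_and_capture(CHILD_CODE)
```

Because the child has its own stdout, nothing in the parent process has to be redirected, at the cost of starting a new interpreter for every call.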

colinro