
I have a Python script that calls another Python script using subprocess.Popen. I know the called code always returns 10, which means it failed.

My problem is that the caller only reads 10 approximately 75% of the time. The other 25% it reads 0 and mistakes the called program's failure code for a success. Same command, same environment, apparently random occurrences.

Environment: Python 2.7.10, Red Hat Linux 6.4. The code presented here is a (very) simplified version, but I can still reproduce the problem using it.

This is the called script, constant_return.py:

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

"""
Simplified called code
"""
import sys

if __name__ == "__main__":
    sys.exit(10)

This is the caller code:

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

"""
Simplified version of the calling code
"""

try:
    import sys
    import subprocess
    import threading

except Exception, eImp:
    print "Error while loading Python library : %s" % eImp
    sys.exit(100)


class BizarreProcessing(object):
    """
    Simplified caller class
    """

    def __init__(self):
        """
        Classic initialization
        """
        object.__init__(self)


    def logPipe(self, isStdOut_, process_):
        """
        Simplified log handler
        """
        try:
            if isStdOut_:
                output = process_.stdout
                logfile = open("./log_out.txt", "wb")
            else:
                output = process_.stderr
                logfile = open("./log_err.txt", "wb")

            #Read pipe content as long as the process is running
            while (process_.poll() == None):
                text = output.readline()
                if (text != '' and text.strip() != ''):
                    logfile.write(text)

            #When the process is finished, there might still be lines remaining in the pipe
            for oneline in output.readlines():
                if (oneline != None and oneline.strip() != ''):
                    logfile.write(oneline)
        finally:
            logfile.close()


    def startProcessing(self):
        """
        Launch process
        """

        # Simplified command line definition
        command = "/absolute/path/to/file/constant_return.py"

        # Execute command in a new process
        process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        #Launch a thread to gather the called program's stdout and stderr
        #This is to avoid a deadlock with a filled pipe and such
        stdoutTread = threading.Thread(target=self.logPipe, args=(True, process))
        stdoutTread.start()
        stderrThread = threading.Thread(target=self.logPipe, args=(False, process))
        stderrThread.start()

        #Wait for the end of the process and get process result
        stdoutTread.join()
        stderrThread.join()
        result = process.wait()

        print("returned code: " + str(result))

        #Send it back to the caller
        return (result)


#
# Main
#
if __name__ == "__main__":

    # Execute caller code
    processingInstance = BizarreProcessing()
    aResult = processingInstance.startProcessing()

    #Return the code
    sys.exit(aResult)

Here is what I type in bash to execute the caller script:

for res in {1..100}
do
    /path/to/caller/script.py
    echo $? >> /tmp/returncodelist.txt
done

It seems to be somehow connected to the way I read the called program's outputs, because when I create the subprocess with `process = subprocess.Popen(command, shell=True, stdout=sys.stdout, stderr=sys.stderr)` and remove all the thread handling, it reads the correct return code (but no longer logs the way I want...).
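For reference, this is roughly what that thread-less variant of startProcessing looks like (a sketch; the return code comes back correctly, but the output is no longer captured in log files):

#Sketch: no logging threads, output goes straight to the caller's own streams
process = subprocess.Popen(command, shell=True,
                           stdout=sys.stdout, stderr=sys.stderr)
result = process.wait()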

Any idea what I did wrong?

Thanks a lot for your help

seb_fx
  • Is it some special python build? Why do you catch `Exception` while importing stdlib modules that should be present by default on redhat? – jfs Jul 21 '15 at 18:02
  • [this message on Python bug tracker might be related](http://bugs.python.org/msg142215) – jfs Jul 21 '15 at 18:20

1 Answer


logPipe also checks whether the process is alive to determine whether there's more data to read. This is not correct: you should be checking whether the pipe has reached EOF, by looking for a zero-length read or by reading the pipe until it is exhausted. The I/O pipes may outlive the process.
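For illustration, the zero-length-read version of the loop would look roughly like this inside logPipe (in Python 2, readline() returns '' only at EOF; a blank line still carries its newline):

#Sketch: drain the pipe until EOF instead of polling the process
while True:
    text = output.readline()
    if text == '':          # zero-length read: the pipe has reached EOF
        break
    if text.strip():
        logfile.write(text)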

Simpler still, you can iterate over the pipe directly; this shrinks logPipe significantly. Change it as below:

  def logPipe(self, isStdOut_, process_):
      """
      Simplified log handler
      """
      try:
          if isStdOut_:
              output = process_.stdout
              logfile = open("./log_out.txt", "wb")
          else:
              output = process_.stderr
              logfile = open("./log_err.txt", "wb")

          #Read pipe content until EOF; the pipe may outlive the process
          with output:
              for text in output:
                  if text.strip(): # ... checks if it's not an empty string
                      logfile.write(text)

      finally:
          logfile.close()

Second, don't join your logging threads until after process.wait(), for the same reason - the I/O pipes may outlive the process.
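Applied to startProcessing, that reordering looks roughly like this (a sketch reusing the question's variable names):

#Sketch: reap the child first; the logging threads keep draining the
#pipes until EOF, which may arrive after the process has exited
result = process.wait()
stdoutTread.join()
stderrThread.join()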

What I think is happening under the covers is that there's a SIGPIPE being emitted and mishandled somewhere - possibly being misconstrued as the process termination condition. This is because the pipe is being closed on one end or the other without being flushed. SIGPIPE can sometimes be a nuisance in larger applications; it may be that the Python library swallows it or does something childish with it.

edit As @BlackJack points out, SIGPIPE is ignored by Python by default. So that rules out SIGPIPE malfeasance. A second theory, though: the documentation for Popen.poll() states:

Check if child process has terminated. Set and return returncode attribute.

If you strace this (e.g., strace -f -o strace.log ./caller.py), this appears to be done via wait4(WNOHANG). You've got two threads waiting with WNOHANG and one waiting normally, but only one call will return correctly with the process exit code. If there is no lock in the implementation of Popen.poll(), then there is quite likely a race to assign process.returncode, or a potential failure to do so correctly. Limiting your Popen.wait()/poll() calls to a single thread should be a good way to avoid this. See man waitpid.

edit As an aside, if you can hold all of your stdout/stderr data in memory, Popen.communicate() is much easier to use and does not require logPipe or the background threads at all.

https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
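A minimal sketch of that approach, assuming the whole output fits in memory:

#Sketch: communicate() drains stdout/stderr and reaps the child,
#so no logging threads are needed at all
process = subprocess.Popen(command, shell=True,
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = process.communicate()
with open("./log_out.txt", "wb") as outfile:
    outfile.write(out)
with open("./log_err.txt", "wb") as errfile:
    errfile.write(err)
result = process.returncode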

lyngvi
  • The [documentation for the signal module](https://docs.python.org/2/library/signal.html) states: _"*SIGPIPE* is ignored (so write errors on pipes and sockets can be reported as ordinary Python exceptions)"_ – BlackJack Jul 21 '15 at 15:57
  • Ah. Well, that experiment won't be terribly interesting, then. Will remove that experimental recommendation. Changes my hypothesis a bit too. – lyngvi Jul 21 '15 at 16:00
  • Thank you lyngvi, you were right in your analysis. Removing the many `process.poll()` calls does indeed solve the return code problem, which means it was probably a race condition in the return code calculation. I edited the `logPipe` function you created a bit, because it didn't write anything to the log files. FYI, the caller code is designed to call very long programs with a lot of logging and quite long execution times, so `Popen.communicate` wouldn't fly here; I need something that doesn't clog the memory and that prints the messages when they are released to the pipe. – seb_fx Jul 21 '15 at 18:06
  • @lyngvi: please, use `for line in p.stdout:` instead of `for line in p.stdout.readlines():`. – jfs Jul 21 '15 at 18:12
  • @seb_fx: even better if you redirect the output to the files directly (pass file objects instead of PIPE). Or if you need to filter the lines in Python then see [Python: read streaming input from subprocess.communicate()](http://stackoverflow.com/q/2715847/4279) – jfs Jul 21 '15 at 18:12
  • Here yes, but writing directly to a file is a simplification of the log handler code, which should otherwise do a lot of things including a call to a previously defined `logger` object. All this was not necessary to understand the return code problem, so I removed it so my work would look clearer for external readers :) – seb_fx Jul 21 '15 at 18:20
  • @J.F.Sebastian done. And good recommendation re: passing file objects, even better than communicate() for this example. I sought to avoid changing the threading logic in case seb_fx was doing some brainier operation in his real implementation. – lyngvi Jul 21 '15 at 18:20
  • @seb_fx: I thought so; that is why I've linked to another solution in the comment. Put whatever code you like instead of `print line,` there – jfs Jul 21 '15 at 18:23
  • @lyngvi: close `output`. `with output:` would work. I would pass the filename in from the parent thread and drop `isStdOut_`: `with open(filename, 'wb') as file:`. `text != ''` is useless -- it never happens. Use `if text.strip():` instead of `if text.strip() != '':` (as a side effect, it will also work on Python 3). – jfs Jul 21 '15 at 18:25
  • @J.F.Sebastian: Used 'with output'. Personally I'd pass stdout or stderr as an argument along with the pre-initialized file-object, but didn't want to post a larger refactored blob. – lyngvi Jul 21 '15 at 18:34