
I have this Python 2.7 script, which works as long as LANG is not C:

# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, unicode_literals, print_function

import os
import subprocess

import sys

print('LANG: {}'.format(os.environ['LANG']))
print('sys.getdefaultencoding(): {}'.format(sys.getdefaultencoding()))
print('sys.getfilesystemencoding(): {}'.format(sys.getfilesystemencoding()))
subprocess.check_call(['echo', 'Umlauts üöä'])

Running it from a Linux shell:

user@host:~$ python src/execv-arg-2-must-contain-only-strings.py 
LANG: de_DE.UTF-8
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): UTF-8
Umlauts üöä

But this fails:

user@host:~$ LANG=C python src/execv-arg-2-must-contain-only-strings.py 
LANG: C
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): ANSI_X3.4-1968
Traceback (most recent call last):
  File "src/execv-arg-2-must-contain-only-strings.py", line 12, in <module>
    subprocess.check_call(['echo', 'Umlauts üöä'])
  File "/usr/lib/python2.7/subprocess.py", line 536, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
TypeError: execv() arg 2 must contain only strings

What can I do to make this script work on Python2.7 with LANG=C?

guettli
  • Try decoding your call argument from utf-8 (to _unicode_), then encoding it to the default encoding, i.e. `subprocess.check_call(['echo', 'Umlauts üöä'.decode("utf-8").encode(sys.getdefaultencoding())])`. It's generally a bad idea to pass unicode data as arguments unless the subprocess/shell executes in a unicode environment. It's much safer to pass such data through a STDOUT pipe. – zwer Dec 06 '17 at 14:18
  • @zwer I guess you mean a STDIN pipe. But nevertheless, thank you for your comment. Why not write it as answer? – guettli Dec 06 '17 at 14:23
  • A matter of perspective: technically, the pipe would stand between the caller's STDOUT and the callee's STDIN ;) – zwer Dec 06 '17 at 14:33

2 Answers


Use LANG=C.UTF-8 instead of LANG=C

user@host> LANG=C.UTF-8 python t.py
LANG: C.UTF-8
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): UTF-8
Umlauts üöä

:-)
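
To illustrate why the locale matters here, the following is a minimal sketch. It assumes, as the two runs in the question suggest, that Python 2.7 encodes unicode arguments with the encoding reported by `sys.getfilesystemencoding()` before handing them to `execv()`. The helper name `can_encode` is made up for this example.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def can_encode(text, encoding):
    """Return True if the unicode string `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


arg = 'Umlauts üöä'

# Under LANG=C the filesystem encoding is ANSI_X3.4-1968, i.e. plain ASCII,
# which has no representation for the umlauts.
ascii_ok = can_encode(arg, 'ascii')

# Under LANG=C.UTF-8 (or de_DE.UTF-8) the filesystem encoding is UTF-8.
utf8_ok = can_encode(arg, 'utf-8')
```

Here `ascii_ok` ends up `False` and `utf8_ok` ends up `True`, which matches the failing and the working run.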

guettli

I didn't post it as an answer because I have no means of checking its correctness at the moment. In principle, though, if you want to send data as a subprocess/shell argument, you have to encode it in a form the receiving side understands (and then decode it back in the receiving subprocess), or Python won't know how to pack the argument.

So, if you want to send a non-ASCII string to a subprocess as an argument, encode it explicitly to a byte string first. Two caveats for your script: with `unicode_literals` in effect the literal is already a unicode object, so there is nothing to decode (calling `.decode("utf-8")` on it fails on Python 2.7, because it implicitly encodes to ASCII first), and `sys.getdefaultencoding()` is `ascii` in both of your runs, which cannot represent the umlauts. Pick the encoding the receiving side expects, e.g. UTF-8:

unicode_argument = "Umlauts üöä"  # already unicode because of unicode_literals
encoded_argument = unicode_argument.encode("utf-8")  # byte string, no implicit encoding needed

subprocess.check_call(['echo', encoded_argument])

While safer, it can still break on non-standard shells. Where possible use a STDIN pipe of your subprocess to pass data that is not suitable for your current shell as an argument - then you don't have to worry about different code pages as long as both processes agree on what encoding to use.
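
A minimal sketch of the STDIN approach might look like the following. The helper name `send_via_stdin` is made up for this example, and `cat` merely stands in for a real subprocess that reads its input from STDIN; both sides are assumed to agree on UTF-8.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import subprocess


def send_via_stdin(text, command, encoding='utf-8'):
    """Encode `text` and write it to the child's STDIN; return the child's raw STDOUT."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes the bytes, closes STDIN and waits for the child to exit.
    out, _ = proc.communicate(text.encode(encoding))
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
    return out


# `cat` copies STDIN to STDOUT unchanged, so the bytes come back as sent.
echoed = send_via_stdin('Umlauts üöä', ['cat'])
```

Because the data travels as bytes through the pipe rather than through the argument list, the locale of the calling process should play no part in how it is encoded.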

zwer