
I am running my Python script in Windows PowerShell, and the script should run another program using Popen, then pipe the output of that program (Mercurial, actually) for use in my script. I am getting an encoding error when I try to execute my script in PowerShell.

I am fairly sure this happens because Python is not using the same encoding as PowerShell when it reads the output of the Popen call. The problem is that I don't know how to tell Python which encoding to use.


My script looks like

# -*- coding: utf-8 -*-
#... some imports
proc = Popen(["hg", "--cwd", self.path, "--encoding", "UTF-8"] + list(args), stdout=PIPE, stderr=PIPE)
#... other code

When I run this script on Linux, I have no problems whatsoever. I can also run the script in Windows 7 Home Premium 64-bit using PowerShell with no problems. The PowerShell in this Windows 7 is using the code page 850, that is, the output of chcp is 850 ("ibm850").

However, when I run the script on Windows 7 Starter 32-bit, where PowerShell defaults to code page 437 (chcp reports 437), I get the following error from Python (version 2.7.2):

File "D:\Path\to\myscript.py", line 55, in hg_command
    proc = Popen(["hg", "--cwd", self.path, "--encoding", "UTF-8"] + list(args), stdout=PIPE, stderr=PIPE)
File "C:\Program files\Python27\lib\subprocess.py", line 679, in __init__
    errread, errwrite)
File "C:\Program files\Python27\lib\subprocess.py", line 852, in _execute_child
    args = list2cmdline(args)
File "C:\Program files\Python27\lib\subprocess.py", line 615, in list2cmdline
    return ''.join(result)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe3 in position 0: unexpected end of data

I have tried the following, with no success (i.e., the above error report stays the same):

  • Remove the line # -*- coding: utf-8 -*- from my script.
  • Remove the --encoding UTF-8 option for running Mercurial through Popen in my script.
  • Change the encoding to chcp 850 in PowerShell before executing my script.
  • Many other miscellaneous Python hacks I've found in other Stack Overflow answers.

For my specific details, my whole source code is available here in BitBucket. hgapi.py is the script that gives the error.


UPDATE: The script is being called by this other script, which is setting the encoding like this

sys.setdefaultencoding("utf-8")

This line looks important, because if I comment it out, I get a different error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
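
For reference, both messages can be reproduced in isolation by decoding a lone 0xe3 byte with the wrong codec. The following is only a minimal sketch, written for Python 3 (the traceback above comes from Python 2.7.2, where the decoding happens implicitly), and `try_decode` is a hypothetical helper name, not part of my script:

```python
def try_decode(raw, encoding):
    """Return the decoded text, or the error reason if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


raw = b"\xe3"  # a single byte, as reported in the tracebacks above

# 0xe3 starts a three-byte UTF-8 sequence, so on its own it is incomplete.
utf8_result = try_decode(raw, "utf-8")     # "error: unexpected end of data"
# ASCII only covers bytes 0x00-0x7f.
ascii_result = try_decode(raw, "ascii")    # "error: ordinal not in range(128)"
# Single-byte code pages map every byte to some character.
cp1252_result = try_decode(raw, "cp1252")  # "ã"
cp437_result = try_decode(raw, "cp437")    # decodes, but to a different character
```

This suggests the byte comes from a single-byte Windows code page rather than from UTF-8, although it does not show which string in the command line carries it.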
André Staltz
  • Do you have the same problem when using the [mercurial api](http://mercurial.selenic.com/wiki/MercurialApi)? Since you are using python it seems like a natural fit. – Burhan Khalid Apr 03 '12 at 12:30
  • The project used to use the mercurial internal api, but I switched to the command line api because that's the official stable one. The internal api is not supposed to be used except for extensions. – André Staltz Apr 03 '12 at 12:41
  • This looks more like a problem with the `args` array since the exception is raised in `list2cmdline`. Maybe `args` or `self.path` is a byte string instead of a Unicode string? – Philipp Apr 03 '12 at 23:13
  • On Windows, you generally want to use Unicode strings wherever possible. `hgapi.py` turns all string literals into Unicode literals (`from __future__ import unicode_literals`), and `hypergrasscore.py` should probably do the same. – Philipp Apr 03 '12 at 23:17
  • @Philipp, that makes sense. However, I have added `from __future__ import unicode_literals` and I still get the same error. In fact, I even included `u` before strings (`u'string'`) in my hypergrasscore.py script that calls hgapi.py, but no success. =/ – André Staltz Apr 04 '12 at 13:00

2 Answers


Try to change the encoding to cp1252. Popen in Windows wants shell commands encoded as cp1252. This seems like a bug, and it also seems fixed in Python 3.X through the subprocess module: http://docs.python.org/library/subprocess.html

import subprocess
# self.path and args come from the surrounding method in the question
proc = subprocess.Popen(["hg", "--cwd", self.path, "--encoding", "UTF-8"] + list(args), stdout=subprocess.PIPE, stderr=subprocess.PIPE)

update:

Your problem may be solvable with the smart_str function from Django's django.utils.encoding module.

Use this code:

import subprocess
from django.utils.encoding import smart_str
# cmd should contain the string with the command that you want to execute
smart_cmd = smart_str(cmd)
subprocess.Popen(smart_cmd)

You can find information on how to install Django on Windows here. First install pip, then install Django by starting a command shell with administrator privileges and running this command:

pip install Django

This will install Django in your Python installation's site-packages directory.

Thanasis Petsas
  • Change the encoding to cp1252 through what? `chcp 1252` in PowerShell doesn't help. – André Staltz Apr 03 '12 at 12:09
  • @STALTZ Try `$OutputEncoding = [Console]::OutputEncoding` – Burhan Khalid Apr 03 '12 at 12:36
  • @STALTZ I am sorry maybe my updated answer can help you but you have to install a new module, the django. `smart_str` is a very helpful function. You can find more information here: http://www.saltycrane.com/blog/2008/11/python-unicodeencodeerror-ascii-codec-cant-encode-character/ – Thanasis Petsas Apr 03 '12 at 14:49
  • I haven't tried yet your Django solution, @ThanasisPetsas, because I think there should be some basic Python solution for this. Thanks anyway, if I don't find another solution I can eventually use Django. – André Staltz Apr 05 '12 at 08:05

After using from __future__ import unicode_literals I started getting the same error but in a different part of the code:

out, err = [x.decode("utf-8") for x in proc.communicate()]

Gave the error

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe3 in position 33 ....

Indeed, x was a byte string that included \xe3 (which is ã in cp1252). So instead of x.decode('utf-8'), I used x.decode('windows-1252'), and that raised no errors. To support whatever encoding the console uses, I ended up using x.decode(sys.stdout.encoding) instead. Problem solved.

And that was in Python 3.2.2 with the Windows 7 Starter computer, but Python 2.7 on the same computer also worked normally.
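
The decoding step can be sketched roughly as follows. This is a Python 3 sketch under a few assumptions: `console_encoding` and `decode_output` are hypothetical helper names, the fallback to `locale.getpreferredencoding()` when `sys.stdout.encoding` is missing is a guess rather than something I tested on the Windows 7 Starter machine, and a small Python child process stands in for `hg` so the snippet runs without Mercurial installed.

```python
import locale
import subprocess
import sys


def console_encoding():
    """Best guess at the console encoding, with a fallback if stdout has none."""
    # sys.stdout.encoding can be None (for example when output is redirected).
    return getattr(sys.stdout, "encoding", None) or locale.getpreferredencoding(False)


def decode_output(raw, encoding=None):
    """Decode bytes from a child process, replacing undecodable bytes."""
    return raw.decode(encoding or console_encoding(), errors="replace")


# Stand-in child process (assumption: replaces the hg call from the question).
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = [decode_output(x) for x in proc.communicate()]
```

Using `errors="replace"` trades accuracy for robustness: a wrong guess produces replacement characters instead of an exception. On Windows the locale fallback may report the ANSI code page rather than the console code page shown by chcp, so it may not match what the child process actually wrote.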

André Staltz
  • Interesting - in my case http://stackoverflow.com/questions/28101187/deal-with-unicode-usernames-in-python-mkdtemp `sys.stdout.encoding` is None - would locale.getpreferredencoding() do the trick ? – Mr_and_Mrs_D Jan 23 '15 at 15:11