How do I resolve a 'utf-8' codec error when stripping carriage returns with tf.gfile.GFile?

Question

I'm using this:

label_lines = [line.rstrip() for line in tf.gfile.GFile(path2)]

which throws this error:

  File "C:\Python\lib\site-packages\tensorflow\python\util\compat.py", line 88, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte

score 0 · Answer 1 · answered May 25 '18 at 17:47

You haven't really provided enough information to be sure, but GFile is most likely opening the file specified by path2. If so, the error probably arises as follows:

1. GFile is probably opening the file in "text mode". In text mode, the bytes in the file have to be decoded into text according to some rules (an encoding).
2. You have not specified which encoding to use, so GFile is applying a default, which (judging by the as_text call in your traceback) appears to be UTF-8. For reference, the Python docs describe what the standard library does in the same situation.
3. However, the content of the file is not valid UTF-8.
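
A minimal sketch of the mechanism above, using only the Python standard library rather than GFile so it runs without TensorFlow: decoding bytes that start with 0xbb as UTF-8 fails with the same message, while decoding with an explicitly chosen encoding does not. The helper name decode_lines and the sample bytes are hypothetical, and 'cp1252' is only an assumed encoding for illustration; the right choice depends on how the file was actually written. Since tf.gfile.GFile takes a mode argument, opening the file with mode 'rb' should similarly give you raw bytes to decode yourself.

```python
def decode_lines(raw, encoding):
    # splitlines() drops the '\r\n' endings; rstrip() mirrors the question's
    # line.rstrip() and also trims any other trailing whitespace.
    return [line.decode(encoding).rstrip() for line in raw.splitlines()]


# Hypothetical file content whose first byte is 0xbb, as in the traceback.
raw = b"\xbbcat\r\ndog\r\n"

# With no encoding specified, the default is UTF-8, and 0xbb cannot start a UTF-8 character.
try:
    decode_lines(raw, "utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte

# Assumption: the file is Windows-1252 text, in which 0xbb is the character '»'.
print(decode_lines(raw, "cp1252"))  # ['»cat', 'dog']
```

Which encoding is "right" cannot be derived from the bytes alone; cp1252 merely happens to accept every byte here.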

Since the file contains at least the byte 0xbb, which cannot start a UTF-8 character, it does not look like plain text. Are you sure the file contains text? In any case, it is hard to suggest a fix without more details about the exact content of the file and the arguments GFile accepts.
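
One rough way to answer "does the file contain text?" is to look at its first bytes in binary mode. The sketch below is a heuristic only, and the helper names looks_like_utf8_text and sniff_file are hypothetical: it treats a sample as probably binary if it contains NUL bytes or does not decode as UTF-8.

```python
import codecs


def looks_like_utf8_text(head):
    """Rough heuristic: True if `head` (the first bytes of a file) could be UTF-8 text."""
    # Binary formats often contain NUL bytes, which plain text rarely does.
    if b"\x00" in head:
        return False
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the end of the sample instead of reporting it as an error.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True


def sniff_file(path, n=4096):
    """Print the first 16 bytes in hex and the heuristic verdict for the file at `path`."""
    with open(path, "rb") as f:
        head = f.read(n)
    verdict = "maybe UTF-8 text" if looks_like_utf8_text(head) else "probably not UTF-8 text"
    print(head[:16].hex(), "->", verdict)
```

Calling sniff_file(path2) would show, for example, whether the file starts with efbbbf (a UTF-8 BOM) or with bytes that are not valid UTF-8 at all.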

BTW, I notice that 0xbb is part of the UTF-8 byte order mark (BOM), EF BB BF, though it is the second byte rather than the first. Some Windows applications do write a BOM at the start of text files. So if you think the file is text, perhaps it is text preceded by a BOM? Other answers on SO cover that case.
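
If the file does turn out to be UTF-8 text with a BOM, Python's "utf-8-sig" codec removes a leading EF BB BF and otherwise decodes like "utf-8". A minimal sketch with hypothetical helper names, reading in binary mode and decoding explicitly:

```python
import codecs


def lines_without_bom(raw):
    # "utf-8-sig" strips a leading UTF-8 BOM if present; otherwise it behaves like "utf-8".
    text = raw.decode("utf-8-sig")
    # splitlines() handles '\r\n' endings; rstrip() mirrors the question's line.rstrip().
    return [line.rstrip() for line in text.splitlines()]


def read_lines_without_bom(path):
    """Read the file at `path` in binary mode and decode it with lines_without_bom()."""
    with open(path, "rb") as f:
        return lines_without_bom(f.read())


print(codecs.BOM_UTF8.hex())  # efbbbf
print(lines_without_bom(codecs.BOM_UTF8 + b"cat\r\ndog\r\n"))  # ['cat', 'dog']
```

With the standard library alone, open(path, encoding="utf-8-sig") does the same thing in text mode. Note that this only helps when the full three-byte BOM is present: bytes whose first byte is 0xbb still fail to decode.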

Sorry, that means nothing to me. Did you try the other suggestions? — Shaheed Haque, May 26 '18 at 10:39

score 0 · Answer 2 · answered May 27 '18 at 14:59


Well, the .pb file was accidentally associated with ‘notebook’. Once I removed the "Open with" association, the problem went away. I’m not sure how or why, but the problem is solved.


lh47383
