I have sifted through lots of Python/Unicode explanations, but I just can't seem to make sense of this.
Here is the situation:
I am pulling loads of comments off Reddit (I'm making a bot). I primarily want to store them in MongoDB, but I also need to print out comment trees so I can manually check what's going on.
So far I have had no problems putting comments into the DB, but when I try to print them to stdout, the CP1252 charset has trouble with characters it doesn't support.
As I understand it, in Python 3 strings are stored internally as Unicode, and only input and output must be bytes. So this seems fine: I can encode the Unicode to CP1252, and in a couple of situations I see \x** characters, which I don't mind. I am guessing they represent out-of-range characters?
The problem is that I was printing comment trees to stdout using \n (line feeds) and tabs so they were easy to look over, but apparently when you encode a Unicode string containing newlines, they get escaped and are printed as literal \n sequences.
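A minimal sketch for checking what the encoded bytes actually contain, assuming a made-up tree string (U+2603, a snowman, is used only because CP1252 has no mapping for it). In Python 3, str.encode keeps each newline as the single byte 0x0A; the visible \n and \xNN sequences come from printing a bytes object, which shows its repr. With the 'ignore' handler, unsupported characters are dropped entirely, so any \xNN that remains is a character CP1252 does support, such as é (0xE9).

```python
# Made-up tree text: U+2603 (snowman) has no CP1252 mapping, U+00E9 (é) does.
tree = "\n|Parent comment\n\t|Child comment \u2603 caf\u00e9\n"

encoded = tree.encode("cp1252", "ignore")

# The newlines are still raw 0x0A bytes, not a backslash followed by "n".
print(encoded.count(b"\n"))  # prints 3
print(b"\\n" in encoded)     # prints False

# The snowman was dropped; é became the single byte 0xE9.
print(b"\xe9" in encoded)    # prints True

# Printing a bytes object shows its repr, which is where the visible
# "\n" and "\xe9" escape sequences come from.
print(encoded)
```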
For reference, here is my encode statement:
encoded = post.tree_to_string().encode('cp1252','ignore')
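One possible workaround, sketched under the assumption that the console really is CP1252: round-trip the text through the codec and print the resulting str rather than the bytes. to_console_safe is a hypothetical helper name, and it uses the same 'ignore' handler as the statement above by default.

```python
def to_console_safe(text, encoding="cp1252", errors="ignore"):
    """Hypothetical helper: keep only characters that `encoding` can represent.

    Encoding and then decoding with the same codec drops (errors="ignore")
    or replaces (errors="replace") unsupported characters. Because the
    result is a str rather than bytes, newlines and tabs still print as real
    line breaks and indentation.
    """
    return text.encode(encoding, errors).decode(encoding)


tree = "|Parent comment\n\t|Child comment \u2603\n"
print(to_console_safe(tree))                    # snowman silently dropped
print(to_console_safe(tree, errors="replace"))  # snowman shown as "?"
```

With the bot's own method this would be print(to_console_safe(post.tree_to_string())). Writing the encoded bytes directly with sys.stdout.buffer.write(encoded) should also preserve the line breaks when sys.stdout is a regular console stream.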
Thanks
EDIT:
What I want is:
|Parent Comment
    |Child comment 1
        |GChild comment 1
    |Child comment 2
|Parent Comment 2
What I get is:
b"\n|Parent comment \n\n |Child comment \n\n etc
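For comparison, a sketch of how the wanted layout could be generated and printed as a str. The Comment class, tree_lines, and this tree_to_string are hypothetical stand-ins, not the bot's real post.tree_to_string(), and the sketch assumes one tab per nesting level.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Comment:
    """Hypothetical stand-in for a Reddit comment; not the bot's real class."""
    body: str
    children: list[Comment] = field(default_factory=list)


def tree_lines(comments, depth=0):
    """Yield one line per comment, indented with one tab per nesting level."""
    for comment in comments:
        yield "\t" * depth + "|" + comment.body
        yield from tree_lines(comment.children, depth + 1)


def tree_to_string(comments):
    return "\n".join(tree_lines(comments))


roots = [
    Comment("Parent Comment", [
        Comment("Child comment 1", [Comment("GChild comment 1")]),
        Comment("Child comment 2"),
    ]),
    Comment("Parent Comment 2"),
]

# Print the str itself (not its .encode(...) bytes) so "\n" and "\t" act as
# real line breaks and indentation.
print(tree_to_string(roots))
```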