
I have some Python code that contains recognition for some Hindi text. I deliberately save it as UTF-8, but when I reopen it, the Hindi characters have changed to (mostly) Russian text or simply to a ?. The encoding also changes to OEM 866, which is Cyrillic.

Here are the screenshots (see lines 90 and 98): [image] [image]

Because of this encoding change, my code doesn't run at all, since regular expressions treat ? as a special character. So what should I do?
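The symptom described above can be reproduced in a few lines. This is only a sketch, assuming Python 3 and assuming the editor reinterprets the unchanged UTF-8 bytes as OEM 866 (`cp866` in Python); the sample word and all variable names are illustrative and not taken from the original script.

```python
import re

# Sample Hindi word, written with escapes so this snippet is ASCII-only.
hindi = "\u0905\u092d\u0940"

# Bytes written to disk when the file is saved as UTF-8.
raw = hindi.encode("utf-8")

# What an editor displays if it guesses OEM 866 for those same bytes.
mojibake = raw.decode("cp866")
print(mojibake)

# The bytes themselves are intact, so the misreading is reversible as long
# as no character has been replaced by a literal "?".
restored = mojibake.encode("cp866").decode("utf-8")

# Once characters have become literal "?", a pattern built from them is
# invalid: "?" is a quantifier, and here it has nothing to repeat.
try:
    re.compile("???")
    question_mark_error = None
except re.error as exc:
    question_mark_error = exc
print(question_mark_error)
```

If the text only looks Cyrillic, the file is probably still fine on disk and just needs to be reopened as UTF-8; text that has already turned into `?` has lost information and would need to be retyped.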

Mooncrater
For precisely this kind of reason, I would suggest keeping your Python files in plain ASCII and using Unicode code point escapes for the symbols instead, i.e. `\u0905\u092d\u0940` == अभी – Daniel R. Livingston Jun 28 '18 at 21:06
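A minimal sketch of what the comment suggests, assuming Python 3 (where `\u` escapes work in ordinary string literals). The names `ABHI` and `contains_abhi` are hypothetical and only serve the illustration.

```python
import re

# ASCII-only source: the Hindi word is spelled out as code point escapes
# (U+0905, U+092D, U+0940), so no editor encoding guess can corrupt it.
ABHI = "\u0905\u092d\u0940"

# Compile the pattern once; re.escape is harmless here and guards against
# any character that would be special in a regular expression.
ABHI_PATTERN = re.compile(re.escape(ABHI))


def contains_abhi(text: str) -> bool:
    """Return True if the Hindi word occurs anywhere in text."""
    return ABHI_PATTERN.search(text) is not None


# Hypothetical input, also written with escapes.
sample = "\u092e\u0948\u0902 \u0905\u092d\u0940 \u0906\u092f\u093e"
print(contains_abhi(sample))
```

The trade-off is readability: the source survives any editor encoding guess, but the Hindi text is no longer legible in the file, so a comment next to each escaped literal helps.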

1 Answer

Encoding the script as UTF-8 with a BOM (UTF-8-BOM) would do the job. But the BOM has problems of its own. Basically, if the script starts with a shebang, the BOM bytes end up in front of the `#!`, so the system no longer recognizes the shebang and the script can't be run directly.
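To check whether a script is affected, something like the following sketch could be used. It assumes Python 3; `inspect_header` is a hypothetical helper, and the sample bytes are built in memory rather than read from a real file.

```python
import codecs


def inspect_header(data: bytes) -> dict:
    """Report whether data starts with a UTF-8 BOM and where a shebang sits."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    return {
        "has_bom": has_bom,
        # A shebang only works if "#!" are the very first two bytes.
        "shebang_is_first": data.startswith(b"#!"),
        # True when a shebang exists but is hidden behind the BOM.
        "shebang_behind_bom": has_bom and body.startswith(b"#!"),
    }


script = "#!/usr/bin/env python3\nprint('hi')\n"

# "utf-8-sig" is Python's codec name for UTF-8 with a leading BOM.
with_bom = inspect_header(script.encode("utf-8-sig"))
without_bom = inspect_header(script.encode("utf-8"))
print(with_bom)
print(without_bom)
```

For a real file, the same function can be applied to the first few bytes read in binary mode, e.g. `open(path, "rb").read(16)`.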


EDIT: A Notepad++ contributor on GitHub, rddim, replied to the issue I opened:

I can't reproduce this, maybe because I'm missing fonts. In the 1st screenshot your file is in UTF-8 and in the 2nd it is in OEM-866. Check the state of Autodetect character encoding in Settings > Preferences... > MISC. If it is enabled, just disable it and try again. Also, your Debug Information is missing the info from ? > Debug Info...

Disabling that setting worked for me.

Mooncrater