
If you run this in Python 2:

print 'Äppleß','Äppleß'.decode('latin-1').encode('utf-8')

That is mojibake (garbled text caused by incorrect encoding or decoding), but Python does not raise an error.

I want it to throw an error in the case of a mojibake.

I have heard that tools like ftfy can help: https://ftfy.readthedocs.io/en/latest/#

Any other ideas or experience with shortcuts?

John Machin
Fin Dev
    You should look inside the code of ftfy to see how they do it. It seems they have the most advanced Mojibake corrector, and that it's easy to correct compared to other kinds of encoding mistakes: "Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string". – gaborous Feb 17 '17 at 14:28
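
The property the quote describes can be turned into a rough check. Below is a minimal sketch in Python 3 (not the Python 2 of the question, and not how ftfy itself works): `MojibakeError` and `check_mojibake` are hypothetical names, and the check assumes the only mistake of interest is UTF-8 bytes that were decoded as Latin-1. If a non-ASCII string can be encoded back to Latin-1 and those bytes happen to form valid UTF-8, that is very unlikely to be a coincidence, so the function raises.

```python
class MojibakeError(ValueError):
    """Hypothetical error type: text looks like UTF-8 that was misdecoded."""


def check_mojibake(text, wrong_encoding="latin-1"):
    """Return text unchanged, or raise MojibakeError if it looks garbled.

    Heuristic (an assumption, not ftfy's algorithm): undo the suspected
    wrong decoding and see whether the bytes are valid UTF-8.
    """
    if text.isascii():  # str.isascii() needs Python 3.7+
        return text  # pure ASCII looks the same in both encodings
    try:
        repaired = text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # round trip failed, so no evidence of mojibake
    raise MojibakeError(
        "probable mojibake: %r looks like %r" % (text, repaired)
    )


# Reproduce the garbling from the question in Python 3 terms.
garbled = "Äppleß".encode("utf-8").decode("latin-1")

check_mojibake("Äppleß")  # passes: b'\xc4pple\xdf' is not valid UTF-8
try:
    check_mojibake(garbled)
except MojibakeError as exc:
    print(exc)
```

This can give false positives on legitimate text that happens to round-trip (for example a string that really is meant to contain "Ã©"), and it misses text garbled through other codec pairs, so it is a starting point rather than a complete detector.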
