I am trying to decode, with Python 3.8, a file in the MIK-BULGARIAN encoding: https://en.wikipedia.org/wiki/MIK_(character_set). The encoding is identical to ASCII, except that bytes 128-191 are Cyrillic letters. The file contains both Latin and Cyrillic letters. My current solution works but is rather slow on big files. Can you suggest how to speed it up? (I know this is a lumberjack approach, and I am open to suggestions.)
def opener(filename):
    # Read the whole file as raw bytes, then decode it with translate().
    with open(filename, "rb") as f:
        filetext = f.read()
    cadText = translate(filetext)
    return cadText
mikdict = {
    128: "А",
    129: "Б",
    130: "В",
    131: "Г",
    132: "Д",
    ....
    188: "ь",
    189: "э",
    190: "ю",
    191: "я"
}
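def translate(textbytes):
    goodText = ""
    for txtbyte in textbytes:
        if (txtbyte >= 128) and (txtbyte <= 191):
            # Cyrillic range: look the letter up in the MIK dictionary.
            letter = str(mikdict.get(txtbyte))
        else:
            # Everything else is taken as the code point with the same number.
            letter = chr(txtbyte)
        goodText = goodText + letter
    return goodText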
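
One direction that might be faster, as a rough sketch rather than a tested solution: the per-byte loop and repeated string concatenation could be replaced by a single `latin-1` decode followed by `str.translate`, both of which run in C. The names `MIK_TABLE`, `translate_fast`, and `opener_fast` are hypothetical, and the table assumes that bytes 128-191 map in order onto the contiguous Unicode range U+0410 ("А") to U+044F ("я"), which agrees with the dictionary entries shown above.

```python
# Assumption: MIK bytes 128-191 map, in order, onto U+0410 ("А") .. U+044F ("я").
MIK_TABLE = {128 + i: 0x0410 + i for i in range(64)}


def translate_fast(textbytes):
    """Decode MIK bytes without a Python-level loop (hypothetical helper)."""
    # latin-1 maps every byte value n to chr(n), like the else branch of translate().
    text = textbytes.decode("latin-1")
    # str.translate then swaps code points 128-191 for the Cyrillic letters.
    return text.translate(MIK_TABLE)


def opener_fast(filename):
    """Read a whole file as bytes and decode it (hypothetical counterpart to opener)."""
    with open(filename, "rb") as f:
        return translate_fast(f.read())
```

For example, `translate_fast(bytes([72, 105, 32, 128, 129, 191]))` should give `"Hi АБя"`. Like the loop version, this sketch leaves bytes 192-255 as their Latin-1 characters; if the file uses the MIK symbols in that range, the table would need extra entries.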