I am trying to decode, with Python 3.8, a file that is in the MIK (Bulgarian) encoding: https://en.wikipedia.org/wiki/MIK_(character_set). The encoding is identical to ASCII, except that bytes 128-191 are Cyrillic letters. The file contains both Latin and Cyrillic letters. My current solution works, but it is rather slow on big files. Can you give me some suggestions on how to speed things up? (I know this is a lumberjack approach, and I am open to suggestions.)

def opener(filename):
    # Read the whole file as raw bytes; the context manager closes it for us.
    with open(filename, "rb") as f:
        filetext = f.read()
    cadText = translate(filetext)
    return cadText

mikdict = {
    128: "А",
    129: "Б",
    130: "В",
    131: "Г",
    132: "Д",
    ....
    188: "ь",
    189: "э",
    190: "ю",
    191: "я"
  }
def translate(textbytes):
    goodText = ""
    for txtbyte in textbytes:
        if (txtbyte >= 128) and (txtbyte <= 191):
            letter = str(mikdict.get(txtbyte))
        else:
            letter = chr(txtbyte)
        goodText = goodText + letter
    return goodText

  • See https://stackoverflow.com/questions/38777818/how-do-i-properly-create-custom-text-codecs – snakecharmerb Aug 14 '20 at 16:35
  • Does this answer your question? [str.translate gives TypeError - Translate takes one argument (2 given), worked in Python 2](https://stackoverflow.com/questions/23175809/str-translate-gives-typeerror-translate-takes-one-argument-2-given-worked-i) – JosefZ Aug 15 '20 at 10:30

1 Answer

Apparently the right answer was to use map() and a lambda, which seems to be much more efficient than my initial snippet.

def translate(input):
    # Bytes below 128 pass through, 128-191 are looked up in mikdict, and anything above 191 is dropped.
    newChars = map(lambda x: bytes([x]) if (x < 128) else bytes(mik.mikdict.get(x), "utf-8") if (x <= 191) and (x >= 128) else b"", input)
    res = b''.join(newChars).decode("utf-8")
    return res
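
For comparison, here is a minimal sketch of the str.translate route that the second comment points to. It is not what I measured above, so treat it as an untested-for-speed alternative. It assumes that bytes 128-191 map to U+0410-U+044F in order, which matches the visible dictionary entries in the question (128 is "А", 191 is "я"). The name MIK_TABLE is made up for this example; translate and opener reuse the names from the question.

```python
# Assumption: MIK bytes 128..191 correspond to U+0410..U+044F in the same order,
# as the visible entries of mikdict suggest (128 -> "А", 191 -> "я").
MIK_TABLE = {byte: chr(byte - 128 + 0x0410) for byte in range(128, 192)}


def translate(textbytes):
    # latin-1 maps every byte 0..255 to the code point with the same number,
    # so this decode never fails; str.translate then remaps 128..191 in one pass.
    return textbytes.decode("latin-1").translate(MIK_TABLE)


def opener(filename):
    # Read the file as raw bytes and convert it in a single call.
    with open(filename, "rb") as f:
        return translate(f.read())
```

Both the decode and the table lookup run inside the interpreter's built-in methods instead of a Python-level loop per byte. Note that bytes 192-255 come out as the Latin-1 characters with the same code, like chr() in the question, and are not dropped as in the map() version.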