python3 custom encoding mik-bulgarian

Question

I am trying to decode, with Python 3.8, a file that is in the MIK (Bulgarian) encoding: https://en.wikipedia.org/wiki/MIK_(character_set). The encoding is identical to ASCII, except that bytes 128-191 are Cyrillic letters. The file contains both Latin and Cyrillic letters. My current solution works well but is rather slow on big files. Can you suggest how to speed things up? (I know this is a lumberjack approach, and I am open to suggestions.)

def opener(filename):
    # Read the raw bytes; the context manager closes the file even on error.
    with open(filename, "rb") as f:
        filetext = f.read()
    cadText = translate(filetext)
    return cadText

mikdict = {
    128: "А",
    129: "Б",
    130: "В",
    131: "Г",
    132: "Д",
    ....
    188: "ь",
    189: "э",
    190: "ю",
    191: "я"
  }
def translate(textbytes):
    goodText = ""
    for txtbyte in textbytes:
        if (txtbyte >= 128) and (txtbyte <= 191):
            letter = str(mikdict.get(txtbyte))
        else:
            letter = chr(txtbyte)
        goodText = goodText + letter
    return goodText

See https://stackoverflow.com/questions/38777818/how-do-i-properly-create-custom-text-codecs — snakecharmerb, Aug 14 '20 at 16:35
Does this answer your question? [str.translate gives TypeError - Translate takes one argument (2 given), worked in Python 2](https://stackoverflow.com/questions/23175809/str-translate-gives-typeerror-translate-takes-one-argument-2-given-worked-i) — JosefZ, Aug 15 '20 at 10:30

score 0 · Accepted Answer · answered Aug 19 '20 at 08:08

Apparently the right answer was to use map() and a lambda, which seems to be much more efficient than my initial snippet.

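def translate(input):
    # ASCII bytes pass through unchanged; bytes 128-191 are looked up in
    # mikdict and re-encoded as UTF-8; anything above 191 is dropped.
    newChars = map(lambda x: bytes([x]) if (x < 128) else bytes(mik.mikdict.get(x), "utf-8") if (x <= 191) and (x >= 128) else b"", input)
    res = b''.join(newChars).decode("utf-8")
    return res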
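
A possible alternative, offered only as a sketch: since the mapping is a fixed byte-to-character table, the per-byte lambda could likely be replaced by a single str.translate() call. The version below assumes that bytes 128-191 map, in order, onto the contiguous Unicode range U+0410 (А) to U+044F (я), which matches the dictionary entries shown in the question, and it drops bytes above 191 just as the snippet above does. The names MIK_TABLE and decode_mik are hypothetical, chosen for illustration.

```python
# Hypothetical helper names; the table assumes MIK bytes 128-191 map in
# order onto U+0410 (А) .. U+044F (я), as the question's mikdict suggests.
MIK_TABLE = {byte: 0x0410 + (byte - 128) for byte in range(128, 192)}

# Mirror the answer above: bytes above 191 are dropped (mapped to None).
MIK_TABLE.update({byte: None for byte in range(192, 256)})


def decode_mik(data: bytes) -> str:
    # latin-1 maps every byte 0-255 to the code point with the same value,
    # so str.translate() can then remap the upper half in one pass.
    return data.decode("latin-1").translate(MIK_TABLE)
```

With this, opener() could call decode_mik(filetext) instead of translate(filetext). If the box-drawing and other symbols above byte 191 matter for your files, the table would need entries for them rather than None.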

