
I have a list that looks like this:

name_list=['ramon del rio,georgina genes,jorge lópez']

I want to convert each name into a bytes object. To do this, I run the following code:

name_list_bytes = []

for i in name_list:
    name_list_bytes.append(list(map(lambda x: str.encode(x, "UTF-8"), i.split(','))))

print(name_list_bytes)

[[b'ramon del rio', b'georgina genes', b'jorge l\xc3\xb3pez']]

As you can see, the name "jorge lópez" is turned into "jorge l\xc3\xb3pez". How can I avoid this transformation and encode the name correctly?
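The `\x` escapes are only how Python displays bytes outside the printable ASCII range; the encoded data is intact. This short sketch (variable names are illustrative) shows that 'ó' encodes to the two bytes `\xc3\xb3` in UTF-8 and to the single byte `\xf3` in Latin-1, and that decoding with the same codec returns the original string.

```python
name = 'jorge lópez'

utf8_bytes = name.encode('utf-8')
latin1_bytes = name.encode('latin-1')

# The bytes repr shows non-ASCII byte values as \x escapes
print(utf8_bytes)    # b'jorge l\xc3\xb3pez'  ('ó' is two bytes in UTF-8)
print(latin1_bytes)  # b'jorge l\xf3pez'      ('ó' is one byte in Latin-1)

# Decoding with the same codec restores the original text
print(utf8_bytes.decode('utf-8'))      # jorge lópez
print(latin1_bytes.decode('latin-1'))  # jorge lópez

# 11 characters, but 12 bytes in UTF-8 and 11 in Latin-1
print(len(name), len(utf8_bytes), len(latin1_bytes))  # 11 12 11
```

So if the goal is to use the text again, decode the bytes instead of trying to remove the escapes; if the goal is ASCII-only bytes, the accented characters have to be replaced before encoding.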

[EDIT]

I found out that Python's `str.encode` also takes an `errors` argument that controls what Python does when a character cannot be represented in the target encoding.
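For comparison, here is a small sketch of the standard `errors` values applied to the same name when encoding to ASCII. None of them turns 'ó' into 'o'; they drop, mark, or escape it, and the default `'strict'` raises `UnicodeEncodeError`.

```python
name = 'jorge lópez'

# Each errors= value handles the unencodable 'ó' differently:
#   ignore            -> b'jorge lpez'        (character dropped)
#   replace           -> b'jorge l?pez'       (replaced by '?')
#   xmlcharrefreplace -> b'jorge l&#243;pez'  (XML character reference)
#   backslashreplace  -> b'jorge l\\xf3pez'   (Python escape sequence)
results = {errors: name.encode('ascii', errors)
           for errors in ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace')}
for errors, encoded in results.items():
    print(f'{errors:18} {encoded}')

# The default, errors='strict', raises instead of substituting anything
try:
    name.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict:', exc.reason)  # strict: ordinal not in range(128)
```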

name_list_bytes = []

for i in name_list:
    name_list_bytes.append(list(map(lambda x: str.encode(x, "ascii", "ignore"), i.split(','))))

print(name_list_bytes)

[[b'ramon del rio', b'georgina genes', b'jorge lpez']]  # the non-ASCII character is removed

The "ignore" argument removes the non-ASCII characters, but I want to replace them with their proper ASCII equivalent ("ó" with "o"). I guess the best way, although tedious, is to identify those characters and replace them by hand.
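For accented Latin letters, replacing by hand may not be necessary. A stdlib-only sketch (not from the thread; `to_ascii` is a hypothetical helper name): Unicode NFKD normalization splits "ó" into "o" plus a combining acute accent, so encoding to ASCII with `errors='ignore'` drops only the accent. Unlike a transliteration library, this approach still drops characters that have no decomposition, such as "ß" or "ø".

```python
import unicodedata


def to_ascii(text: str) -> bytes:
    """Hypothetical helper: strip accents via NFKD, then encode to ASCII bytes."""
    # NFKD turns 'ó' into 'o' followed by U+0301 COMBINING ACUTE ACCENT
    decomposed = unicodedata.normalize('NFKD', text)
    # The combining mark is non-ASCII, so 'ignore' drops only the accent
    return decomposed.encode('ascii', 'ignore')


name_list = ['ramon del rio,georgina genes,jorge lópez']
name_list_bytes = [[to_ascii(name) for name in entry.split(',')] for entry in name_list]
print(name_list_bytes)
# [[b'ramon del rio', b'georgina genes', b'jorge lopez']]
```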

NikSp
  • That's just how byte objects are printed. Note the `b` before the opening quotes. – Simon Crane Jul 04 '20 at 18:44
  • @SimonCrane Ok so it's correct? But the printed function yields that result? – NikSp Jul 04 '20 at 18:45
  • Yes. If you print `len(name_list_bytes[2])` you will see that it has the correct number of characters – Simon Crane Jul 04 '20 at 18:52
  • @SimonCrane No, that's not it. I am not looking for the length; I want to get rid of the `\x` escape. When I run `print(name_list_bytes[2])` I get the same output. – NikSp Jul 04 '20 at 18:56

1 Answer


After looking at this question, I found the `unidecode` package, which replaces non-ASCII characters with their closest ASCII equivalents. Since my question is a duplicate, it has been closed.

import unidecode  # installed with: pip install Unidecode

name_list = ['ramon del rio,georgina genes,jorge lópez']
final_list = []

for i in name_list:
    # Transliterate each name to ASCII ('ó' -> 'o'), then encode it to bytes
    final_list.append([unidecode.unidecode(x).encode() for x in i.split(',')])

print(final_list)

[[b'ramon del rio', b'georgina genes', b'jorge lopez']]
NikSp