2

I have a dictionary:

a = {'age': '12\xa0', 'name': 'pks\xa0\xa0'}

I want to replace all non-ASCII characters with spaces.

To replace non-ASCII characters in a plain string s (not a dict), we use:

''.join([i if 32 < ord(i) < 127 else " " for i in s])

But how can I apply this to a dictionary? Any help would be appreciated.

Prashant

4 Answers

4

You don't need a list comprehension and ord; just encode to ASCII and ignore the errors (note that the resulting values are bytes):

In [106]: {key:value.encode('ascii',errors='ignore') for key, value in a.items()}
Out[106]: {'age': b'12', 'name': b'pks'}
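
If str values are wanted rather than bytes, the encoded result can be decoded again. This is a minimal sketch assuming Python 3; the name `stripped` is only illustrative and not part of the original answer:

```python
a = {'age': '12\xa0', 'name': 'pks\xa0\xa0'}

# Encode to ASCII, silently dropping characters that cannot be represented,
# then decode so that the values are str again instead of bytes.
stripped = {
    key: value.encode('ascii', errors='ignore').decode('ascii')
    for key, value in a.items()
}

print(stripped)  # {'age': '12', 'name': 'pks'}
```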

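If you want to replace them with spaces instead, here is one way. Note that it appends the padding at the end of the value, so it only preserves positions when the non-ASCII characters are trailing, as they are here: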

In [117]: def replace_nonascii(mydict):
              # Iterate over the argument, not the global `a`
              for key, value in mydict.items():
                  new = value.encode('ascii', errors='ignore')
                  # Pad with one space per dropped character
                  yield key, new + b' ' * (len(value) - len(new))
   .....:         

In [118]: dict(replace_nonascii(a))
Out[118]: {'age': b'12 ', 'name': b'pks  '}
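
The padding approach puts the spaces at the end of the value, which matters when a non-ASCII character sits in the middle of a string. The sketch below contrasts it with a per-character replacement; `pad_at_end` and `replace_in_place` are hypothetical helper names chosen for illustration, and Python 3 is assumed:

```python
def pad_at_end(value):
    # Same idea as above: drop non-ASCII, then pad at the end.
    new = value.encode('ascii', errors='ignore')
    return new + b' ' * (len(value) - len(new))


def replace_in_place(value):
    # Swap each non-ASCII character for a space at its original position.
    return ''.join(c if ord(c) < 128 else ' ' for c in value)


sample = 'a\xa0b'
print(repr(pad_at_end(sample)))        # b'ab '
print(repr(replace_in_place(sample)))  # 'a b'
```

For the dictionary in the question both give equivalent text, because every non-ASCII character is trailing.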
kasravnd
2

Building on the answer from this question, you can use re.sub, removing non-ASCII characters and replacing them with a space.

>>> import re
>>> {k : re.sub(r'[^\x00-\x7F]',' ', v) for k, v in a.items()}
{'age': '12 ', 'name': 'pks  '}

This should work on Python 3.x as well as Python 2.x.

cs95
  • What if a value is itself a dictionary? How do I do it then? – Prashant Jan 23 '18 at 14:46
  • @Prashant `{k : {k2 : re.sub(r'[^\x00-\x7F]',' ', v2) for k2, v2 in v.items()} for k, v in a.items()}` – cs95 Jan 23 '18 at 14:49
  • @Prashant It works superbly for me, and that regex filters out anything that is not ASCII. – cs95 Jan 23 '18 at 15:05
  • I used `a2 = {'a1':{'name':'pks/xa0/xa0', 'age':'12/xa0/xa0'},'a3':{'name':'kps/xa0/xa0', 'age':'23/xa0/xa0'}}` and applied `ans ={k : {k2 : re.sub(r'[^\x00-\x7F]',' ', v2) for k2, v2 in v.items()} for k, v in a2.items()}` – Prashant Jan 23 '18 at 15:12
  • @Prashant Lol.. those are forward slashes, not backward-slash escape sequences... that IS ASCII, made of valid ascii characters. (See /xa0 vs \xa0) – cs95 Jan 23 '18 at 15:14
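
For dictionaries nested to an arbitrary depth, the nested comprehension from the comments can be generalized with recursion. This is only a sketch under assumptions: the helper name `replace_non_ascii` and the constant `NON_ASCII` are made up for illustration, Python 3 is assumed, and non-string leaf values are returned unchanged:

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7F]')


def replace_non_ascii(obj):
    # Recurse into dictionaries, clean strings, pass everything else through.
    if isinstance(obj, dict):
        return {k: replace_non_ascii(v) for k, v in obj.items()}
    if isinstance(obj, str):
        return NON_ASCII.sub(' ', obj)
    return obj


# Note the backslashes: these are real \xa0 escapes, not '/xa0'.
a2 = {
    'a1': {'name': 'pks\xa0\xa0', 'age': '12\xa0\xa0'},
    'a3': {'name': 'kps\xa0\xa0', 'age': '23\xa0\xa0'},
}
print(replace_non_ascii(a2))
```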
2

You can replace the non-printable and non-ASCII characters like this. It applies the line of code you provided, which replaces each such character with a space, to every value in the dictionary:

def remove_non_printable_ascii(s):
    return ''.join([c if 32 < ord(c) < 127 else " " for c in s])

a = {'age': '12\xa0', 'name': 'pks\xa0\xa0'}

for k in a:
    a[k] = remove_non_printable_ascii(a[k])

a

output:

{'age': '12 ', 'name': 'pks  '}
Reblochon Masque
  • I like your answer (which is why I upvoted it). My initial answer did have the exact same thing but it was dv'd so I changed it a bit.. ha – cs95 Jan 23 '18 at 14:30
0

You can also iterate over the dictionary and use map:

for k,v in a.items():
    a[k] = "".join(map(lambda c: c if 32<ord(c)<127 else " " , v))

print(a) gives the following output:

{'name': 'pks  ', 'age': '12 '}
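
As a variation on the ord range check, the built-in string predicates can express the same filter. This is a sketch assuming Python 3.7 or later (where `str.isascii` is available); `to_printable_ascii` is a hypothetical helper name, and the version below builds a new dictionary instead of modifying `a`:

```python
def to_printable_ascii(text):
    # Keep printable ASCII (space through '~'); replace everything else,
    # including control characters and non-ASCII, with a space.
    return ''.join(c if c.isascii() and c.isprintable() else ' ' for c in text)


a = {'age': '12\xa0', 'name': 'pks\xa0\xa0'}
cleaned = {k: to_printable_ascii(v) for k, v in a.items()}
print(cleaned)  # {'age': '12 ', 'name': 'pks  '}
```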
rnso