
How can I remove the words in a string that contain \u in Python?

Example:

string ="\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"

The final output should look like this:

"TC2019 45TRCMat"

That is, every word that contains \u has been removed.

Dishin H Goyani
domahc
  • What have you tried? – Linh Nguyen Dec 16 '19 at 07:37
  • Aside from the above? What data type is that final output? Is it always the last 2 words? What if those final strings contain those literals instead of Unicode characters? See How to Ask. – Sayse Dec 16 '19 at 07:40
  • I'm really new to Python. I tried to use a regex, but I couldn't get the output above. – domahc Dec 16 '19 at 07:42
  • There is no `"\u"` in the string. There is `"\uf064"`, for example, but that is the representation of a single Unicode character. You can check it with `len("\uf064")`. – Matthias Dec 16 '19 at 07:44
  • You can read more about split() and replace(); they are covered in every beginner tutorial and in the Python documentation. – Linh Nguyen Dec 16 '19 at 07:44
  • No, it is not always the last 2 words. It's a string that contains a huge set of words (some with the \u sign and some without it). I just want to extract the words that don't contain the \u sign. – domahc Dec 16 '19 at 07:45
  • Removing the \u sign from the string is also fine. I tried to use string.replace('\u', " "), but it gives me an error: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape – domahc Dec 16 '19 at 07:47
  • You can use `string.split()[-2:]` (output is a list) or `' '.join(string.split()[-2:])` (output is a string) if what you want is always the last two elements. – Shijith Dec 16 '19 at 07:49
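Matthias's point can be checked directly: `"\uf064"` is one code point written as an escape sequence, so the string never contains a literal backslash followed by `u`. That is also why `string.replace('\u', " ")` raises a SyntaxError: Python parses `'\u'` as an incomplete `\uXXXX` escape. A minimal sketch using the example string from the question:

```python
string = "\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"

# "\uf064" is a single character (code point U+F064), not a backslash plus "uf064"
print(len("\uf064"))        # 1
print(hex(ord("\uf064")))   # 0xf064

# The string holds no literal backslash-u pair, so searching for the
# two characters "\\u" (backslash followed by u) finds nothing
print("\\u" in string)      # False

# Shijith's suggestion: keep only the last two whitespace-separated words
print(" ".join(string.split()[-2:]))  # TC2019 45TRCMat
```

As domahc points out, the wanted words are not always the last two, so the slicing approach only happens to fit this particular example.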

1 Answer


Rather than looking for Unicode characters to remove, go the other way and only keep words made entirely of ASCII characters:

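string = "\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"

def is_ascii(s):
    # True if every character is in the 7-bit ASCII range (code point < 128)
    return all(ord(c) < 128 for c in s)

# split() with no argument splits on any run of whitespace and drops empty strings
for s in string.split():
    if is_ascii(s):
        print(s)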

Reference: How to check if a string in Python is in ASCII?
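If the result should be a single string like the expected output, rather than one word per line, the same filter can be combined with `" ".join(...)`. The sketch below assumes Python 3.7 or later, where the built-in `str.isascii()` can replace the `is_ascii` helper; `keep_ascii_words` is a hypothetical name used only for illustration.

```python
def keep_ascii_words(text):
    # Hypothetical helper: drop every whitespace-separated word that contains
    # at least one non-ASCII character (code point >= 128) and keep the rest.
    # str.isascii() requires Python 3.7+.
    return " ".join(word for word in text.split() if word.isascii())


string = "\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"
print(keep_ascii_words(string))  # TC2019 45TRCMat
```

ASCII words are kept wherever they appear in the string, not only at the end. Note that this filter also drops words containing legitimate non-ASCII letters such as `é`; if only characters like U+F064 (which lie in the Unicode Private Use Area, U+E000 to U+F8FF) should trigger removal, the per-word test would need to be narrowed accordingly.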

Boendal