Removing escaped unicode sequence in a text file

Question

I have a text file with lots of escaped Unicode sequences (emojis, as it happens), for instance blablabla \uD83D\uDC4D\uD83C blablabla \uDFFC\uD83D\uDC4F\uD83C\uDFFD. I'd like to remove them all and get blablabla blablabla.

Is there a regular expression that would clean these up, given that I use Notepad++?

Thanks.

StackOverflow is a community where you need to show your efforts before having answers. Follow [this guide](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) to start working on a solution. http://idownvotedbecau.se/noattempt/ — totok, Jul 07 '20 at 14:32

Exelian · Accepted Answer · 2020-07-07T14:38:31.707


I would suggest: `\\u[0-9A-F]{4}\s?`

`\\u` escapes the backslash, so it matches a literal backslash followed by the letter u. `[0-9A-F]{4}` matches exactly four characters from that class, i.e. four uppercase hexadecimal digits. Depending on the actual text, you may want to extend it to also match two-character escapes: `\\u([0-9A-F]{4}|[0-9A-F]{2})\s?`

The `\s?` matches zero or one whitespace character, so removing an escape that is followed by a space doesn't leave multiple consecutive whitespace characters behind.
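For anyone who wants to try the pattern outside Notepad++, here is a minimal Python sketch of the same substitution, run on the sample line from the question. The helper name `strip_unicode_escapes` is only illustrative, and Python's `re` module is assumed to treat this particular pattern the same way Notepad++ does.

```python
import re

# Pattern from the answer: a literal backslash, "u", four uppercase hex digits,
# then at most one trailing whitespace character.
ESCAPE_RE = re.compile(r"\\u[0-9A-F]{4}\s?")


def strip_unicode_escapes(text: str) -> str:
    # Hypothetical helper: delete every match of the pattern above.
    return ESCAPE_RE.sub("", text)


# Raw string, so the backslashes stay literal, as they are in the text file.
sample = r"blablabla \uD83D\uDC4D\uD83C blablabla \uDFFC\uD83D\uDC4F\uD83C\uDFFD"
cleaned = strip_unicode_escapes(sample)
print(repr(cleaned))
```

Note that the space before the first escape on each run is not consumed (only whitespace after an escape is), so the sample ends up with a single space between the two words and one trailing space at the end of the line.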

edited Jul 07 '20 at 14:38

answered Jul 07 '20 at 14:31

Exelian


`[0-9A-f]` matches digits, uppercase letters from `A` to `Z`, lowercase letters from `a` to `f`, and the characters that lie between `Z` and `a`, for example `[`, `]`, `^`... Have a look at an [ASCII table](http://www.asciitable.com/). – Toto Jul 07 '20 at 14:36
it works like a charm! Thanks a lot! ^^ – 8oris Jul 07 '20 at 14:36
Moreover it matches much more than emoji! – Toto Jul 07 '20 at 14:37
You're entirely correct. I made a typo, I intended it to be an uppercase F – Exelian Jul 07 '20 at 14:37
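
The point raised in the comments about `[0-9A-f]` can be checked with a short Python sketch. The probe characters are chosen here purely for illustration; the class behaves this way because a range such as `A-f` covers every code point from `A` (65) through `f` (102).

```python
import re

loose = re.compile(r"[0-9A-f]")   # class with the typo: range runs from 'A' to 'f'
strict = re.compile(r"[0-9A-F]")  # intended class: uppercase hex digits only

probe = "FZ[^_ag"  # illustrative characters on both sides of the range limits

# Collect the probe characters each class accepts as a single-character match.
loose_hits = [c for c in probe if loose.fullmatch(c)]
strict_hits = [c for c in probe if strict.fullmatch(c)]

print(loose_hits)
print(strict_hits)
```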
