-3

I have a text file containing many Unicode escape sequences (of emojis, by the way), for instance `blablabla \uD83D\uDC4D\uD83C blablabla \uDFFC\uD83D\uDC4F\uD83C\uDFFD`. I'd like to remove them all and get `blablabla blablabla`.

Is there a regular expression that would remove these, given that I use Notepad++?

Thanks.

8oris
  • StackOverflow is a community where you need to show your efforts before having answers. Follow [this guide](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) to start working on a solution. http://idownvotedbecau.se/noattempt/ – totok Jul 07 '20 at 14:32

1 Answer

0

I would suggest: `\\u[0-9A-F]{4}\s?`

`\\u` escapes the backslash, so it matches a literal backslash followed by the letter `u`. `[0-9A-F]{4}` matches exactly four uppercase hexadecimal digits. Depending on the actual text, you may want to extend it to also match two-digit escapes: `\\u([0-9A-F]{4}|[0-9A-F]{2})\s?`

The `\s?` matches zero or one whitespace character after the escape, so the space that follows a removed escape is removed with it and you don't end up with multiple consecutive whitespace characters.
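To check the pattern outside Notepad++, here is a minimal sketch in Python using the standard `re` module. The function name `strip_escapes` and the variable names are made up for illustration; the sample string is the one from the question.

```python
import re

# Pattern from the answer: a literal backslash, the letter u, four uppercase
# hex digits, then at most one trailing whitespace character.
ESCAPE = re.compile(r"\\u[0-9A-F]{4}\s?")


def strip_escapes(text: str) -> str:
    """Remove every backslash-u escape and one whitespace character after it."""
    return ESCAPE.sub("", text)


# Raw string, so the backslashes stay literal, as they are in the text file.
sample = r"blablabla \uD83D\uDC4D\uD83C blablabla \uDFFC\uD83D\uDC4F\uD83C\uDFFD"
cleaned = strip_escapes(sample)
print(repr(cleaned))
```

On this sample the space before each run of escapes is not part of any match, so the result keeps a trailing space after the second `blablabla`. In Notepad++ the equivalent would be to put the pattern in "Find what", leave "Replace with" empty, and select the "Regular expression" search mode.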

Exelian
  • `[0-9A-f]` matches digits, uppercase letters from `A` to `Z`, lowercase letters from `a` to `f` and some characters that take place between `Z` and `a` , for example `[`,`]`, `^`... Have a look at an [ASCII table](http://www.asciitable.com/). – Toto Jul 07 '20 at 14:36
  • it works like a charm! Thanks a lot! ^^ – 8oris Jul 07 '20 at 14:36
  • Moreover it matches much more than emoji! – Toto Jul 07 '20 at 14:37
  • You're entirely correct. I made a typo, I intended it to be an uppercase F – Exelian Jul 07 '20 at 14:37
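The two points raised in the comments can be illustrated with a short Python sketch. The first part lists which ASCII characters the mistyped class `[0-9A-f]` accepts beyond real hexadecimal digits. The second part uses a hypothetical narrower pattern, `SURROGATE`, that is not from the answer: it assumes the emoji in the file are all written as UTF-16 surrogate escapes (`\uD800` to `\uDFFF`), so it would miss emoji that are written as a single escape below that range.

```python
import re

typo_class = re.compile(r"[0-9A-f]")      # the mistyped range from the comment
hex_class = re.compile(r"[0-9A-Fa-f]")    # real hexadecimal digits

# ASCII characters accepted by the mistyped class but not by the hex class.
extra = [
    chr(code)
    for code in range(128)
    if typo_class.fullmatch(chr(code)) and not hex_class.fullmatch(chr(code))
]
print("".join(extra))

# Hypothetical narrower pattern (an assumption, not the answer's regex):
# only escapes in the surrogate range D800-DFFF, plus one optional whitespace.
SURROGATE = re.compile(r"\\u[Dd][89A-Fa-f][0-9A-Fa-f]{2}\s?")

text = r"caf\u00E9 \uD83D\uDC4D ok"
narrowed = SURROGATE.sub("", text)
print(narrowed)
```

Here the escape `\u00E9` (an accented letter, not an emoji) is left alone, while the surrogate pair is removed.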