
I have a JavaScript regex that matches emojis. How do I match the same characters using ngx.re.match(), which is part of the OpenResty library for the nginx web server?

This is the original regex for matching emojis in JS:

(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])
ikegami
Dzmitry

1 Answer


In PCRE, \x{####} can be used as equivalent to JavaScript's \u####.

Secondly, the JavaScript pattern is meant to match against Code Points encoded using UTF-16. But since we'll be matching against the Code Points themselves in PCRE, we need to "decode" the surrogate pairs.
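To make the "decoding" step concrete, here is a minimal sketch of the standard UTF-16 surrogate-pair arithmetic. The helper name `decodeSurrogatePair` is made up for illustration. It assumes only valid low surrogates (0xDC00 to 0xDFFF) can follow a high surrogate in well-formed text, so each of the three high surrogates in the original pattern covers 0x400 code points.

```javascript
// Combine a UTF-16 surrogate pair into the code point it encodes.
// (Hypothetical helper, not part of any library.)
function decodeSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Lowest pair covered by the pattern: \ud83c followed by the first low surrogate.
const lowest = decodeSurrogatePair(0xD83C, 0xDC00);
// Highest pair covered by the pattern: \ud83e followed by the last low surrogate.
const highest = decodeSurrogatePair(0xD83E, 0xDFFF);

console.log(lowest.toString(16));  // 1f000
console.log(highest.toString(16)); // 1fbff
```

The three alternatives \ud83c, \ud83d and \ud83e are adjacent, so together they collapse into one contiguous range of code points.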

After making both of the changes, we get:

[\xA9\xAE\x{2000}-\x{3300}\x{1F000}-\x{1FBFF}]
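As a rough sanity check, the same code-point class can be written in JavaScript with the `u` flag and compared against the original pattern. This is only a sketch: the sample characters are my own picks, and agreement on a handful of samples does not prove the two patterns are equivalent on every input.

```javascript
// Original pattern, matching UTF-16 code units.
const utf16Pattern = /(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/;

// Code-point version, mirroring the PCRE class above (the `u` flag enables \u{...}).
const codePointPattern = /[\u00A9\u00AE\u2000-\u3300\u{1F000}-\u{1FBFF}]/u;

// Arbitrary samples: copyright sign, snowman, grinning face, crab, plain letter.
const samples = ["\u00A9", "\u2603", "\u{1F600}", "\u{1F980}", "A"];

for (const s of samples) {
  // Print the code point and whether each pattern matches it.
  console.log(s.codePointAt(0).toString(16), utf16Pattern.test(s), codePointPattern.test(s));
}
```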

(I don't know Lua, so I'll leave it to you to provide the string literal that produces this string.)

Note that your pattern matches more than what most people would consider emojis. The classic face emojis are found in the "Emoticons" block. This block (currently) spans Code Points U+1F600 to U+1F64F. In Perl, you can use \p{Block=Emoticons} or even just \p{Emoticons} to match these, but these appear to be unsupported by PCRE. To match just that block, you'd therefore use the following:

[\x{1F600}-\x{1F64F}]
ikegami