Here's a small example:
reg = ur"((?P<initial>[👍+\-])(?P<rest>.+?))$"
(In both cases the file has -*- coding: utf-8 -*- at the top.)
In Python 2:
re.match(reg, u"👍hello").groupdict()
# => {u'initial': u'\ud83d', u'rest': u'\udc4dhello'}
# unicode why must you do this
Whereas, in Python 3:
re.match(reg, "👍hello").groupdict()
# => {'initial': '👍', 'rest': 'hello'}
The Python 3 behaviour is exactly what I want, but switching to Python 3 is currently not an option. What's the best way to replicate Python 3's results in Python 2, in a way that works on both narrow and wide Python builds? The 👍 (U+1F44D) appears to be coming to me in the format "\ud83d\udc4d", which is what's making this tricky.
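For illustration, here is a minimal sketch of one possible workaround, not a definitive fix. It assumes the root cause is that a narrow build stores 👍 as two code units, so a character class like [👍+\-] can match only one half of the pair. Moving the emoji out of the class and into an alternation lets it match as a whole sequence. The names THUMBS_UP, SURROGATE_PAIR and split_initial are made up for this example, and the pattern is written as an ordinary escaped string rather than ur"..." so the same file also parses under Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import re

# U+1F44D written as an escape, so the source does not depend on the file
# encoding. On a narrow build this is two code units, on a wide build one.
THUMBS_UP = "\U0001F44D"

# Assumption: the input may also arrive as an explicit surrogate pair
# (the "\ud83d\udc4d" form), even on a wide build, so accept that too.
SURROGATE_PAIR = "\ud83d\udc4d"

# Alternation instead of a character class: each alternative is matched as a
# complete sequence, so the emoji is never split in the middle.
reg = re.compile(
    "((?P<initial>" + THUMBS_UP + "|" + SURROGATE_PAIR + "|[+\\-])(?P<rest>.+?))$"
)


def split_initial(text):
    """Return the named groups for text, or None if it has no known prefix."""
    match = reg.match(text)
    return match.groupdict() if match else None
```

With this pattern, split_initial("+hello") gives initial "+" and rest "hello", and a string starting with 👍 keeps the whole emoji in initial regardless of how the build stores it. If the surrogate-pair form does reach a wide build, the captured value will still be the two-code-unit form, so it may need normalising before comparison.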