Fixed Length Regex Required?

Question

I have this regex that uses a lookahead and a lookbehind:

import re
re.compile("<!inc\((?=.*?\)!>)|(?<=<!inc\(.*?)\)!>")

I'm trying to port it from C# to Python but keep getting the error

look-behind requires fixed-width pattern

Is it possible to rewrite this in Python without losing meaning?

The idea is for it to match something like

<!inc(C:\My Documents\file.jpg)!>

Update

I'm using the lookarounds to parse HTTP multipart text that I've modified:

body = r"""------abc
Content-Disposition: form-data; name="upfile"; filename="file.txt"
Content-Type: text/plain

<!inc(C:\Temp\file.txt)!>
------abc
Content-Disposition: form-data; name="upfile2"; filename="pic.png"
Content-Type: image/png

<!inc(C:\Temp\pic.png)!>
------abc
Content-Disposition: form-data; name="note"

this is a note
------abc--
"""

multiparts = re.compile(...).split(body)

When I do the split, I want to get just the file paths and the other text, without having to remove the opening and closing tags.

Code brevity is important, but I'm open to changing the <!inc( format if it makes the regex doable.

Have you tried using a raw string? `re.compile(r'''regex here''')` — C0deH4cker, Jun 25 '12 at 21:31
You can use the [regex module](http://pypi.python.org/pypi/regex) instead of the standard re, which does support variable-length lookbehinds. — georg, Jun 25 '12 at 21:36
Apparently you are looking for the `` parts. Why are you not looking for the file part with `(?<=)`? — Olivier Jacot-Descombes, Jun 25 '12 at 21:36
@thg435 - Sorry, this will be used on other's computers and I don't want to have to distribute an additional module if possible. Thanks. — Chad, Jun 25 '12 at 22:10
@OlivierJacot-Descombes - See my update, sorry for not explaining better in the first place. — Chad, Jun 25 '12 at 22:10
@C0deH4cker - Thanks for the tip, that will make the code nicer to look at! — Chad, Jun 25 '12 at 22:14
You want to **capture** the file path, and everything except the opening `` tags? — ohaal, Jun 25 '12 at 22:17
@ohaal - Yes, exactly and I want them to be split so I have an array of "everythings" and "file paths"... and no opening or closing tags. — Chad, Jun 26 '12 at 01:55

score 4 · Answer 1 · answered Jun 25 '12 at 21:33

From the documentation:

(?<!...)

Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in abcdef, since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not. Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function:

Emphasis mine. No, I don't imagine you can port it to Python in its current form.
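
As a minimal sketch of the rule quoted above (the variable names are illustrative and not from the question), the standard re module accepts a fixed-width lookbehind but raises re.error when asked to compile the variable-width one from the question:

```python
import re

# Fixed-width lookbehind: "abc" is always 3 characters, so this compiles.
fixed = re.compile(r"(?<=abc)def")
found = fixed.search("abcdef").group()

# Variable-width lookbehind: ".*?" can match any number of characters,
# so the standard re module refuses to compile the pattern.
try:
    re.compile(r"(?<=<!inc\(.*?)\)!>")
    rejected = False
except re.error as exc:
    rejected = True
    message = str(exc)

print(found, rejected)
```

The lookahead half of the original pattern is unaffected, since re places no width restriction on lookaheads.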

Yeah, I read the documentation and was hoping someone on SO is smart enough to help me rewrite this without the lookarounds since the documentation says they're not allowed. Thanks! — Chad, Jun 25 '12 at 22:12
This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Lookarounds". — aliteralmind, Apr 10 '14 at 00:30

ohaal · Accepted Answer · 2012-06-26T08:06:57.177

3

For paths + "everything" in the same array, just split on the opening and closing tag:

import re
p = re.compile(r'''<!inc\(|\)!>''')
awesome = p.split(body)

You say you're flexible on the tag format. If )!> can occur elsewhere in the text, you may want to change the closing tag to something like )!/inc> (or anything else, as long as it's unique).
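
A sketch of what that split returns, using a shortened stand-in for the body from the question (the sample text and the names parts, paths, and texts are assumptions for illustration):

```python
import re

# Shortened stand-in for the multipart body in the question.
body = r"""------abc
Content-Type: text/plain

<!inc(C:\Temp\file.txt)!>
------abc
Content-Type: image/png

<!inc(C:\Temp\pic.png)!>
------abc--
"""

p = re.compile(r"<!inc\(|\)!>")
parts = p.split(body)

# Each path sits between an opening and a closing tag, so with
# well-formed input the paths land at the odd indices of the result.
paths = parts[1::2]
texts = parts[0::2]
print(paths)
```

This relies on every opening tag having a closing tag. An unpaired tag shifts the alternation between text and paths.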

edited Jun 26 '12 at 08:06

answered Jun 25 '12 at 21:33

ohaal


+1 :: Optionally replace `.*?` with `.+?` for non-blank inside match – Ωmega Jun 25 '12 at 21:46
@user1215106: That wouldn't match his already existing regex. Keep in mind this is a port from C# to Python. – ohaal Jun 25 '12 at 21:47
That's why I wrote **optionally** and explain what would change, Sir. – Ωmega Jun 25 '12 at 21:48
BTW :: For better performance, don't use `*?` or `+?` at all, if you don't have to... – Ωmega Jun 25 '12 at 21:49
Just google for that - for example: http://blog.stevenlevithan.com/archives/greedy-lazy-performance – Ωmega Jun 25 '12 at 21:56
Sorry - I should have explained better - see my updated question. Thanks! – Chad Jun 25 '12 at 22:06
Yes, I think I'll take this approach. It does lose some accuracy since it doesn't verify the matching end tags, but I'm not too worried about that being a problem. Thanks! – Chad Jun 26 '12 at 21:19

Hugh Bothwell · Answer 3 · 2012-06-25T23:27:29.747

1

import re

pat = re.compile(r"\<\!inc\((.*?)\)\!\>")

f = pat.match(r"<!inc(C:\My Documents\file.jpg)!>").group(1)

results in f == r'C:\My Documents\file.jpg'
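
A capturing pattern like the one above can also drive the split the question asks for. re.split includes the text of capturing groups in its result, so the paths come back alongside the surrounding text, and a tag is consumed only when the opening and closing halves both appear on the same line. A sketch with an assumed sample body, not taken from the original answer:

```python
import re

pat = re.compile(r"<!inc\((.*?)\)!>")

# Assumed sample text; the stray ")!>" on the last line has no opening tag.
body = "a\n<!inc(C:\\Temp\\file.txt)!>\nb\n<!inc(C:\\Temp\\pic.png)!>\nc )!> d\n"

# Captured paths are interleaved with the text between the tags.
parts = pat.split(body)
print(parts)
```

Because the stray )!> is not preceded by <!inc(, it stays in the surrounding text instead of splitting it.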

In response to Jon Clements:

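print(re.escape("<!inc(filename)!>"))

results in

\<\!inc\(filename\)\!\>

Conclusion: re.escape seems to think they should be escaped. (This output is from Python 2 and from Python 3 before 3.7. Since 3.7, re.escape escapes only characters that can have special meaning in a regex, so it returns <!inc\(filename\)!> instead.)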

edited Jun 25 '12 at 23:27

answered Jun 25 '12 at 21:36

Hugh Bothwell


Any reason to escape `<`, `!` and `>`? The compile statement should traditionally be an r'' str – Jon Clements Jun 25 '12 at 21:49
