I'm trying to find a sed-style regex that will match every instance of the word "HAWK" together with the closest surrounding item tags, i.e. <item> ... HAWK ... </item>, where each ellipsis may be text or other tags (but not an item tag).

So far I've tried a lazy match-all, <item>(.*?)HAWK(.*?)<\/item>. This correctly stops at the first closing item tag after HAWK, but the part before HAWK runs across many nested opening <item> tags, so the match captures too much.

I think a look-behind might help, but I haven't been able to get that working either. Any help would be much appreciated.

Ryan

1 Answer

To find the "closest tag" using sed-style expressions, you can try:

/<item>[^>]*HAWK[^<]*<\/item>/

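This works in 'regular sed', which has no lazy quantifiers. By allowing only characters other than a closing bracket (>) before HAWK, and only characters other than an opening bracket (<) after it, the pattern cannot cross another tag, which 'simulates' the non-greedy quantifier .*?. The trade-off is that it only matches when no complete tag sits between HAWK and the enclosing item tags.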
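To make the difference concrete, here is a minimal sketch that compares the lazy pattern from the question with the negated-class pattern <item>[^>]*HAWK[^<]*</item>. It uses Python's re module rather than sed, purely for illustration, and the sample strings and the matches helper are made up for the example.

```python
import re

# Hypothetical sample: an outer <item> wrapping two inner ones.
SAMPLE = "<item><item>a DOVE b</item><item>c HAWK d</item></item>"

# Lazy pattern from the question (Python syntax, so "/" needs no escaping).
LAZY = re.compile(r"<item>(.*?)HAWK(.*?)</item>")

# Negated-class pattern: no ">" allowed before HAWK, no "<" after it.
NEGATED = re.compile(r"<item>[^>]*HAWK[^<]*</item>")


def matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in pattern.finditer(text)]


lazy_hits = matches(LAZY, SAMPLE)        # starts at the outermost <item>
negated_hits = matches(NEGATED, SAMPLE)  # only the innermost <item>

# Limitation: a complete tag between <item> and HAWK prevents any match.
tagged_hits = matches(NEGATED, "<item>x <b>HAWK</b> y</item>")
```

On this sample the lazy pattern begins at the first opening tag and swallows the DOVE item as well, while the negated-class pattern returns only the item that directly contains HAWK. The last call shows the cost: once another tag such as <b> appears inside the item, the negated-class pattern finds nothing.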

That said, it is usually not a good idea to parse XML and similar formats with regex - a 'real parser' is much more robust. But hey, you asked.
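For comparison, a parser-based version might look like the sketch below. It assumes the input is well-formed XML and uses Python's standard xml.etree.ElementTree; the function names own_text and closest_items and the sample document are hypothetical, not something from the question.

```python
import xml.etree.ElementTree as ET


def own_text(elem):
    """Text inside elem, descending into child tags but not into nested <item>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "item":
            parts.append(own_text(child))
        # Text that follows the child still belongs to elem.
        parts.append(child.tail or "")
    return "".join(parts)


def closest_items(xml_string, word="HAWK"):
    """Return the <item> elements that contain word outside any nested <item>."""
    root = ET.fromstring(xml_string)
    return [item for item in root.iter("item") if word in own_text(item)]


# Hypothetical document: HAWK sits inside a <b> tag in a nested item.
DOC = "<list><item>a <item>x <b>HAWK</b> y</item> z</item><item>DOVE</item></list>"
hits = closest_items(DOC)
```

Unlike the regex, this handles other tags between the item tags and HAWK, and it reports only the innermost item because text belonging to a nested item is not counted towards its parent.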

Floris