
I'm trying to build a regex that matches an expression that:

  • starts with a string (in my example: <div)
  • ends with another string (in my example: </div>)
  • contains a searched string (in my example: searched string).

Around this searched string there can be anything, including spaces and newlines.

Input: <div class="testclass">random example text</div> <div id="testid">foo bar foo searched string foo bar</div>

Should match: <div id="testid">foo bar foo searched string foo bar</div>

The first <div> should not match, as it doesn't contain searched string.

I've tried something like: ^(<div)(.|\s)*?(searched string)(.|\s)*?(</div>)$

But it returns the whole tested expression: the match starts at the first <div, and the lazy (.|\s)*? part keeps matching, straight past the first </div>, until it finds the searched string.
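To see this behaviour concretely, here is a small sketch using Python's re module. Python is an assumption for illustration only: Sublime Text's engine is not Python's, but leftmost-match and lazy-quantifier semantics should be the same in common backtracking engines.

```python
import re

# The sample input from the question, on a single line.
text = ('<div class="testclass">random example text</div> '
        '<div id="testid">foo bar foo searched string foo bar</div>')

# The attempted pattern from the question, unchanged.
attempt = r'^(<div)(.|\s)*?(searched string)(.|\s)*?(</div>)$'

m = re.search(attempt, text)

# The match begins at the very first "<div": the lazy (.|\s)*? is only
# "as short as possible" from that starting point, so it still crosses
# the first "</div>" to reach "searched string".
print(m.start())            # 0
print(m.group(0) == text)   # True: the whole input is matched
```

A lazy quantifier never moves the start of a match; it only shortens the match from wherever the engine first succeeds, which is why the first div gets swallowed.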

I want the regex to reject the <div class="testclass">random example text</div> part, as it does not contain searched string.

Thanks for your help.

EDIT: I'm using Sublime Text 3 to perform this search and, from what I understand, it uses a custom proprietary regex engine, but I guess the logic is similar to other languages like PHP.

Lucas Demea

1 Answer


Don't forget:

Parsing HTML with regex is a hard job. HTML and regex are not good friends. Use a parser: it is simpler, faster and much more maintainable.
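A minimal sketch of the parser approach, using Python's standard-library html.parser (Python is an assumption here; the question only mentions Sublime Text and PHP). It records the text of every div and keeps the ones whose text contains the search string. The class and function names are hypothetical, and it reports each matching div's attributes rather than its raw markup.

```python
from html.parser import HTMLParser


class DivTextFinder(HTMLParser):
    """Collect the attributes and text of every <div>.

    Text inside a nested div also counts toward each enclosing div.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open divs: (attrs, list of text chunks)
        self.divs = []   # closed divs: (attrs, full text)

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append((dict(attrs), []))

    def handle_data(self, data):
        # Text belongs to every div that is open at this point.
        for _, chunks in self.stack:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            attrs, chunks = self.stack.pop()
            self.divs.append((attrs, "".join(chunks)))


def divs_containing(html, needle):
    """Return the attribute dicts of the divs whose text contains needle."""
    finder = DivTextFinder()
    finder.feed(html)
    finder.close()
    return [attrs for attrs, text in finder.divs if needle in text]


text = ('<div class="testclass">random example text</div> '
        '<div id="testid">foo bar foo searched string foo bar</div>')
print(divs_containing(text, "searched string"))  # [{'id': 'testid'}]
```

Because the parser tracks opening and closing tags itself, nested divs are handled correctly, which a single regex cannot guarantee.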


But, if you really want a regex, use:

<div[^>]*>(?:(?!</div>)[\s\S])*searched string(?:(?!</div>)[\s\S])*</div>
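The pattern can be checked outside the editor with a short Python sketch (Python's re module is an assumption; the constructs used, [^>], (?!...) and [\s\S], are supported by most backtracking engines). Both lookaheads are written here as (?!</div>). The (?:(?!</div>)[\s\S])* part is often called a tempered greedy token: it consumes any character, newlines included, but refuses to step onto </div>, so a match cannot spill from one div into the next.

```python
import re

# Tempered greedy token: any character (newlines included, thanks to [\s\S])
# as long as the text at that position does not start with "</div>".
pattern = re.compile(
    r'<div[^>]*>(?:(?!</div>)[\s\S])*searched string(?:(?!</div>)[\s\S])*</div>'
)

text = ('<div class="testclass">random example text</div> '
        '<div id="testid">foo bar foo searched string foo bar</div>')
print(pattern.findall(text))
# ['<div id="testid">foo bar foo searched string foo bar</div>']

# Newlines around the searched string are fine too.
multiline = '<div id="x">\nfoo\nsearched string\nbar\n</div>'
print(pattern.findall(multiline) == [multiline])  # True
```

Like any regex over HTML, it does not understand nesting: on <div a><div b>searched string</div></div> it matches from the outer <div up to the first </div>, leaving the tags unbalanced.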


Toto
Thank you very much, that did the trick. Actually, I don't think I can use a parser, because the code I'm parsing is HTML mixed with other special tags (WordPress shortcodes) that I have to clean out. The code I wrote here is only a simplified example of what I'm trying to do, for the sake of readability. I'm not sure a parser could work in this situation, but if so, I would be happy to learn which tool to use. – Lucas Demea May 01 '20 at 14:05