
I'm trying to match everything after the first table element and before the closing body tag in an HTML document.

This is the regex I tried:

(?:<table>(.|\s)*<\/table>)((.|\s)*)(?:<\/body>)

I got no matches for this regex. I'm using JavaScript.
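Here is a minimal sketch of one possible fix. It assumes the markup uses a bare `<table>` tag (as in the original pattern) and contains no nested tables, which a regex cannot track reliably. It swaps `(.|\s)` for `[\s\S]`, a single character class that also matches line breaks. It also makes the first quantifier lazy, so the non-capturing part stops at the first `</table>` rather than the last. The function name `contentAfterFirstTable` and the sample HTML are hypothetical.

```javascript
// Hypothetical helper: returns the text between the end of the FIRST
// </table> and </body>, or null if the pattern does not match.
// Assumes a bare <table> tag (no attributes) and no nested tables.
//  - [\s\S] matches any character, including line breaks, through one
//    character class, so there is no overlapping alternation to backtrack into.
//  - The lazy *? stops the non-capturing part at the first </table>.
const afterFirstTable = /<table>[\s\S]*?<\/table>([\s\S]*)<\/body>/;

function contentAfterFirstTable(html) {
  const match = html.match(afterFirstTable);
  return match ? match[1] : null;
}

const sampleHtml = `<html><body>
<table><tr><td>one</td></tr></table>
<p>middle</p>
<table><tr><td>two</td></tr></table>
<p>end</p>
</body></html>`;

console.log(JSON.stringify(contentAfterFirstTable(sampleHtml)));
// prints "\n<p>middle</p>\n<table><tr><td>two</td></tr></table>\n<p>end</p>\n"

// With a greedy * the non-capturing part runs to the LAST </table> instead:
const greedy = /<table>[\s\S]*<\/table>([\s\S]*)<\/body>/;
console.log(JSON.stringify(sampleHtml.match(greedy)[1])); // prints "\n<p>end</p>\n"
```

The greedy comparison matters because the original pattern uses `*` after `<table>`. Even without the backtracking problem, it would skip everything up to the last closing table tag instead of the first.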

When I switch to the PHP flavor on regex101.com, it reports that this regex causes catastrophic backtracking. I've isolated the part that breaks it, and it appears to be this:

<table>(.|\s)*<\/table>
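The trouble with `(.|\s)*` is that its two alternatives overlap. A space or tab matches both `.` and `\s`, so a run of n spaces can be consumed in 2^n different ways. When the rest of the pattern fails, the engine may retry many of those ways. The hypothetical helper below does not run the regex. It only counts those paths by multiplying, for each character, the number of alternatives that match it. This count is valid because each alternative consumes exactly one character.

```javascript
// Hypothetical illustration: counts how many distinct ways (.|\s)* can
// consume `text`. Letters match only `.`; newlines match only `\s`
// (no `s` flag); spaces and tabs match BOTH, doubling the count each time.
function countAlternationPaths(text) {
  let paths = 1n; // BigInt, because the count grows as 2^(spaces + tabs)
  for (const ch of text) {
    const ways = (/./.test(ch) ? 1 : 0) + (/\s/.test(ch) ? 1 : 0);
    paths *= BigInt(ways);
  }
  return paths;
}

console.log(countAlternationPaths("a b"));         // 2n
console.log(countAlternationPaths("<td>  </td>")); // 4n

// One line of typical indentation in an HTML file:
const indented = "\n" + " ".repeat(64);
console.log(countAlternationPaths(indented)); // 18446744073709551616n (2^64)
```

A single character class such as `[\s\S]` always has exactly one way to match each character. That removes the exponential blow-up.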

What's wrong with this expression?

Edited by bhansa
Asked by David J.
  • `.` character matches everything – Mansuro Apr 19 '18 at 06:25
  • Since a closing table tag should not appear outside the body anyway, what is the point of worrying about a closing body tag? By the way, don't use regex to parse HTML. – Tim Biegeleisen Apr 19 '18 at 06:26
  • @TimBiegeleisen I don't understand your question. I'm not interested in what's inside the first table; that's why it's a non-capturing group. Everything after that group and before the closing body tag is what I need to match. – David J. Apr 19 '18 at 06:32
  • What is wrong with using `.*`? – Tim Biegeleisen Apr 19 '18 at 06:34
  • NEVER use `(.|\s)*` in the middle of the pattern. `.` can match line breaks in the latest Chrome versions if you use the `s` modifier, but a more portable solution is to replace `.` with `[\s\S]` (it will work in any regex flavor) or `[^]` (only works in ECMAScript). – Wiktor Stribiżew Apr 19 '18 at 06:34
  • @WiktorStribiżew Unfortunately, the highest-voted answer in your dupe recommends that syntax. I know you commented there a while ago, but he didn't fix the answer. – Barmar Apr 19 '18 at 06:40
  • @Barmar I do not care about high voted answers. There is an answer that provides all necessary details to solve the problem in many languages efficiently. – Wiktor Stribiżew Apr 19 '18 at 06:43
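The comments above mention several ways to match "any character, including line breaks." This small comparison shows why plain `.*` fails on multi-line HTML in JavaScript and which of the alternatives succeed. The sample string is made up for illustration, and the `s` (dotAll) flag assumes an ES2018-capable engine.

```javascript
// Sample multi-line markup (hypothetical).
const text = "<table>\n<tr><td>x</td></tr>\n</table>";

const dotStar = /<table>.*<\/table>/;       // `.` stops at line breaks
const classAny = /<table>[\s\S]*<\/table>/; // portable across regex flavors
const caretAny = /<table>[^]*<\/table>/;    // ECMAScript-only: empty negated class
const dotAll = /<table>.*<\/table>/s;       // ES2018 `s` (dotAll) flag

console.log(dotStar.test(text));  // false
console.log(classAny.test(text)); // true
console.log(caretAny.test(text)); // true
console.log(dotAll.test(text));   // true
```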
