I have this regex, which scans a text for the word very: (?i)(?:^|\W)(very)[\W$] and it works. My goal is to upgrade it so that it does not match when very is inside quotes, whether it stands alone there or is part of a longer quoted block.

I also have this other regex, which matches anything NOT inside double quotes: (?<![\S"])([^"]+)(?![\S"]) and it also works.

My problem is that I cannot seem to combine them. For example the string:

Fred Smith very loudly said yesterday at a press conference that fresh peas will "very, very definitely not" be served at the upcoming county fair.

In this bit we have 3 instances of very, but I'm only interested in matching the first one and ignoring the whole Smith quotation.

Ken White
Romeo Mihalcea
  • Set a bounty for it. I'd like to see how to match a string that does not come after an odd number of quotation marks and before at least one quotation mark. – Aydin4ik Aug 17 '17 at 03:15

2 Answers

What you describe is tricky to handle with a regular expression, because it is difficult to determine whether a given position is inside a quote. Your second regex is not effective: it only skips the first very, the one directly to the right of the opening quote, and still matches the second one.

Drawing inspiration from this answer, which in turn references another answer describing how to regex match a pattern unless ..., I can capture the matches you want.

The basic idea is to use alternation | and match all the things you don't want and then finally match (and capture) what you do want in the final clause. Something like this:

"[^"]*"|(very)

The first clause matches quoted strings without capturing them; the second clause matches and captures the word very. A match that comes from the first clause leaves the capturing group unset, so the matches you want are exactly those where the group is set. How you reference a captured group depends on your regex environment.
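
As a rough illustration of reading the captured group, here is a minimal sketch in Python. The question does not say which regex environment is in use, so the choice of Python's re module is an assumption, as are the helper name unquoted_matches and the IGNORECASE flag (carried over from the (?i) in the question).

```python
import re

# Pattern from the answer: the first alternative consumes a whole quoted
# string without capturing it; the second captures "very" outside quotes.
# re.IGNORECASE mirrors the (?i) flag used in the question (an assumption here).
PATTERN = re.compile(r'"[^"]*"|(very)', re.IGNORECASE)

text = ('Fred Smith very loudly said yesterday at a press conference that '
        'fresh peas will "very, very definitely not" be served at the '
        'upcoming county fair.')


def unquoted_matches(s):
    """Return (start, end) spans of group 1, skipping quoted-string matches.

    Hypothetical helper name, used only for this illustration.
    """
    return [m.span(1) for m in PATTERN.finditer(s) if m.group(1) is not None]


spans = unquoted_matches(text)
# Print the matched text for each span that came from the capturing group.
print([text[start:end] for start, end in spans])
```

Note that the pattern as written would also capture very inside a longer word such as every; wrapping it as (\bvery\b) is one possible tightening, but that is not part of the answer above.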

See this regex101 fiddle for a test case.

Matt
  • I see the words inside curly quotes still being matched. I need to ignore anything inside there. – Romeo Mihalcea Aug 17 '17 at 03:37
  • The way this approach works is with the capturing group. You actively match the quoted string, but you don't capture it (no capturing group), you only use a capturing group for `very` and then you can reference that. Referencing a capturing group depends on your regex environment but I'm not sure what you are using. – Matt Aug 17 '17 at 04:43

This regex

(?i)(?<!(((?<DELIMITER>[ \t\r\n\v\f]+)(")(?<FILLER>((?!").)*))))\bvery\b(?!(((?<FILLER2>((?!").)*)(")(?<DELIMITER2>[ \t\r\n\v\f]+))))

could work under two conditions:

  • your regex engine allows variable-length (unbounded) lookbehind
  • quotes are delimited by whitespace
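
The first condition is worth checking before trying the pattern. As a sketch, assuming Python's built-in re module as the test engine (the thread does not name one), the snippet below shows that a fixed-width lookbehind compiles while a variable-width one is rejected. The two patterns are simplified stand-ins written for this illustration, not the regex from this answer, and supports_pattern is a hypothetical helper name.

```python
import re


def supports_pattern(pattern):
    """Return True if Python's built-in re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-width lookbehind (exactly one character): accepted by re.
fixed = r'(?<!")\bvery\b'

# Variable-width lookbehind (whitespace run, a quote, then any non-quote
# filler), similar in spirit to the answer's pattern: rejected by re.
variable = r'(?<!\s+"[^"]*)\bvery\b'

print(supports_pattern(fixed))
print(supports_pattern(variable))
```

The .NET engine, which the tester linked below uses, does accept variable-length lookbehind, as does the third-party regex module for Python.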

Try it on http://regexstorm.net/tester

leoinstack