2

I'm trying to capture all the quoted phrases that appear between parentheses in the example below:

body paragraph text (the "first phrase to capture" or the "second phrase to capture").

So the matches should be "first phrase to capture" and "second phrase to capture". I'm trying to use a positive lookbehind, as below, but I get an error saying lookbehinds need to be fixed-width. Is there another way to implement this using a regex?

(?<=\(.*)(".*?")(?=.*\))

Link to example.

Sam
rkp333
  • As a heads up for future questions, it's always wise to specify which language you plan to use a regex in (and/or add it as a question tag). Regex is implemented differently in most languages, and some things won't work depending on the "flavor". – Sam Sep 23 '14 at 00:38

2 Answers

2

Should be enough to use a lookahead. See if this does what you want:

"[^"(]*"(?=[^(]*\))
  • "[^"(]*" desired quoted parts
  • (?=[^(]*\)) lookahead to check that the match is inside parentheses

Example at regex101; Regex FAQ

Note that this fails on parentheses inside quoted strings, as @Sam commented.
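
As a rough illustration, here is how the pattern above behaves with Python's built-in `re` module (the asker mentions Python in a later comment). The constant name `QUOTED_IN_PARENS` is made up for this sketch; the second call reproduces the limitation from @Sam's comment.

```python
import re

# Pattern from this answer: a quoted run containing no '(' that is
# followed by a ')' reached before any further '('.
QUOTED_IN_PARENS = re.compile(r'"[^"(]*"(?=[^(]*\))')

text = 'body paragraph text (the "first phrase to capture" or the "second phrase to capture").'
matches = QUOTED_IN_PARENS.findall(text)
print(matches)
# ['"first phrase to capture"', '"second phrase to capture"']

# Known limitation: a parenthesis inside the quotes prevents a match.
limitation = QUOTED_IN_PARENS.findall('test("foo(bar)")ing')
print(limitation)
# []
```

The matches include the surrounding quotes, as in the question; wrapping `[^"(]*` in a capturing group would return only the inner text.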

Community
Jonny 5
  • Fails on parentheses inside of the quotes, which may be fine for OP (in which case this will be simpler than mine): `test("foo(bar)")ing` – Sam Sep 23 '14 at 00:32
  • @Sam I didn't see any flavor specified, so I stuck with the simplest thing I could think of – Jonny 5 Sep 23 '14 at 00:34
  • For some reason I thought I saw PCRE, so mine relies on that. Mine still won't work for nested parentheses: `(foo ("test") bar "fail")`. Have a +1 for a simpler solution. – Sam Sep 23 '14 at 00:36
  • This works great when either 1) both parens are in the string or 2) only the closing paren is in the string. I still have some scenarios where the closing paren is on the next line: `"Don't match this" body paragraph text (the "first phrase to capture" or the "second phrase to capture" then wraps the line...` – rkp333 Jan 09 '15 at 20:28
  • @rkp333 It only works for balanced parentheses. You may need to pre-check the string: if there is an opening parenthesis without a closing one, which `\([^)]*$` detects, replace that match with `\0)` [see regex101](https://regex101.com/r/lI3aF7/1) – Jonny 5 Jan 09 '15 at 22:03
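
A minimal sketch of that pre-check in Python, assuming the built-in `re` module. The helper name `close_dangling_paren` is hypothetical, and Python's `re.sub` spells the whole-match reference as `\g<0>` rather than `\0`.

```python
import re

QUOTED_IN_PARENS = re.compile(r'"[^"(]*"(?=[^(]*\))')
# An opening parenthesis with no ')' between it and the end of the string.
UNCLOSED_PAREN = re.compile(r'\([^)]*$')

def close_dangling_paren(text):
    """Append ')' when the last '(' in text is never closed (hypothetical helper)."""
    return UNCLOSED_PAREN.sub(r'\g<0>)', text)

wrapped = ('"Don\'t match this" body paragraph text (the "first phrase to capture" '
           'or the "second phrase to capture" then wraps the line...')

print(QUOTED_IN_PARENS.findall(wrapped))
# []
print(QUOTED_IN_PARENS.findall(close_dangling_paren(wrapped)))
# ['"first phrase to capture"', '"second phrase to capture"']
```

Text that is already balanced passes through `close_dangling_paren` unchanged, so the pre-check can be applied unconditionally.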
2

PCRE gives us nice access to tools like \G (match the end of the last match or the start of the string) and \K (discard matched items to the left) that make this doable:

(?:       (?# begin non-capturing-group)
  \(      (?# match start of the parenthesis)
 |        (?# OR)
  (?<!^)  (?# unless we are at the beginning of the string)
  \G      (?# start at the end of the last match)
)         (?# end non-capturing group)
[^)"]*    (?# match until end of the parenthesis or start of quote)
\K        (?# throw away everything to the left)
"([^"]*)" (?# capture 0+ characters inside double quotes)

Demo
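
For flavors without `\G` and `\K`, such as Python's built-in `re` module, one possible alternative is a two-pass approach. This is a different technique from the pattern above, not a port of it: first find each parenthesized span, then collect the quoted strings inside it. The names `PAREN_SPAN`, `QUOTED`, and `quoted_in_parens` are invented for this sketch.

```python
import re

# Pass 1: an innermost parenthesized span (no nested parentheses inside).
PAREN_SPAN = re.compile(r'\(([^()]*)\)')
# Pass 2: the text between a pair of double quotes.
QUOTED = re.compile(r'"([^"]*)"')

def quoted_in_parens(text):
    """Return quoted phrases (without the quotes) found inside parentheses."""
    results = []
    for span in PAREN_SPAN.finditer(text):
        results.extend(QUOTED.findall(span.group(1)))
    return results

text = 'body paragraph text (the "first phrase to capture" or the "second phrase to capture").'
print(quoted_in_parens(text))
# ['first phrase to capture', 'second phrase to capture']
```

Like the single-pattern version, this sketch does not handle nested parentheses: only quotes inside the innermost span are found.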

Sam
  • Note that this will fail with nested parentheses, since most regex engines can't "count" (I believe .NET has a way around this) to see how many parentheses have been opened and whether they are all closed yet. This means `(foo ("test") bar "fail")` will fail on "fail", since no parenthesis has been opened since the last one was closed. – Sam Sep 23 '14 at 00:35
  • Also experimented with `\G` before I noticed no flavor was specified :P – Jonny 5 Sep 23 '14 at 00:39
  • Yeah, sorry, I am using Python, so it looks like `\G` is not available there. – rkp333 Jan 09 '15 at 19:35