-2

I have an input string like this:

one `two three` four five `six` seven

where some parts can be wrapped in the grave accent character (`). I want to match only the parts that are not wrapped in it, that is, one, four five and seven in the example (skipping two three and six). I tried to do it using lookbehind and lookahead ((?<=) and (?=)), but it treated the four five group the same way as two three and six. Is it possible to solve this problem using regex only, or do I have to do it programmatically? (I'm using Java 1.8.)

Kirill
  • If performance is a concern, avoid the regex and implement a simple parser such as [this one](https://ideone.com/r9q41M). – Aaron Mar 09 '18 at 17:07
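
For illustration, a minimal sketch of what such a parser might look like in Java 8. This is not the code behind the link above; the class name BacktickParser and the method outsideBackticks are hypothetical, and the sketch assumes that surrounding whitespace should be trimmed and that everything after an unclosed backtick counts as wrapped.

```java
import java.util.ArrayList;
import java.util.List;

public class BacktickParser {

    // Collects the trimmed, non-empty pieces of text that lie outside backticks.
    // Hypothetical helper: an unclosed backtick hides everything after it.
    static List<String> outsideBackticks(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inside = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '`') {
                flush(current, parts); // no-op when leaving a wrapped section
                inside = !inside;
            } else if (!inside) {
                current.append(c);     // only text outside backticks is kept
            }
        }
        flush(current, parts);
        return parts;
    }

    // Adds the buffered text to the result (if any) and resets the buffer.
    private static void flush(StringBuilder current, List<String> parts) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            parts.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(outsideBackticks("one `two three` four five `six` seven"));
    }
}
```

It walks the string once, so the cost is linear in the input length.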

3 Answers

1

If you are sure that there are no unclosed backticks, you could do this:

((?:\w| )+)(?=(?:[^`]*`[^`]*`)*[^`]*$)

This will match:

"one "
" four five "
" seven"

But it's a little expensive: the lookahead, which checks that the number of backticks in the rest of the line is even, scans to the end of the string at every candidate position, so the overall cost is O(n^2).

Note that this works regardless of where the whitespace is: it really counts the backticks and does not depend on their position relative to the words. If you don't need this kind of robustness, @anubhava's answer is certainly more performant.

Demo: regex101.
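
A minimal sketch of how this pattern could be used from Java 8, assuming the standard java.util.regex API. The class name BacktickCountingDemo and the method findOutside are made up for the example, and the backslash in \w is doubled because the pattern sits in a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktickCountingDemo {

    // The pattern from this answer: a run of word characters or spaces,
    // followed by an even number of backticks up to the end of the input.
    private static final Pattern OUTSIDE = Pattern.compile(
            "((?:\\w| )+)(?=(?:[^`]*`[^`]*`)*[^`]*$)");

    // Returns every match of group 1, in order of appearance.
    static List<String> findOutside(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = OUTSIDE.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String part : findOutside("one `two three` four five `six` seven")) {
            System.out.println("\"" + part + "\"");
        }
    }
}
```

The matches keep their surrounding spaces, as in the list above; call trim() on each one if that is not wanted.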

Andrey Tyukin
1

You may use this regex, which combines a negative lookbehind and a negative lookahead:

(?<!`)\b\w+(?:\s+\w+)*\b(?!`)

RegEx Demo

Explanation:

- (?<!`): Negative Lookbehind to assert that we don't have ` at previous position
- \b\w+(?:\s+\w+)*\b: Match our text surrounded by word boundaries
- (?!`): Negative Lookahead to assert that we don't have ` at next position
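
A short sketch of the same regex in Java 8, assuming the standard java.util.regex API. The class name AdjacentBacktickDemo and the method findUnwrapped are hypothetical names for the example. Because the pattern only looks at the characters directly next to a match, the sketch assumes wrapped sections contain at most two words, as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjacentBacktickDemo {

    // Words (optionally separated by whitespace) that are neither directly
    // preceded nor directly followed by a backtick.
    private static final Pattern UNWRAPPED = Pattern.compile(
            "(?<!`)\\b\\w+(?:\\s+\\w+)*\\b(?!`)");

    // Returns every full match, in order of appearance.
    static List<String> findUnwrapped(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = UNWRAPPED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findUnwrapped("one `two three` four five `six` seven"));
    }
}
```

For the sample input from the question, this collects one, four five and seven, without surrounding whitespace.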
anubhava
-1

I solve problems like this by excluding the closing characters (in your case, whitespace), like so:

`[^\s]+`
Lance Toth