
Example:

abc | efg || $something("arg 1", "arg 2||(a|b)") || 123

or without whitespace

abc|efg||$something("arg 1", "arg 2||(a|b)")||123

What RegEx pattern is needed so that I can get the following groups:

abc | efg
$something("arg 1", "arg 2||(a|b)")
123

Total of 3 groups.

Another example:

"abc || efg" || 123

Should give me 2 groups

"abc || efg"
123

Basically, it splits the string on double pipes, ignoring any double pipes that appear inside double quotes.

My failed attempts are the following:

.+?(?=\|\|)|.*

\".+?\"|.+?(?=\|\|)|.*

supertonsky
  • By what do you wanna cut the string into pieces? – tung Jan 12 '15 at 08:16
  • The approach you want is described here: **[Comments in strings, string in comments](http://stackoverflow.com/questions/25402109/regex-for-comments-in-strings-strings-in-comments-etc)**. In your case it's **_OR_-operators in strings, strings between _OR_-operators**, but the general idea is the same. – asontu Jan 12 '15 at 08:17
  • Please stick with one splitting string. Do you want to split on un-quoted " || " or on un-quoted "||"? Each of your examples is different in that regard. – AlexR Jan 12 '15 at 08:47

5 Answers


This is what I would do, regex-wise:

(?:^|\|\|)((?:(?!\|\|)(?!").|"(?:[^"\\]|\\.)*")*)

Regex101 demo here. You can see the matches on the right; I put them in a capture group to omit the ||, so you can get them with m.group(1) in Java. Java is not my forte, but it should be something like this:

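import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "abc | efg || $something(\"arg 1\", \"arg 2||(a|b)\") || 123";
// Group 1 captures each piece; the leading || stays outside the group.
String patternStr = "(?:^|\\|\\|)((?:(?!\\|\\|)(?!\").|\"(?:[^\"\\\\]|\\\\.)*\")*)";
Pattern p = Pattern.compile(patternStr);
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1).trim()); // trim the spaces left around ||
}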

Edit: Looking back, I realized you probably want to accept $something("arg with \" in it", "arg 2||(a|b)"), so I updated the regex to handle that.
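
As a self-contained sketch of the matching approach above: the class name QuotedPipeMatcher and the method pieces are made up for illustration, and the pattern assumes the whole piece is wrapped in a single capture group so that m.group(1) exists. It also trims each piece, which the question's expected output seems to want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class QuotedPipeMatcher {
    // Group 1 is one piece: plain characters that do not start "||" and are
    // not a quote, or a complete quoted string in which a backslash escapes
    // the next character. The leading "||" is matched outside the group.
    private static final Pattern PIECE = Pattern.compile(
        "(?:^|\\|\\|)((?:(?!\\|\\|)(?!\").|\"(?:[^\"\\\\]|\\\\.)*\")*)");

    static List<String> pieces(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String piece : pieces("env(\"test\") || env(\"another\")")) {
            System.out.println(piece);
        }
    }
}
```

The inputs in main come from the comments on this answer; an unterminated quote is not handled specially in this sketch.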

Added: Combining with Bohemian's solution, you could split on this if that's easier:

\|\|(?=(?:(?:(?:[^"\\]|\\.)*"){2})*[^"]*$)

Regex101 or in Java:

String[] parts = str.split("\\|\\|(?=(?:(?:(?:[^\"\\\\]|\\\\.)*\"){2})*[^\"]*$)");
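
A minimal runnable sketch of the split variant, for anyone who wants to try it directly. The class name QuotedPipeSplit and the trimming step are assumptions added for illustration; the delimiter pattern is the one shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
final class QuotedPipeSplit {
    // "||" that is followed, up to the end of the input, by an even number of
    // quote-terminated runs; inside a run a backslash escapes the next character.
    private static final String DELIMITER =
        "\\|\\|(?=(?:(?:(?:[^\"\\\\]|\\\\.)*\"){2})*[^\"]*$)";

    static List<String> split(String input) {
        return Arrays.stream(input.split(DELIMITER))
                     .map(String::trim) // drop the spaces left around "||"
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(split("\"abc || efg\" || 123"));
        System.out.println(split("$f(\"a \\\" || b\") || 123"));
    }
}
```

Note that String.split drops trailing empty strings, so an input ending in || yields no empty last element.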
asontu
  • @supertonsky I just changed my answer slightly, after you accepted it. It now handles escaped quotes like `"arg1 \" || something"` as 1 string and doesn't cut it off at the `||`. Would've been a nasty bug that only shows itself in very specific cases. – asontu Jan 12 '15 at 12:12
  • Why is there a need for nested non-capturing groups? I tried removing one of them and it still seems to work. (?:^|\|\|)((?:"(?:[^"]|\")*"|(?:(?!\|\|).))*) – supertonsky Jan 12 '15 at 15:01
  • Ah, you're right, the one you deleted was unnecessary, as well as the one around `(?!\\|\\|).`. It's a result of my regex dev-process :) I'll edit the answer. – asontu Jan 12 '15 at 15:51
  • Just simply awesome. Any links on how to develop that kind of regex "dev-process"? – supertonsky Jan 12 '15 at 15:57
  • A good starting point would be **[this SO answer](http://stackoverflow.com/questions/23589174/regex-pattern-to-match-excluding-when-except-between/23589204#23589204)**. I mainly use [debuggex.com](https://www.debuggex.com/) as it gives a visual "path". Unfortunately it has some bugs in the actual regex-handling (for this question specifically, how to handle escaped OR-operators: `\|`), so then I use the more reliable but less visually pleasing [regex101.com](https://regex101.com) – asontu Jan 12 '15 at 16:03
  • Oh, [www.regular-expressions.info](http://www.regular-expressions.info/) is also a really good tutorial and reference. More generic about regular expressions if that's what you were looking for. – asontu Jan 12 '15 at 16:16
  • It fails on this scenario: env("test") || env("another") – supertonsky Jan 16 '15 at 08:32
  • Updated the answer once more, should now work (regex101 demo updated as well) – asontu Jan 18 '15 at 22:22

\|\|(?=(?:[^"]*"[^"]*")*[^"]*$)

Split by this. See demo.

https://regex101.com/r/sH8aR8/47

vks
  • In Java the regex is a string, and as such the `"` would have to be escaped as well. But clever way to accept multiple strings after the matched `||`. – asontu Jan 12 '15 at 08:52
  • @funkwurm dont have much idea about java :( – vks Jan 12 '15 at 08:53

If it is acceptable to not use split but instead go through multiple matches, you can use

(?<=\ \|\|\ |^)([^\"]+?(?:\"[^\"]*\")?)+?(?=\ \|\|\ |$)

Explanation:

  1. Look-behind: " || " or start of line?
  2. Some non-quote text, as little as possible
  3. Optionally, a quote-enclosed block of non-quotes
  4. 2.-3. repeated at least once
  5. Look-ahead: " || " or end of line?

The matches will precisely be the results of a split by " || " with quoted || ignored.

AlexR

Split on double pipes, but only those followed by an even number of quotes:

String[] parts = str.split("\\|\\|(?=(([^\"]*\"){2})*[^\"]*$");
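
To make the "even number of quotes" rule concrete, here is the same rule written as a plain loop instead of a lookahead. This is only an illustrative sketch: the class QuoteParitySplit is made up, it ignores escaped quotes just like the regex above, and unlike String.split it keeps a trailing empty piece.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class QuoteParitySplit {
    static List<String> split(String s) {
        // Count every quote once so the number remaining after any
        // position can be derived from the number already seen.
        int totalQuotes = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                totalQuotes++;
            }
        }

        List<String> parts = new ArrayList<>();
        int seenQuotes = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                seenQuotes++;
            } else if (c == '|' && i + 1 < s.length() && s.charAt(i + 1) == '|'
                    && (totalQuotes - seenQuotes) % 2 == 0) {
                // "||" followed by an even number of quotes: a real delimiter.
                parts.add(s.substring(start, i));
                start = i + 2;
                i++; // skip the second pipe
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("\"abc || efg\" || 123"));
    }
}
```

The pieces keep the whitespace around each ||, so call trim() on them if needed.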
Bohemian

Use this short regex; it may work:

\|\|(?!\([^\)]+\))

Live demo

Ahosan Karim Asik
  • Breaks if there are more strings with `||` after each other: https://regex101.com/r/aA1mX9/2 – asontu Jan 12 '15 at 08:51