I need to extract words from this:
[[hello] hello [world] world]
Result:
[hello]
[world]
I tried:
\[(\[\w*\])+\]
But this expression works for only one word; I need it to work for several words.
If I understand correctly, you want to match content in square brackets that itself occurs within a bracket pair.
I would suggest using a look-ahead that skips any other stand-alone bracket pairs and then looks for an additional closing bracket. If that is found, the match counts.
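A minimal sketch of that look-ahead idea in Java. The pattern below is an assumption written to fit the description above, not the answerer's original expression, and the class and method names (InnerBracketFinder, findInner) are made up for illustration. It assumes the inner pairs are not nested any further.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnerBracketFinder {
    // Hypothetical pattern (an assumption, see the note above):
    //   \[[^\[\]]*\]                  a bracket pair with no brackets inside
    //   (?=                           look ahead for:
    //     [^\[\]]*                      plain text,
    //     (?:\[[^\[\]]*\][^\[\]]*)*     any number of stand-alone pairs,
    //     \]                            then one additional closing bracket
    //   )
    private static final Pattern INNER = Pattern.compile(
        "\\[[^\\[\\]]*\\](?=[^\\[\\]]*(?:\\[[^\\[\\]]*\\][^\\[\\]]*)*\\])");

    // Collects every bracket pair that sits inside an enclosing pair.
    static List<String> findInner(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = INNER.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [[hello], [world]]
        System.out.println(findInner("[[hello] hello [world] world]"));
    }
}
```

A top-level pair such as the one in "[foo] bar" is not reported, because no additional closing bracket follows it.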
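If two steps are OK for you, try this (corrected version):
// Step 1: strip the outer pair of brackets.
String s = "[[hello1] hello [world1] world]";
String t = s.replaceAll("^\\[(.+)\\]$", "$1");
// Step 2: replace each run of text outside the brackets with a line break.
String result = t.replaceAll("([^\\[\\]]+)(\\[|$)", "\n$2");
System.out.println(t);      // [hello1] hello [world1] world
System.out.println(result); // [hello1] and [world1], one per line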
Matches of the following regular expression will extract the strings of interest.
(?:^.*?\[|\G)[ \w+]*\K\[[ \w+]*\](?=[ \w+]*(?:\[[ \w+]*\][ \w+]*)*\])
This should work with every regex engine that supports \G and \K (defined below), which includes PCRE (PHP), Perl, Ruby and Python's regex module, among others.
The regex engine performs the following operations.
(?:          : begin non-capture group
  ^          : match beginning of string
  .*?\[      : match 0+ characters, lazily, then '['
  |          : or
  \G         : assert position at the end of the previous match, or at
               the start of the string for the first match
)            : end non-capture group
[ \w+]*      : match 0+ characters in character class
\K           : forget everything matched so far and reset the start of
               the reported match to the current position
\[           : match '['
[ \w+]*      : match 0+ characters in character class
\]           : match ']'
(?=          : begin positive lookahead
  [ \w+]*    : match 0+ characters in character class
  (?:        : begin non-capture group
    \[       : match '['
    [ \w+]*  : match 0+ characters in character class
    \]       : match ']'
    [ \w+]*  : match 0+ characters in character class
  )          : end non-capture group
  *          : execute non-capture group 0+ times
  \]         : match ']'
)            : end positive lookahead
The lookahead asserts that the square brackets are balanced between the match and the closing bracket of the enclosing pair (but not everywhere in the string).
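Java's java.util.regex supports \G but not \K, so the expression cannot be used there unchanged. One possible adaptation, offered as a sketch rather than as part of the original answer, wraps the bracket pair of interest in a capture group and reads group 1 instead of the whole match. The class and method names (ContinuedMatchFinder, extract) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContinuedMatchFinder {
    // Same structure as the expression above, except that \K is dropped and
    // the bracket pair of interest is captured in group 1 instead.
    private static final Pattern PAIR = Pattern.compile(
        "(?:^.*?\\[|\\G)[ \\w+]*(\\[[ \\w+]*\\])"
        + "(?=[ \\w+]*(?:\\[[ \\w+]*\\][ \\w+]*)*\\])");

    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        // Each find() resumes where the previous match ended, which is the
        // position that \G asserts.
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [[hello], [world]]
        System.out.println(extract("[[hello] hello [world] world]"));
    }
}
```

As with the original expression, the character class [ \w+] limits the text inside and between the brackets to spaces, word characters and '+'.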