
I need to extract words from this:

[[hello] hello [world] world]

Result:

[hello]
[world]

I tried:

\[(\[\w*\])+\]

But this expression only works for one word; I need several words.
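The Java sketch below shows one likely reason the attempt seems to work for only one word. Java is assumed because one of the answers below uses it, and the class and method names are hypothetical. A repeated capturing group such as `(\[\w*\])+` reports only its last iteration. In addition, the outer `\[ ... \]` cannot step over the free text between the inner pairs, so on the sample input the pattern finds nothing at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The attempted pattern \[(\[\w*\])+\] as a Java string literal.
    static final Pattern ATTEMPT = Pattern.compile("\\[(\\[\\w*\\])+\\]");

    // Returns capturing group 1 of the first match, or null if nothing matches.
    static String lastCapture(String input) {
        Matcher m = ATTEMPT.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Adjacent inner pairs: the whole string matches, but the repeated
        // group only remembers its last iteration.
        System.out.println(lastCapture("[[hello][world]]"));              // prints [world]
        // Text between the pairs: nothing in the pattern can consume " hello ".
        System.out.println(lastCapture("[[hello] hello [world] world]")); // prints null
    }
}
```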

HedgDifuse
  • What have you tried, and what exactly is the problem with it? – jonrsharpe Jun 22 '20 at 18:25
  • @jonrsharpe I tried \[(\[\w*\])+\], but this expression only works for one word between brackets – HedgDifuse Jun 22 '20 at 18:28
  • That would appear to work for your example. Do you not only want one word? Give a minimal reproducible example to explain the actual problem. – jonrsharpe Jun 22 '20 at 18:28
  • @jonrsharpe I don't know how to create an expression for several words – HedgDifuse Jun 22 '20 at 18:29
  • Why not `\[[^[\]]+?\]`? Or are you not specifying the `g` modifier, which would give you *all* matches? – trincot Jun 22 '20 at 18:33
  • @trincot No, I need an expression that extracts words in brackets only when they are inside other brackets – HedgDifuse Jun 22 '20 at 18:36
  • @WiktorStribiżew, that referenced Q&A does not deal with the additional requirement of the outer brackets. – trincot Jun 22 '20 at 18:47
  • @trincot OK, I double-checked, but note you do not need lazy quantifiers in your solution. – Wiktor Stribiżew Jun 22 '20 at 18:52
  • Indeed, they are a remnant of an earlier version. Will update. Thanks. – trincot Jun 22 '20 at 18:56
  • I assume that may be part of a string (e.g., `"Say [hello] or [[hello] hello [world] world] or both"`). Correct? – Cary Swoveland Jun 22 '20 at 20:15
  • When you say `need expression to extract words inside brackets only in other brackets`, you are immediately referring to _balanced text_. Why use a regex at all, or even try, unless you are using an engine that can do this? You don't say which one you use. And look, you got 2 attempts, and both fail. Why? –  Jun 22 '20 at 20:21
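trincot's first suggestion relies on the `g` modifier to return *all* matches. In Java (assumed here because one answer below is in Java) the equivalent is calling `Matcher.find()` in a loop. The sketch below uses hypothetical names; note that it also picks up bracketed words that are not inside outer brackets, which is why the asker rejected this approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllBrackets {
    // \[[^[\]]+\] with the brackets inside the class escaped: Java treats an
    // unescaped '[' inside [...] as the start of a nested character class.
    // The lazy +? is unnecessary because the class already excludes ']'.
    static final Pattern PAIR = Pattern.compile("\\[[^\\[\\]]+\\]");

    // Looping over find() collects every match, like the g modifier.
    static List<String> findAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [[hello], [world]]
        System.out.println(findAll("[[hello] hello [world] world]"));
        // prints [[hi], [hello], [world]]: [hi] is not inside outer brackets
        System.out.println(findAll("Say [hi] or [[hello] hello [world] world] or both"));
    }
}
```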

3 Answers


If I understand correctly, you want to match content in square brackets, which itself occurs within a bracket pair.

I would suggest using look-ahead, skipping any other stand-alone bracket pair, and looking for an additional closing bracket. If that is found, then the match counts:

\[[^[\]]+\](?=[^[\]]*(\[[^[\]]+\])*[^[\]]*\])
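A Java sketch of this pattern follows; the class, method, and constant names are hypothetical. Because Java reads an unescaped `[` inside a character class as a nested class, `[^[\]]` is written `[^\[\]]`. The sketch also illustrates two properties of the pattern as given. First, it checks only the right-hand side, so unbalanced input such as `[asfd] ]` still matches. Second, the repeated group `(\[[^[\]]+\])*` does not allow text between consecutive inner pairs, so with three pairs separated by text the first one is skipped. `FIXED` is this sketch's own variant (not part of the answer) that moves the trailing `[^\[\]]*` inside the group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedBracketWords {
    // Any character except '[' and ']', escaped for Java.
    private static final String NB = "[^\\[\\]]";

    // The answer's pattern: an inner pair, then a lookahead that skips text
    // and adjacent complete pairs until the enclosing ']'.
    static final Pattern ANSWER = Pattern.compile(
            "\\[" + NB + "+\\](?=" + NB + "*(\\[" + NB + "+\\])*" + NB + "*\\])");

    // Hypothetical variant: text may follow each skipped pair.
    static final Pattern FIXED = Pattern.compile(
            "\\[" + NB + "+\\](?=" + NB + "*(?:\\[" + NB + "+\\]" + NB + "*)*\\])");

    // Collects every match, like the g modifier.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [[hello], [world]]
        System.out.println(findAll(ANSWER, "[[hello] hello [world] world]"));
        // prints [[hello], [world]]: the loose [hi] is skipped
        System.out.println(findAll(ANSWER, "Say [hi] or [[hello] hello [world] world] or both"));
        // prints [[asfd]]: only the right-hand side is checked
        System.out.println(findAll(ANSWER, "[asfd] ]"));
        // prints [[b], [c]] and then [[a], [b], [c]]
        System.out.println(findAll(ANSWER, "[[a] x [b] y [c] z]"));
        System.out.println(findAll(FIXED, "[[a] x [b] y [c] z]"));
    }
}
```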

trincot
  • Awesome! Thank you! – HedgDifuse Jun 22 '20 at 18:44
  • This is not a balanced-text regex. If it only cares about the right side, how can it match between brackets? Why submit it as an answer? It matches '`[asfd]` ]' and won't match '[asfd] []]', even though it shouldn't anyway. –  Jun 22 '20 at 20:18
  • Indeed, this answer assumes that the input has balanced brackets. – trincot Jun 22 '20 at 20:19

If two steps are OK for you, try this (corrected version):

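public class TwoSteps {
    public static void main(String[] args) {
        String s = "[[hello1] hello [world1] world]";
        String t = s.replaceAll("^\\[(.+)\\]$", "$1");                // step 1: strip the outer brackets
        String result = t.replaceAll("([^\\[\\]]+)(\\[|$)", "\n$2"); // step 2: turn loose text before '[' or at the end into a newline
        System.out.println(t);      // [hello1] hello [world1] world
        System.out.println(result); // [hello1] and [world1], one per line
    }
}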

Matches of the following regular expression will extract the strings of interest.

(?:^.*?\[|\G)[ \w+]*\K\[[ \w+]*\](?=[ \w+]*(?:\[[ \w+]*\][ \w+]*)*\])


This should work with every regex engine that supports \G and \K (defined below), which includes PCRE (PHP), Perl, Ruby, Python's regex module, among others.

The regex engine performs the following operations.

(?:          : begin non-capture group
  ^          : match beginning of string
  .*?\[      : match 0+ characters, lazily, then '['
  |          : or
  \G         : assert position at the end of the previous match or the
               start of the string for the first match
)            : end non-capture group
[ \w+]*      : match 0+ characters in character class
\K           : discard everything matched so far from the reported match
\[           : match '['
[ \w+]*      : match 0+ characters in character class
\]           : match ']'
(?=          : begin positive lookahead
  [ \w+]*    : match 0+ characters in character class
  (?:        : begin non-capture group
    \[       : match '['
    [ \w+]*  : match 0+ characters in character class
    \]       : match ']'
    [ \w+]*  : match 0+ characters in character class
  )          : end non-capture group
  *          : execute non-capture group 0+ times
  \]         : match ']'
)            : end positive lookahead

This asserts that the brackets are balanced between the enclosing brackets (but not everywhere in the string).
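Java supports `\G` but not `\K`, so the regex cannot be used there verbatim. A common workaround, sketched below with hypothetical names, is to leave the part that `\K` would discard outside a capturing group and read group 1 of each match instead; the rest of the pattern is unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorWords {
    // The answer's class [ \w+]*: spaces, word characters and '+', 0+ times.
    private static final String C = "[ \\w+]*";

    // Same regex, with \K replaced by capturing group 1 around the target.
    static final Pattern P = Pattern.compile(
            "(?:^.*?\\[|\\G)" + C
            + "(\\[" + C + "\\])"
            + "(?=" + C + "(?:\\[" + C + "\\]" + C + ")*\\])");

    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1)); // group 1 is what \K would have reported
        }
        return out;
    }

    public static void main(String[] args) {
        // both print [[hello], [world]]
        System.out.println(extract("[[hello] hello [world] world]"));
        System.out.println(extract("Say [hello] or [[hello] hello [world] world] or both"));
    }
}
```

Each later `find()` must start exactly where the previous match ended (`\G`), so the matches chain from the first inner pair until the lookahead or the next `\[` fails.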

Cary Swoveland