The whole first regex is:
(?s)((?:\\\\.|[^\\\\{}]|[{](?:\\\\.|[^\\\\{}])*[}])*)\\}|.
First, strip the Java string escapes (for example, \\ in the Java source means a single \ in the regex). That gives the regex:
(?s)((?:\\.|[^\\{}]|[{](?:\\.|[^\\{}])*[}])*)\}|.
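To see the unescaping step concretely, here is a minimal sketch that compiles the Java string literal and prints the pattern text the regex engine actually receives. The class and method names (EscapeDemo, regexSeenByEngine) are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // The pattern as written in Java source: every \\ in the literal is one \ in the regex.
    static final String JAVA_LITERAL =
        "(?s)((?:\\\\.|[^\\\\{}]|[{](?:\\\\.|[^\\\\{}])*[}])*)\\}|.";

    // Pattern.pattern() returns the string the pattern was compiled from,
    // i.e. the regex after the Java compiler has processed the string escapes.
    static String regexSeenByEngine() {
        return Pattern.compile(JAVA_LITERAL).pattern();
    }

    public static void main(String[] args) {
        // Prints the second, shorter form of the regex shown above.
        System.out.println(regexSeenByEngine());
    }
}
```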
The first thing is (?s), the DOTALL flag, which makes . match newlines as well.
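A small sketch of what the flag changes, using a lone . as the pattern. The helper name dotMatchesNewline is hypothetical.

```java
import java.util.regex.Pattern;

public class DotallDemo {
    // Returns true when a lone "." (optionally prefixed by an inline flag) matches a newline.
    static boolean dotMatchesNewline(String flagPrefix) {
        return Pattern.compile(flagPrefix + ".").matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(""));     // false: by default . skips line terminators
        System.out.println(dotMatchesNewline("(?s)")); // true: DOTALL lets . match them
    }
}
```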
The second thing to look at is the top-level structure. Since | is the OR operator and has the lowest precedence, the pattern is:
(something)\} OR SINGLE ANY CHARACTER - DOT
So it will first try to match something ending with } (since } can be a special character in regex, it is prefaced with \). The part before the } is captured as group 1 because of the () around it.
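The difference between the whole match and group 1 can be checked with a short sketch like the one below. It assumes the pattern is applied with Matcher.matches(), which the original code may or may not do; GroupOneDemo and wholeAndGroupOne are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOneDemo {
    static final Pattern P = Pattern.compile(
        "(?s)((?:\\\\.|[^\\\\{}]|[{](?:\\\\.|[^\\\\{}])*[}])*)\\}|.");

    // Returns {whole match, group 1} when the entire input matches, or null otherwise.
    static String[] wholeAndGroupOne(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] r = wholeAndGroupOne("h{ello}}");
        System.out.println(r[0]); // h{ello}}  (whole match, includes the closing })
        System.out.println(r[1]); // h{ello}   (group 1, without the closing })
    }
}
```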
Let's look at what's inside the outermost ().
The outermost form is (?: something)*. It will match 0 or more repetitions of something.
The (?: ) means that what's inside is a non-capturing group; that is, it doesn't create a group in the match the way ( ) would. It lets the | alternatives inside it alternate with each other without involving the outermost |.
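One way to confirm that (?: ) creates no group is to ask a Matcher for its group count. This is only a sketch; the helper name capturingGroups is invented for the example.

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Number of capturing groups in a regex (group 0, the whole match, is not counted).
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups("(?:a|b)*")); // 0: non-capturing
        System.out.println(capturingGroups("(a|b)*"));   // 1: capturing
        // The full regex has three parenthesised groups, but only the outermost one captures.
        System.out.println(capturingGroups(
            "(?s)((?:\\\\.|[^\\\\{}]|[{](?:\\\\.|[^\\\\{}])*[}])*)\\}|.")); // 1
    }
}
```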
Let's look at what that something is. It's a series of OR expressions, which are tried from left to right.
The first one is \\. which matches \ followed by any character (notice that \\ is an escaped \, while . is not escaped).
The second one is the negated character class [^\\{}] which matches any character that is not \ or { or }.
The third one matches the character { followed by 0 or more matches of the inner (?: ) followed by }. The inner (?: ) matches either \ followed by any character, or any character that is not \ or { or }.
So if you put this together, this matches:
The first part matches anything that ends with } (group 1 will not include that }, while the whole match will). Before the last } it will match:
- Empty string
- Any characters escaped by \
- Sequences of characters between { and }
Better explained: it will match pretty much anything except a \ by itself or a { or } without its partner, and it won't match nested { } pairs. The exceptions above can be escaped with \.
It will also match any single character at all (the last .), but in that match group 1 does not participate (in Java, group(1) then returns null rather than an empty string).
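The sketch below contrasts the two ways group 1 can come out without content: a match through the trailing . and a match of a lone } through the first alternative. It again assumes Matcher.matches(); the names DotFallbackDemo and groupOne are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotFallbackDemo {
    static final Pattern P = Pattern.compile(
        "(?s)((?:\\\\.|[^\\\\{}]|[{](?:\\\\.|[^\\\\{}])*[}])*)\\}|.");

    // Returns group 1 for an input that matches in full; rejects anything else.
    static String groupOne(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        // "x" can only match through the trailing ".", so group 1 never participates.
        System.out.println(groupOne("x") == null);  // true
        // "}" matches through the first alternative with zero repetitions before the }.
        System.out.println(groupOne("}").isEmpty()); // true
    }
}
```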
Samples of (Java-unescaped) strings that match: a}, h{ello}}, h{\{ello}}, x, h{\\ello}}, {}}
It seems that the regex is wrong, since it won't match {} but it will match } and {}}, despite being named BALANCED_TEXT.
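The sample strings and the observation about {} can be checked with a sketch like this one. It assumes the whole string has to match (Matcher.matches()); with find() the trailing . alternative would still pick up the single { at the start of {}. The class name BalancedTextCheck is made up for the example.

```java
import java.util.regex.Pattern;

public class BalancedTextCheck {
    static final Pattern P = Pattern.compile(
        "(?s)((?:\\\\.|[^\\\\{}]|[{](?:\\\\.|[^\\\\{}])*[}])*)\\}|.");

    // True when the entire string is matched by the pattern.
    static boolean fullyMatches(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Java literals for: a}  h{ello}}  h{\{ello}}  x  h{\\ello}}  {}}  }  {}
        String[] inputs = {
            "a}", "h{ello}}", "h{\\{ello}}", "x", "h{\\\\ello}}", "{}}", "}", "{}"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + fullyMatches(s));
        }
        // Every input reports true except the last one, the balanced pair {}.
    }
}
```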