You really have two questions in one here:
What do these two regular expressions do?
/<(.*?)>.*?<\/\1>/
A dot, '.', is an atom that matches any character (except a newline, unless you add the /s modifier). The asterisk after it means "match the previous atom zero or more times, as many times as possible." The combination is referred to as "greedy": the dot matches anything and the '*' says "just keep going", so without any other constraint or anchor, the combination 'eats' (matches) the rest of the string. A question mark after the asterisk changes this behaviour from "greedy" to "stingy" (more often called "lazy" or "non-greedy"): it will try to match as little as possible.
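To see the difference on its own, here is a minimal Perl sketch. The string 'axbxcx' and the variable names are made up purely for illustration:

```perl
use strict;
use warnings;

# Made-up sample string containing several 'x' characters.
my $text = 'axbxcx';

# Greedy: .* grabs as much as it can, then backtracks just far enough
# for the final 'x' to match, so the match runs to the LAST 'x'.
my ($greedy) = $text =~ /(a.*x)/;

# Stingy (lazy): .*? grabs as little as it can, so it stops at the FIRST 'x'.
my ($stingy) = $text =~ /(a.*?x)/;

print "greedy: $greedy\n";   # greedy: axbxcx
print "stingy: $stingy\n";   # stingy: ax
```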
The round brackets (parentheses) don't stipulate what to match; they are there to indicate that you want to "capture" whatever does match into a special variable called "$1" for the first pair of brackets, "$2" for the second, and so on.
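A small sketch of numbered captures, assuming plain Perl 5. The "key=value" input and the \w+ pattern are hypothetical, chosen only to give two pairs of brackets:

```perl
use strict;
use warnings;

# Hypothetical input, used only for illustration.
my $pair = 'colour=blue';

my ($key, $value);
if ($pair =~ /(\w+)=(\w+)/) {
    # The first pair of brackets fills $1, the second fills $2.
    $key   = $1;
    $value = $2;
}

print "key: $key, value: $value\n";   # key: colour, value: blue
```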
So this, <(.*?)>, means: match an open angle bracket (or "less than"), then match anything (but take up as little as possible), and then match a closing angle bracket (or "greater than"). The round brackets stipulate nothing about what to match; they just mean "put whatever text is between the angle brackets into $1."
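Here is that first half on its own, applied to a made-up bit of HTML, with the greedy version alongside for contrast (a sketch to show the matching, not a recommendation for parsing real HTML):

```perl
use strict;
use warnings;

# Made-up sample text.
my $html = 'text <b>bold</b> more';

# <(.*?)> : a '<', then as little as possible, then a '>'.
# The brackets only decide what ends up in $1.
my ($tag) = $html =~ /<(.*?)>/;
print "tag: $tag\n";               # tag: b

# The greedy version runs on to the LAST '>' in the string.
my ($greedy_tag) = $html =~ /<(.*)>/;
print "greedy: $greedy_tag\n";     # greedy: b>bold</b
```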
The \1 in the last part, <\/\1>, is what's called a back reference: it means "whatever you captured in the first set of round brackets, I want to match again right here". The \/ before it is an escaped forward slash (it needs the backslash because / is also the delimiter of the whole regex). So what we are looking for here is a "tag" (text surrounded by angle brackets), some text, and then a matching "closing tag", i.e. the same text in angle brackets with a '/' in front.
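Putting the pieces together, here is a minimal sketch of the whole first regex. The sample strings are invented, and $& (the entire text matched) is used only to show what each match covers:

```perl
use strict;
use warnings;

# The first regex: an opening tag, some text, and a closing tag whose
# name must equal whatever was captured into $1 (the \1 back reference).
my $re = qr/<(.*?)>.*?<\/\1>/;

# Invented sample text with two properly paired tags.
my $html = '<i>hi</i> and <b>there</b>';

my @found;
while ($html =~ /$re/g) {
    push @found, $&;    # $& is the whole text matched; $1 is just the tag name
}
print join(' | ', @found), "\n";    # <i>hi</i> | <b>there</b>

# A closing tag with a different name is rejected, because \1 must repeat "i".
my $mismatch = ('<i>oops</b>' =~ $re) ? 'match' : 'no match';
print "$mismatch\n";                # no match
```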
/<(.*?)>.*<\/\1>/
This does almost the same thing, except that the middle .* has no '?', so it is greedy: it takes as much text as possible, running from the opening tag to the last matching closing tag it can find.
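A quick sketch of that difference, on an invented string containing two <b> tags:

```perl
use strict;
use warnings;

# Invented sample text with the same tag used twice.
my $html = '<b>one</b> and <b>two</b>';

# Stingy middle (.*?): stops at the FIRST matching closing tag.
my $stingy = $html =~ /<(.*?)>.*?<\/\1>/ ? $& : '';

# Greedy middle (.*): runs on to the LAST matching closing tag.
my $greedy = $html =~ /<(.*?)>.*<\/\1>/ ? $& : '';

print "stingy: $stingy\n";   # stingy: <b>one</b>
print "greedy: $greedy\n";   # greedy: <b>one</b> and <b>two</b>
```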
my $a = '"helllo"++"world"';
...
print "b $1\n" if $a =~/(".*?")/; # "helllo"
...
print "d $1\n" if $a =~/(.*?)/;
Why does d show nothing, and why is b "helllo" and not ""?
With (b), you are saying "I insist on a double quote (") at the start of the match, then any text, and then I insist again on a closing double quote (")". Now, if you look at the text in $a, you can see that all of the following start and end with a (") with some text in between:
"helllo"
"helllo"++"
"helllo"++"world"
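Here's the main point: the regex engine tries starting positions from left to right and commits to the leftmost one where a match is possible (here, the very first "). Only then does it choose among the candidates above: .*? (stingy) means "I want the shortest one", i.e. in this case the first, whereas .* (no '?', greedy) means "I want the longest one", i.e. in this case the last.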
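The same comparison, run against the string from the question (renamed $str here, since $a and $b are the special variables Perl's sort uses):

```perl
use strict;
use warnings;

# Same string as in the question.
my $str = '"helllo"++"world"';

my ($stingy) = $str =~ /(".*?")/;   # shortest candidate from the leftmost "
my ($greedy) = $str =~ /(".*")/;    # longest candidate from the leftmost "

print "stingy: $stingy\n";   # stingy: "helllo"
print "greedy: $greedy\n";   # greedy: "helllo"++"world"
```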
With (d), there are no double quote characters (or anything else) that you insist on at the start or end of the match; you are simply saying "match anything (.*), but take as little as possible". Because '*' means "zero or more", the empty string qualifies, so the match succeeds immediately at position 0 with $1 set to the empty string, and d prints "d " followed by nothing. That certainly is the smallest match that satisfies the criteria (i.e. no criteria! :-)
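And a sketch of what (d) actually does. @- is Perl's array of match start offsets; the anchored ^...$ match at the end is an extra, added only to show that giving the regex something to reach for makes the stingy .*? go further:

```perl
use strict;
use warnings;

my $str = '"helllo"++"world"';

# (.*?) may match zero characters, so it succeeds at once, at position 0.
my $matched  = $str =~ /(.*?)/;
my $captured = $1;
my $start    = $-[0];    # offset where that match began

print "matched: ", ($matched ? 'yes' : 'no'), "\n";   # matched: yes
print "captured length: ", length($captured), "\n";    # captured length: 0
print "starts at: $start\n";                           # starts at: 0

# With an end anchor, the shortest match that satisfies the pattern is the whole string.
my ($whole) = $str =~ /^(.*?)$/;
print "anchored: $whole\n";                            # anchored: "helllo"++"world"
```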