So what's backtracking?
When the engine reaches a quantifier, it treats it as greedy by default. A greedy quantifier matches as much as possible and backtracks on demand, allowing efficient matches,
as referenced by Greedy vs. Reluctant vs. Possessive Quantifiers:
A greedy quantifier first matches as much as possible. So the .* matches the entire string. Then the matcher tries to match the f following, but there are no characters left. So it "backtracks", making the greedy quantifier match one less thing (leaving the "o" at the end of the string unmatched). That still doesn't match the f in the regex, so it "backtracks" one more step, making the greedy quantifier match one less thing again (leaving the "oo" at the end of the string unmatched). That still doesn't match the f in the regex, so it backtracks one more step (leaving the "foo" at the end of the string unmatched). Now, the matcher finally matches the f in the regex, and the o and the next o are matched too. Success! [...]
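To watch the quoted walkthrough happen, here is a minimal Java sketch. It assumes the quote is talking about the pattern .*foo applied to the input xfooxxxxxxfoo; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyBacktrackDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo"; // assumed sample input

        // Greedy .* first swallows the whole input, then gives characters
        // back one at a time until "foo" fits at the very end.
        System.out.println(firstMatch(".*foo", input)); // xfooxxxxxxfoo

        // A capturing group shows what .* kept after backtracking.
        Matcher m = Pattern.compile("(.*)foo").matcher(input);
        if (m.find()) {
            System.out.println(m.group(1)); // xfooxxxxxx
        }
    }
}
```

The match only succeeds because .* was willing to hand back the last three characters.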
What does this have to do with a*+b?
In /a*+b/:
a: The literal character "a".
*+: Zero or more, possessive.
b: The literal character "b".
As referenced by Greedy vs. Reluctant vs. Possessive Quantifiers:
A possessive quantifier is just like the greedy quantifier, but it doesn't backtrack. So it starts out with .* matching the entire string, leaving nothing unmatched. Then there is nothing left for it to match with the f in the regex. Since the possessive quantifier doesn't backtrack, the match fails there.
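A small Java sketch of the difference, again assuming the quote's sample input is xfooxxxxxxfoo. The a*+ab pattern is my own addition to show where a*+ starts to hurt; the helper name is hypothetical.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // True if regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Greedy: .* backs off until "foo" fits.
        System.out.println(found(".*foo", "xfooxxxxxxfoo"));  // true
        // Possessive: .*+ keeps everything it took, so "foo" never fits.
        System.out.println(found(".*+foo", "xfooxxxxxxfoo")); // false

        // a*+b still matches: a*+ stops at the "b" on its own accord.
        System.out.println(found("a*+b", "aaab"));  // true
        // a*+ab fails: a*+ took every "a" and will not give one back.
        System.out.println(found("a*+ab", "aaab")); // false
        // The greedy version gives one "a" back and succeeds.
        System.out.println(found("a*ab", "aaab"));  // true
    }
}
```

So a*+b behaves like a*b on input that matches; the possessive form only changes what happens when the engine would otherwise have to back off.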
Why does it matter?
The engine won't realize on its own whether it's doing an (in)efficient match. See here for a decent example: Program run forever when matching regex. In many scenarios, regexes written quickly are not efficient and can easily become a problem in deployment.
So what's an atomic group?
After the pattern within the atomic group finishes matching, it will not let go, ever. Study this example:
Pattern: (?>\d\w{2}|12)c
Matching string: 12c
Looks perfectly legitimate, but this match fails. The steps are simple: the first alternative of the atomic group matches perfectly, since \d\w{2} consumes 12c. The group then completes its match. Here is our pointer location:
Pattern: (?>\d\w{2}|12)c
                       ^
Matching string: 12c
                    ^
The pattern advances. Now we try to match c, but there is no c left. Instead of trying to backtrack (releasing \d\w{2} and consuming 12), the match fails.
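The same experiment as a minimal Java sketch. The non-atomic and reordered variants are my additions for comparison, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // True if regex matches the whole input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Atomic group: \d\w{2} takes "12c" and never lets go, so the
        // trailing c has nothing left to match.
        System.out.println(matchesAll("(?>\\d\\w{2}|12)c", "12c")); // false

        // Ordinary non-capturing group: the engine backtracks into the
        // group, tries the second alternative "12", and c then matches.
        System.out.println(matchesAll("(?:\\d\\w{2}|12)c", "12c")); // true

        // Inside an atomic group the order of alternatives matters:
        // with "12" first, the group takes "12" and c matches.
        System.out.println(matchesAll("(?>12|\\d\\w{2})c", "12c")); // true
    }
}
```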
Well that's a bad idea then! Why would we prevent backtracking, Unihedron?
Now imagine we're manipulating a JSON object. This file is not small. Backtracking from the end is going to be a bad idea.
"2597401":[{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2- 5":{"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338497979",
"runTime":"1022",
"execType":"user:binary",
"exec":"ft.D.64",
"numNodes":"4",
"sha1":"5a79879235aa31b6a46e73b43879428e2a175db5",
"execEpoch":1336766742,
"execModify":"Fri May 11 15:05:42 2012",
"startTime":"Thu May 31 15:59:39 2012",
"numCores":"64",
"sizeT":{"bss":"1881400168","text":"239574","data":"22504"}},
{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338497946",
"runTime":"33" "execType":"user:binary",
"exec":"cg.C.64",
"numNodes":"4",
"sha1":"caf415e011e28b7e4e5b050fb61cbf71a62a9789",
"execEpoch":1336766735,
"execModify":"Fri May 11 15:05:35 2012",
"startTime":"Thu May 31 15:59:06 2012",
"numCores":"64",
"sizeT":{"bss":"29630984","text":"225749","data":"20360"}},
{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2-5": {"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338500447",
"runTime":"145",
"execType":"user:binary",
"exec":"mg.D.64",
"numNodes":"4",
"sha1":"173de32e1514ad097b1c051ec49c4eb240f2001f",
"execEpoch":1336766756,
"execModify":"Fri May 11 15:05:56 2012",
"startTime":"Thu May 31 16:40:47 2012",
"numCores":"64",
"sizeT":{"bss":"456954120","text":"426186","data":"22184"}},{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338499002",
"runTime":"1444",
"execType":"user:binary",
"exec":"lu.D.64",
"numNodes":"4",
"sha1":"c6dc16d25c2f23d2a3321d4feed16ab7e10c2cc1",
"execEpoch":1336766748,
"execModify":"Fri May 11 15:05:48 2012",
"startTime":"Thu May 31 16:16:42 2012",
"numCores":"64",
"sizeT":{"bss":"199850984","text":"474218","data":"27064"}}],
Uh oh...
Do you get what I mean now? :P
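One rough way to put a number on it is to count how often the engine reads a character. The Java sketch below wraps the input in a CharSequence that counts charAt calls. This relies on java.util.regex reading its input through charAt, which is an implementation detail rather than a documented guarantee. The input is synthetic: a hypothetical runTime value with many digits and a space where the comma should be, loosely modelled on the data above. All names are made up for illustration, and exact counts vary by JDK, so none are shown.

```java
import java.util.regex.Pattern;

class BacktrackCostDemo {
    // Wraps a CharSequence and counts every character read through charAt.
    static final class CountingText implements CharSequence {
        private final CharSequence text;
        long reads;

        CountingText(CharSequence text) { this.text = text; }

        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text.toString(); }
    }

    // Number of charAt calls made while searching input for regex.
    static long readsFor(String regex, String input) {
        CountingText counted = new CountingText(input);
        Pattern.compile(regex).matcher(counted).find();
        return counted.reads;
    }

    // Synthetic record: digitCount digits, then a space where a comma should be.
    static String brokenRecord(int digitCount) {
        StringBuilder sb = new StringBuilder("\"runTime\":\"");
        for (int i = 0; i < digitCount; i++) {
            sb.append('3');
        }
        return sb.append("\" \"execType\":\"user:binary\",").toString();
    }

    public static void main(String[] args) {
        String input = brokenRecord(5000);
        // Neither pattern can match, because the comma is missing.
        long greedy = readsFor("\"runTime\":\"\\d*\",", input);
        long possessive = readsFor("\"runTime\":\"\\d*+\",", input);
        System.out.println("greedy reads:     " + greedy);
        System.out.println("possessive reads: " + possessive);
    }
}
```

On a failing record like this, the greedy version should report more reads, because \d* hands back one digit at a time before giving up, while \d*+ fails as soon as the closing quote and comma are not found.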
I'll leave you to figure out the rest and to find out more about possessive quantifiers and atomic groups; I'm not writing anything else into this post. The JSON came from REGEX reformatting; I saw the answer a few days ago and found it very inspiring.
Read also