I have the following test case for Tcl:
set v z[string repeat t 10000]b[string repeat t 10000]g[string repeat t 10000]z
If I use regexp only to test whether the string matches, it is fast:
time {regexp {z.*?b.*?g(.+?)z} $v} 20
[TCL_OK] 340.4 microseconds per iteration
But if I also ask for the submatch, regexp becomes dramatically slower:
time {regexp {z.*?b.*?g(.+?)z} $v -> asd} 5
[TCL_OK] 157007.4 microseconds per iteration
What is wrong with my regexp, and why is it so slow only when the submatch is returned?
I am using the following environment:
parray tcl_platform
tcl_platform(byteOrder) = littleEndian
tcl_platform(machine) = intel
tcl_platform(os) = Windows NT
tcl_platform(osVersion) = 6.1
tcl_platform(pathSeparator) = ;
tcl_platform(platform) = windows
tcl_platform(pointerSize) = 4
tcl_platform(threaded) = 1
tcl_platform(user) = kot
tcl_platform(wordSize) = 4
[TCL_OK]
puts $tcl_patchLevel
8.6.0
[TCL_OK]
Update: additional tests.
Match without capture, best time:
time {regexp {z.*?b.*?g(.+?)z} $v} 5
[TCL_OK] 1178.2 microseconds per iteration
Capturing match, non-greedy, bad time:
time {regexp {z.*?b.*?g(.+?)z} $v -> asd} 5
[TCL_OK] 13796072.4 microseconds per iteration
Capturing match, greedy, acceptable time:
time {regexp {z.*b.*g(.+)z} $v -> asd} 5
[TCL_OK] 7097.4 microseconds per iteration
string length $asd
[TCL_OK] 100007
Capturing match, non-greedy + greedy + greedy, very bad time:
time {regexp {z.*?b.*g(.+)z} $v -> asd} 5
[TCL_OK] 38177041.6 microseconds per iteration
string length $asd
[TCL_OK] 100000
And finally, capturing match, non-greedy + non-greedy + greedy: the match is non-greedy and the time is acceptable:
time {regexp {z.*?b.*?g(.+)z} $v -> asd} 5
[TCL_OK] 4157.0 microseconds per iteration
string length $asd
[TCL_OK] 100000
Tcl's RE engine behaves very unpredictably for me.
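To make the comparison easier to reproduce, the tests can be collected into one script. This is only a sketch: bench is a hypothetical helper name (not a Tcl built-in), it uses the 10000-character runs from the first test, and the timings will of course differ by machine and Tcl version.

```tcl
# Subject string from the first test: z, b, g, z separated by runs of 10000 "t".
set v z[string repeat t 10000]b[string repeat t 10000]g[string repeat t 10000]z

# bench is a hypothetical helper, not part of Tcl.
# It times one pattern in match-only mode and in capture mode and returns
# {match-only microseconds, capture microseconds, length of the captured group}.
proc bench {pattern subject {iterations 5}} {
    set asd {}
    # [time] returns "N microseconds per iteration"; keep only the number.
    set matchOnly [lindex [time {regexp $pattern $subject} $iterations] 0]
    set capture   [lindex [time {regexp $pattern $subject -> asd} $iterations] 0]
    return [list $matchOnly $capture [string length $asd]]
}

# The four quantifier combinations tried above.
foreach pattern {
    {z.*?b.*?g(.+?)z}
    {z.*b.*g(.+)z}
    {z.*?b.*g(.+)z}
    {z.*?b.*?g(.+)z}
} {
    lassign [bench $pattern $v] matchOnly capture len
    puts [format "%-20s match-only %12.1f us  capture %12.1f us  capture length %d" \
        $pattern $matchOnly $capture $len]
}
```

Each output line shows the match-only and capture timings for one pattern side by side, so the slowdown can be attributed to a specific quantifier combination.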