
Is there a way in PHP to compile a regular expression so that it can then be compared to multiple strings without repeating the compilation process? Other major languages can do this: Java, C#, Python, JavaScript, etc.

Preston

5 Answers


The Perl-Compatible Regular Expressions library may already be optimized for your use case, even though it does not provide a Regex class like other languages do:

This extension maintains a global per-thread cache of compiled regular expressions (up to 4096).

PCRE Introduction

This is how the study modifier which Imran described can store the compiled expression between calls.
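To make the caching concrete, here is a minimal sketch (the pattern and subject strings are made up for illustration). Since PHP has no Regex object, "compile once, match many" amounts to reusing one pattern string: the first `preg_match()` call compiles it, and later calls with the identical string should be served from the cache. `preg_grep()` does the same job over a whole array in one call.

```php
<?php
// Hypothetical pattern: matches strings shaped like "555-1234"
$pattern = '/^\d{3}-\d{4}$/';
$subjects = ['555-1234', 'hello', '123-4567', '12-34567'];

// One call per string: the first preg_match() compiles the pattern,
// later calls with the identical string should hit PHP's cache.
$matches = [];
foreach ($subjects as $subject) {
    if (preg_match($pattern, $subject) === 1) {
        $matches[] = $subject;
    }
}

// preg_grep() applies one pattern to every element of an array and
// returns the matching elements, keeping their original keys.
$grepped = array_values(preg_grep($pattern, $subjects));

echo implode(', ', $matches), "\n"; // 555-1234, 123-4567
echo implode(', ', $grepped), "\n"; // 555-1234, 123-4567
```

Either way, the point is to keep the pattern string identical between calls; there is nothing to "compile" explicitly.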

Community

preg regexes can use the uppercase S (study) modifier, which is probably the thing you're looking for.

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

S

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
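A minimal sketch of adding `S` to a pattern that is reused many times (the pattern and input lines are hypothetical). Note that, per the PHP manual, `S` has no effect as of PHP 7.3.0, when PHP moved to PCRE2; on current versions the pattern below behaves exactly as it would without the modifier.

```php
<?php
// Hypothetical non-anchored pattern that can start with 'c' or 'd',
// i.e. the case the manual says studying can help with.
$pattern = '/(?:cat|dog)s?/S';

$lines = ['The cat sat on the mat', 'Two dogs barked', 'No pets here'];

$counts = [];
foreach ($lines as $line) {
    // Same pattern string each time, so the compiled (and, before
    // PHP 7.3, studied) form is reused from the cache.
    $counts[] = preg_match_all($pattern, $line);
}

echo implode(' ', $counts), "\n"; // 1 1 0
```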

Imran
  • The answer to the OP's question is that there's no need to pre-compile regexes in PHP because, as 1stvamp noted, compiled regexes are cached automatically. The 'S' modifier is a separate issue. – Alan Moore Apr 07 '09 at 03:39
  • This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Modifiers". – aliteralmind Apr 10 '14 at 00:41

The "thread" here is the thread that the script is currently running in. After its first use, a compiled regex is cached, and the next time the same pattern is used PHP does not compile it again.

Simple test:

```php
<?php

// Current Unix timestamp in seconds, as a float
function microtime_float() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

// test string
$text='The big brown <b>fox</b> jumped over a lazy <b>cat</b>';
$testTimes=10;

// regexp with caching: the pattern string is the same on every call,
// so it is compiled once and then served from the cache
$avg=0;
for ($x=0; $x<$testTimes; $x++)
{
    $start=microtime_float();
    for ($i=0; $i<10000; $i++) {
        preg_match_all('/<b>(.*)<\/b>0?/', $text, $m);
    }
    $end=microtime_float();
    $avg += (float)$end-$start;
}

echo 'Regexp with caching avg '.($avg/$testTimes);

// regexp without caching: appending $i produces a different pattern string
// on each iteration, so most calls miss the cache and compile it again
$avg=0;
for ($x=0; $x<$testTimes; $x++)
{
    $start=microtime_float();
    for ($i=0; $i<10000; $i++) {
        $pattern='/<b>(.*)<\/b>'.$i.'?/';
        preg_match_all($pattern, $text, $m);
    }
    $end=microtime_float();
    $avg += (float)$end-$start;
}

echo '<br/>Regexp without caching avg '.($avg/$testTimes);
```

Regexp with caching avg 0.1
Regexp without caching avg 0.8

In this test, caching the regexp makes it 8 times faster!

Mike
  • **Test is invalid**! You're concatenating 3 strings in your 2nd example (without caching), while in the 1st the 'variable' `$i` does not exist in the pattern and it's always `0` in that place. – CSᵠ Nov 08 '14 at 03:38
  • Test is **reasonably valid** nonetheless. By concatenating a string "$j-$y" with $j = 37 and $y = 5 in the first test, and a string "$i-$x" in the second (the -$x is to defeat any caching by testTimes), I get times of 0.0112 and 0.0431. The same 0.0431 is obtained by using "$i-$y" in the second test, which means that indeed the cache is less than 10000 in size. My actual speedup is thus **4 times faster** (not 8). – LSerni Oct 29 '15 at 08:44
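LSerni's fairer variant can be sketched roughly as follows. This is a reconstruction from the comment, not LSerni's actual code: both loops now build the pattern by concatenation, so the only difference is whether the resulting string repeats (cache hit) or changes every time (cache miss). It uses `microtime(true)` instead of the `microtime_float()` helper, the loop counts are arbitrary, and timings will vary by machine and PHP version.

```php
<?php
$text = 'The big brown <b>fox</b> jumped over a lazy <b>cat</b>';
$iterations = 10000; // more distinct patterns than the 4096-entry cache holds
$rounds = 10;

$j = 37; // constants: the "cached" pattern string is identical every iteration
$y = 5;

$cached = 0.0;
$uncached = 0.0;
for ($x = 0; $x < $rounds; $x++) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        // Concatenation still happens, but it always yields the same string
        $pattern = '/<b>(.*)<\/b>' . "$j-$y" . '?/';
        preg_match_all($pattern, $text, $m);
    }
    $cached += microtime(true) - $start;

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        // "$i-$x" differs every iteration and every round, defeating the cache
        $pattern = '/<b>(.*)<\/b>' . "$i-$x" . '?/';
        preg_match_all($pattern, $text, $m);
    }
    $uncached += microtime(true) - $start;
}

printf("cached avg:   %.4f s\n", $cached / $rounds);
printf("uncached avg: %.4f s\n", $uncached / $rounds);
```

Because both loops pay the same concatenation cost, the gap between the two averages is closer to the cost of recompiling than the original test's 8x figure.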

As another commenter has already said, PCRE regexes are already compiled without your having to explicitly ask for it: PHP's PCRE extension keeps an internal hash of compiled patterns, indexed by the original pattern string you provided.

1stvamp

I'm not positive that you can. Mastering Regular Expressions discusses some PHP-specific optimization techniques in Chapter 10: PHP, specifically the use of the S pattern modifier to make the regex engine "study" the regular expression before applying it. Depending on your pattern and your text, this could give you some speed improvements.

Edit: you can take a peek at the contents of the book using books.google.com.

Grey Panther
EBGreen
  • Every developer who uses regex should read this book!! All the techniques you need to be efficient are in this book. – Arno Dec 08 '09 at 14:35