PHP regex optimize

Question

I've got a regular expression that matches everything between the angle brackets in <anything>, and I'm using this:

'@<([\w]+)>@'

today but I believe that there might be a better way to do it?

/ Tobias

What do you mean by "anything"? Your regex implies you mean word characters. If so, the only thing you can do is omit the square brackets; they are useless there. – alpha-mouse, Dec 01 '10 at 13:44
Also note, that the regex will match like you said, and if you are matching html/xml, it would match against `
` as ` — Rahly, Dec 01 '10 at 15:55

score 1 · Accepted Answer · answered Dec 01 '10 at 13:45

\w doesn't match everything like you said, by the way, just [a-zA-Z0-9_]. Assuming you were using "everything" in a loose manner and \w is what you want, you don't need square brackets around the \w. Otherwise it's fine.
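A quick way to convince yourself that the brackets make no difference (a minimal sketch; the sample string is made up for illustration and is not from the question):

```php
<?php
// Hypothetical sample input.
$subject = 'abcd<xyz>ab<c_1>d<not a word>';

// Original pattern, with the redundant character class.
preg_match_all('@<([\w]+)>@', $subject, $withBrackets);

// Same pattern without the square brackets.
preg_match_all('@<(\w+)>@', $subject, $withoutBrackets);

// Both capture the same names; '<not a word>' is skipped because
// \w does not match spaces.
var_dump($withBrackets[1] === $withoutBrackets[1]);
print_r($withoutBrackets[1]);
```

Both calls should capture `xyz` and `c_1` and nothing else.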

Spencer Hakim

In PHP `\w` is locale dependent, so it will match 'unexpected' characters, depending on your locale settings. – Jacco Dec 01 '10 at 14:05
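If that locale dependence matters to you, one option is to spell the class out instead of relying on `\w`. A sketch, assuming ASCII letters, digits and the underscore are all you want to allow (the sample input is hypothetical):

```php
<?php
// Explicit ASCII class: the ranges are fixed, so setlocale() has no effect.
$pattern = '@<([A-Za-z0-9_]+)>@';

// Hypothetical sample input; \xE9 is "é" in Latin-1.
$subject = "<abc> <a_1> <caf\xE9>";

preg_match_all($pattern, $subject, $m);

// Only the purely ASCII names are captured.
print_r($m[1]);
```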

score 1 · Answer 2 · answered Dec 01 '10 at 13:50

If "anything" is "anything except a > char", then you can use:

@<([^>]+)>@

Testing will show if this performs better or worse.

Also, are you sure that you need to optimize? Does your original regex do what it should?
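To test it, something along these lines would do. This is only a rough sketch: the input string and the repeat count are made up, and a single `microtime()` measurement is noisy, so treat the timings as indicative at best. Note that the two patterns are not equivalent, which the match counts make visible.

```php
<?php
// Hypothetical input: repeat a small sample to get measurable timings.
$subject = str_repeat('abcd<xyz>ab<c>d<two words>', 10000);

$patterns = array(
    'word chars' => '@<(\w+)>@',    // the original pattern, brackets dropped
    'not >'      => '@<([^>]+)>@',  // the alternative suggested above
);

$counts = array();
foreach ($patterns as $label => $pattern) {
    $start = microtime(true);
    // preg_match_all() returns the number of full matches.
    $counts[$label] = preg_match_all($pattern, $subject, $m);
    $elapsed = microtime(true) - $start;
    printf("%-10s %d matches in %.4f s\n", $label, $counts[$label], $elapsed);
}
```

The `[^>]+` version also picks up `<two words>`, so decide first which behaviour you actually need before comparing speed.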

jensgram

score 0 · Answer 3 · answered Dec 01 '10 at 13:57

You'd be better off using PHP string functions for this task. It should be a lot faster and is not too complex.

For example:

$string = "abcd<xyz>ab<c>d";

$curr_offset = 0;
$matches = array();

$opening_tag_pos = strpos($string, '<', $curr_offset);

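while($opening_tag_pos !== false)
{
    $curr_offset = $opening_tag_pos;
    $closing_tag_pos = strpos($string, '>', $curr_offset);
    if($closing_tag_pos === false)
    {
        // Unterminated '<': stop here instead of looping forever.
        break;
    }
    // Take the text strictly between '<' and '>'.
    $matches[] = substr($string, $opening_tag_pos+1, ($closing_tag_pos-$opening_tag_pos-1));

    // Continue searching from the '>' that was just consumed.
    $curr_offset = $closing_tag_pos;
    $opening_tag_pos = strpos($string, '<', $curr_offset);
}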

/*
     $matches = Array ( [0] => xyz [1] => c ) 
*/

Of course, if you are trying to parse HTML or XML, use a proper HTML or XML parser instead.

score -1 · Answer 4 · edited May 23 '17 at 10:26

That looks alright. What's not optimal about it?

You may also want to consider something other than regex if you're trying to parse HTML: RegEx match open tags except XHTML self-contained tags

edited May 23 '17 at 10:26

Community

answered Dec 01 '10 at 13:45

mqsoh

*You* should consider [this](http://stackoverflow.com/questions/4261209/turning-a-input-typeradio-into-a-button-with-regex-c/4261912#4261912), [this](http://stackoverflow.com/questions/4031112/regular-expression-matching), [this](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491) and [this](http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326). Then you need to make a judgment call between what is *theoretically possible* vs what is *practically expedient*. – tchrist Dec 01 '10 at 14:08
I don't know that he's parsing HTML at all. However, he is asking for an 'optimization' of a very basic regex. If it needs optimization, then something isn't practically expedient. Down-voting me is harsh, but your links are good. – mqsoh Dec 01 '10 at 14:24
