1

I've got a regular expression that matches everything inside <anything>, and I'm using this:

'@<([\w]+)>@'

today, but I believe there might be a better way to do it?

/ Tobias

sandelius
  • What do you mean by "anything"? Your regex implies you mean word characters. If so, the only thing you can do is omit the square brackets; they are useless there. – alpha-mouse Dec 01 '10 at 13:44
  • Also note, that the regex will match like you said, and if you are matching html/xml, it would match against `
    ` as `
    – Rahly Dec 01 '10 at 15:55

4 Answers

1

\w doesn't match everything like you said, by the way, just [a-zA-Z0-9_]. Assuming you were using "everything" in a loose manner and \w is what you want, you don't need square brackets around the \w. Otherwise it's fine.
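
A quick way to convince yourself that the brackets change nothing is to run both variants side by side. This is only a sketch; the sample input is made up for illustration and is not from the question.

```php
<?php
// Made-up sample input; the last token contains spaces on purpose.
$subject = 'abcd<xyz>ab<c>d<not a word>';

preg_match_all('@<([\w]+)>@', $subject, $withBrackets);
preg_match_all('@<(\w+)>@', $subject, $withoutBrackets);

// Both patterns capture the same names. "<not a word>" is skipped
// because \w does not match the spaces.
var_dump($withBrackets[1] === $withoutBrackets[1]);
print_r($withoutBrackets[1]);
```

Here both calls capture `xyz` and `c`, and neither matches the token with spaces in it.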

Spencer Hakim
  • In PHP `\w` is locale dependent, so it will match 'unexpected' characters, depending on your locale settings. – Jacco Dec 01 '10 at 14:05
1

If "anything" is "anything except a > char", then you can use:

@<([^>]+)>@

Testing will show if this performs better or worse.

Also, are you sure that you need to optimize? Does your original regex do what it should?
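
To make the difference concrete, here is a small sketch comparing what the two patterns capture, followed by a rough timing loop. The input string and the iteration count are assumptions chosen for illustration, and the timings will vary by machine and PHP version.

```php
<?php
// Made-up input: one plain name and one tag-like token with an attribute.
$subject = 'a<xyz>b<img src="x.png">c';

preg_match_all('@<(\w+)>@', $subject, $wordOnly);
preg_match_all('@<([^>]+)>@', $subject, $anythingButGt);

print_r($wordOnly[1]);      // only "xyz": the space stops \w+
print_r($anythingButGt[1]); // "xyz" and 'img src="x.png"'

// Rough timing of each pattern; treat the numbers as indicative only.
foreach (['@<(\w+)>@', '@<([^>]+)>@'] as $pattern) {
    $start = microtime(true);
    for ($i = 0; $i < 100000; $i++) {
        preg_match_all($pattern, $subject, $m);
    }
    printf("%s: %.4f s\n", $pattern, microtime(true) - $start);
}
```

The two patterns are not interchangeable: pick the one whose notion of "anything" matches your data first, and only then worry about speed.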

jensgram
0

You'd be better off using PHP string functions for this task. It will be a lot faster and not too complex.

For example:

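$string = "abcd<xyz>ab<c>d";

$curr_offset = 0;
$matches = array();

// Find the first '<', then repeatedly extract the text up to the next '>'.
$opening_tag_pos = strpos($string, '<', $curr_offset);

while ($opening_tag_pos !== false)
{
    $closing_tag_pos = strpos($string, '>', $opening_tag_pos);
    if ($closing_tag_pos === false)
    {
        // Unclosed '<': stop here, otherwise the loop would never terminate.
        break;
    }
    $matches[] = substr($string, $opening_tag_pos + 1, $closing_tag_pos - $opening_tag_pos - 1);

    // Continue searching just after the closing '>'.
    $curr_offset = $closing_tag_pos + 1;
    $opening_tag_pos = strpos($string, '<', $curr_offset);
}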

/*
     $matches = Array ( [0] => xyz [1] => c ) 
*/
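
Whether this is actually faster than a regex is worth measuring on your own data. Below is a rough sketch that wraps the same `strpos()`/`substr()` idea in a function and times it against `preg_match_all()`. The function name `extract_between` and the repeated test string are assumptions made for illustration, and a single run like this is only a crude indication.

```php
<?php
// Hypothetical helper wrapping the strpos()/substr() loop.
function extract_between(string $string): array
{
    $matches = [];
    $offset = 0;
    while (($open = strpos($string, '<', $offset)) !== false) {
        $close = strpos($string, '>', $open);
        if ($close === false) {
            break; // unclosed '<', nothing more to extract
        }
        $matches[] = substr($string, $open + 1, $close - $open - 1);
        $offset = $close + 1;
    }
    return $matches;
}

// Made-up test data: the sample string repeated 1000 times.
$string = str_repeat('abcd<xyz>ab<c>d', 1000);

$start = microtime(true);
$fromLoop = extract_between($string);
$loopTime = microtime(true) - $start;

$start = microtime(true);
preg_match_all('@<([^>]+)>@', $string, $m);
$regexTime = microtime(true) - $start;

// Check that both approaches agree on this input, then report timings.
var_dump($fromLoop === $m[1]);
printf("strpos loop: %.6f s, preg_match_all: %.6f s\n", $loopTime, $regexTime);
```

Note that the two approaches can differ on edge cases: the loop records an empty string for `<>`, while `[^>]+` requires at least one character.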

Of course, if you are trying to parse HTML or XML, use an HTML/XML parser instead.
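
As a minimal sketch of the parser route, assuming the input is well-formed XML and PHP's DOM extension is enabled, something like the following lists the element names. The helper name `element_names` and the sample document are made up for illustration.

```php
<?php
// Hypothetical helper: list element names in a well-formed XML string
// using the DOM extension instead of a regex.
function element_names(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $names = [];
    // '*' selects every element, in document order.
    foreach ($doc->getElementsByTagName('*') as $element) {
        $names[] = $element->nodeName;
    }
    return $names;
}

// The DOM extension is optional, so only run the demo when it is loaded.
if (extension_loaded('dom')) {
    // The '>' inside the attribute value would cut a '<([^>]+)>' match short.
    print_r(element_names('<root><item title="a > b"/><item/></root>'));
}
```

The parser copes with a `>` inside an attribute value, which is exactly the kind of input that trips up the regex and string-function approaches.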

Silver Light
-1

That looks alright. What's not optimal about it?

You may also want to consider something other than regex if you're trying to parse HTML; see "RegEx match open tags except XHTML self-contained tags".

mqsoh
  • *You* should consider [this](http://stackoverflow.com/questions/4261209/turning-a-input-typeradio-into-a-button-with-regex-c/4261912#4261912), [this](http://stackoverflow.com/questions/4031112/regular-expression-matching), [this](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491) and [this](http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326). Then you need to make a judgment call between what is *theoretically possible* vs what is *practically expedient*. – tchrist Dec 01 '10 at 14:08
  • I don't know that he's parsing HTML at all. However, he is asking for an 'optimization' of a very basic regex. If it needs optimization, then something isn't practically expedient. Down-voting me is harsh, but your links are good. – mqsoh Dec 01 '10 at 14:24