I'm trying to extract non-HTML tags (like <!This TAG>) from strings.
I use the regular expression below to extract them:
$Tags = preg_split('/(<![^>]*[^\/]>)/i', $Content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
The problem is that HTML comments (like <!-- This One -->) are extracted as well.
I can use a trick like the one below to skip the comments themselves, but any non-HTML tags inside a comment are still extracted:
foreach ($Tags as $key => $value) {
    if (mb_substr($value, 0, 4) == '<!--')
        continue;
    $CheckTag = mb_substr($value, 0, 2);
    if ($CheckTag == '<!') {
        //...
    }
}
For example:
<!--<p>some text here.</p>-->
=> Works.
<!-- <!Tag1><!Tag2><!Tag3> -->
=> Doesn't work! (Tag2 and Tag3 are extracted)
I'm looking for a better regular expression that skips everything between <!-- and -->. Thanks for any tips.
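One direction that might work, as a rough sketch rather than a confirmed fix: put a comment-matching alternative in front of the original pattern and reject it with the PCRE verbs (*SKIP)(*F), so the engine jumps past the whole comment instead of matching tags inside it. The function name splitOutsideComments is hypothetical and only for illustration; the "s" modifier is my assumption so that comments spanning several lines are covered.

```php
<?php
// Hypothetical helper name; the pattern is the original one with a
// comment-skipping alternative added in front (an assumption, not a proven fix).
function splitOutsideComments(string $content): array
{
    // Alternative 1 consumes a whole <!-- ... --> comment, then (*SKIP)(*F)
    // rejects it and stops the engine from retrying inside it.
    // Alternative 2 is the original tag pattern, captured so that
    // PREG_SPLIT_DELIM_CAPTURE still returns the tags.
    // The "s" modifier lets "." span line breaks in multi-line comments.
    $pattern = '/<!--.*?-->(*SKIP)(*F)|(<![^>]*[^\/]>)/s';

    return preg_split(
        $pattern,
        $content,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
}

$parts = splitOutsideComments('<!-- <!Tag1><!Tag2><!Tag3> --><!Tag4> text');
print_r($parts);
```

With this approach the comment is not split at all; it stays inside the surrounding text chunk, so a chunk may still begin with <!-- and the existing mb_substr check would still be needed. An unterminated comment (no closing -->) is not skipped by this pattern and would fall through to the original tag alternative.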
For more context, this is the original function:
public function extractFakeTags($Content) {
    $Tags = preg_split('/(<![^>]*[^\/]>)/i', $Content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $FakeTags = array();
    $Content = $Tags;
    foreach ($Tags as $key => $current) {
        if (mb_substr($current, 0, 4) == '<!--')
            continue;
        $TagBegin = mb_substr($current, 0, 2);
        if ($TagBegin == '<!') {
            $TagLength = mb_strlen($current);
            $TagEnd = mb_substr($current, ($TagLength-1), 1);
            if ($TagEnd == '>') {
                $TagName = mb_substr($current, 2, ($TagLength-3));
                if (array_key_exists($TagName, $FakeTags)) {
                    array_push($FakeTags[$TagName], $key);
                }
                else {
                    $FakeTags[$TagName] = array($key);
                }
                $Content[$key] = NULL;
            }
        }
    }
    return $FakeTags;
}