I would like to inject some code after X paragraphs, which is pretty easy with PHP:

public function inject($text, $paragraph = 2) {

    $exploded = explode("</p>", $text);
    if (isset($exploded[$paragraph])) {
        $exploded[$paragraph] = '
            MYCODE
            ' . $exploded[$paragraph];

        return implode("</p>", $exploded);
    }
    return $text;
}

However, I don't want to inject my code inside a <table>. How can I avoid this?

Thanks

Oscar Fanelli
  • do not parse html with regex – Valerij Apr 20 '14 at 09:46
  • If you saw the original short answer, I considerably expanded it, but have to go now. :) – zx81 Apr 20 '14 at 09:57
  • @oscar I added Method 1, which is better than last night's method. The idea woke me up, and it now looks like Hamza had a similar idea of using (*SKIP)(*FAIL). From a glance it's not the same idea, perhaps more compact; I'll go read Hamza's answer in detail now. – zx81 Apr 20 '14 at 17:03

2 Answers

I'm sometimes a bit crazy, sometimes I go for patterns that are lazy, but this time I'm going for something hazy.

$input = 'test <table><p>wuuut</p><table><p>lolwut</p></table></table> <p>foo bar</p> test1 <p>baz qux</p> test3'; # Some input
$insertAfter = 2; # Insert after N p tags
$code = 'CODE'; # The code we want to insert

$regex = <<<'regex'
~
# let's define something
(?(DEFINE)
   (?P<table>                     # To match nested table tags
      <table\b[^>]*>
         (?:
            (?!</?table\b[^>]*>).
         |
            (?&table)
         )*
      </table\s*>
   )
   (?P<paragraph>                 # To match nested p tags
      <p\b[^>]*>
         (?:
            (?!</?p\b[^>]*>).
         |
            (?&paragraph)
         )*
      </p\s*>
   )
)
(?&table)(*SKIP)(*FAIL)           # Let's skip table tags
|
(?&paragraph)                     # And match p tags
~xsi
regex;

$output = preg_replace_callback($regex, function($m)use($insertAfter, $code){
    static $counter = 0; # A counter
    $counter++;
    if($counter === $insertAfter){ # N-th paragraph outside a table: append the code
        return $m[0] . $code;
    }else{
        return $m[0];
    }
}, $input);

var_dump($output); # Let's see what we've got

Online regex demo Online php demo
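
For reuse, the same pattern can be wrapped in a function. This is only a sketch: the name injectAfterParagraph() and its parameters are made up for illustration, and the counter is captured by reference instead of being declared static, so the state is visible in the wrapper. Since preg_replace_callback returns null on error, the sketch falls back to the unmodified input in that case.

```php
<?php
// Hypothetical wrapper (name and signature are assumptions, not from the answer).
function injectAfterParagraph($html, $code, $insertAfter = 2) {
    // Same pattern as above: skip whole (possibly nested) tables, match paragraphs.
    $regex = <<<'regex'
~
(?(DEFINE)
   (?P<table>
      <table\b[^>]*>
         (?: (?!</?table\b[^>]*>). | (?&table) )*
      </table\s*>
   )
   (?P<paragraph>
      <p\b[^>]*>
         (?: (?!</?p\b[^>]*>). | (?&paragraph) )*
      </p\s*>
   )
)
(?&table)(*SKIP)(*FAIL)   # skip table tags
|
(?&paragraph)             # match p tags outside tables
~xsi
regex;

    $counter = 0; // counts paragraphs found outside tables
    $result = preg_replace_callback(
        $regex,
        function ($m) use (&$counter, $insertAfter, $code) {
            $counter++;
            // Append the code only after the N-th paragraph.
            return $counter === $insertAfter ? $m[0] . $code : $m[0];
        },
        $html
    );

    // null signals a PCRE error; return the input untouched in that case.
    return $result === null ? $html : $result;
}

$input = 'test <table><p>wuuut</p><table><p>lolwut</p></table></table> <p>foo bar</p> test1 <p>baz qux</p> test3';
$output = injectAfterParagraph($input, 'CODE', 2);
echo $output, "\n";
```

If there are fewer than $insertAfter paragraphs outside tables, the counter never reaches the target and the input comes back unchanged.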

HamZa
  • Nice idea! I always like your work. The idea of a preg_replace_callback with a (*SKIP)(*FAIL) woke me up, and I worked on it and added it to my solution before seeing your answer. But it looks very different from your answer, which I'll now go read in detail. – zx81 Apr 20 '14 at 17:08
  • @HamZa there is only one problem: if there are no paragraphs outside the table, I receive a "no data received" error from the browser. – Oscar Fanelli Apr 26 '14 at 12:06
  • @OscarFanelli [It does work](https://eval.in/142468) on my end. So I think the problem is elsewhere – HamZa Apr 26 '14 at 13:28
  • @OscarFanelli So there's no problem? – HamZa Apr 26 '14 at 17:39
  • Now it's OK, but maybe there is a better way to avoid the problem by editing the regex. However, I'm not so good with them :( – Oscar Fanelli Apr 27 '14 at 00:37
  • @OscarFanelli I'm definitely sure there is no problem with the regex. So to break this certainty of mine, you need to show me a demo where it fails. From there on I could work on something. – HamZa Apr 27 '14 at 00:41

EDIT: It was late last night.

  1. The PREG_SPLIT_DELIM_CAPTURE was neat but I am now adding a better idea (Method 1).

  2. Also improved Method 2 to replace the strstr with a faster substr

METHOD 1: preg_replace_callback with (*SKIP)(*FAIL) (better)

Let's do a direct replace on the text that is certifiably table-free using a callback to your inject function.

Here's a regex to match table-free text:

$regex = "~(?si)(?!<table>).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";

In short, this either matches text that is a complete non-table or matches a complete table and fails.

Here's your replacement:

$injectedString = preg_replace_callback($regex,
        function($m){return inject($m[0]);},
            $data);

Much shorter!

And here's a demo of $regex showing you how it matches elements that don't contain a table.

$text = "<table> to 
</table>not a table # 1<table> to 
</table>NOT A TABLE # 2<table> to 
</table>";
$regex = "~(?si)(?!<table>).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";
$a = preg_match_all($regex,$text,$m);
print_r($m);

The output: Array ( [0] => Array ( [0] => not a table # 1 [1] => NOT A TABLE # 2 ) )

Of course, if the HTML is not well formed and $data starts in the middle of a table, all bets are off. If that's a problem, let me know and we can work on the regex.
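
Here is a minimal end-to-end sketch of Method 1. It assumes inject() is available as a plain function that takes the text as its first argument (a standalone copy of the method from the question, with MYCODE as the placeholder), and the sample $data is made up. Two things to keep in mind, as far as I can tell from the pattern: it only matches table-free text that is followed by a table tag, so text after the last table is left untouched, and the callback runs once per table-free chunk, so every chunk with enough paragraphs gets an injection.

```php
<?php
// Standalone copy of the question's inject(); MYCODE is a placeholder.
function inject($text, $paragraph = 2) {
    $exploded = explode("</p>", $text);
    if (isset($exploded[$paragraph])) {
        $exploded[$paragraph] = 'MYCODE' . $exploded[$paragraph];
        return implode("</p>", $exploded);
    }
    return $text;
}

// The regex from Method 1: match table-free text, skip whole tables.
$regex = "~(?si)(?!<table>).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";

// Made-up sample: three paragraphs, then a table that also contains paragraphs.
$data = '<p>a</p><p>b</p><p>c</p>'
      . '<table><tr><td><p>x</p><p>y</p><p>z</p></td></tr></table>';

// Each table-free chunk is passed to inject(); the table is never touched.
$injectedString = preg_replace_callback($regex, function ($m) {
    return inject($m[0]);
}, $data);

echo $injectedString, "\n";
// <p>a</p><p>b</p>MYCODE<p>c</p><table><tr><td><p>x</p><p>y</p><p>z</p></td></tr></table>
```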

METHOD 2

Here is the first solution that came to mind.

In short, I would look at using preg_split with the PREG_SPLIT_DELIM_CAPTURE flag.

The basic idea is to isolate the tables using a special preg_split, and to perform your injections on the elements that are certifiably table-free.

A. Step 1: split $data using an unusual delimiter: your delimiter will be a full table sequence: from <table to </table>

This is achieved with a delimiter specified by a regex pattern such as (?s)<table.*?</table>

Note that I am not closing <table in case you have a class there.

So you have something like

$tableseparator = preg_split( "~(?s)(<table.*?</table>)~", $data, -1, PREG_SPLIT_DELIM_CAPTURE );

The benefit of this PREG_SPLIT_DELIM_CAPTURE flag is that the whole delimiter, which we capture thanks to the parentheses in the regex pattern, becomes an element in the array, so that we can isolate the tables without losing them. [See demo of this at the bottom.] This way, your string is broken into clean "table-free" and "is-a-table" pieces.

B. Step 2: Iterate over the $tableseparator elements. For each element, do a

if(substr($tableseparator[$i],0,6)=="<table")

If <table is found, leave the element alone (don't inject). If it isn't found, that element is clean, and you can do your inject() magic on it.

C. Step 3: Put the elements of $tableseparator back together (implode just like you do in your inject function).

So you have a two-level explosion and implosion, first with preg_split, second with your explode!

Sorry that I don't have time to code everything in detail, but I'm certain that you can figure it out. :)
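
For reference, the three steps can be sketched roughly as follows. The function name injectOutsideTables() is made up, inject() is a standalone copy of the method from the question (MYCODE is a placeholder), and the sample $data is invented. Note that the lazy <table.*?</table> delimiter stops at the first </table>, so this sketch assumes tables are not nested.

```php
<?php
// Standalone copy of the question's inject(); MYCODE is a placeholder.
function inject($text, $paragraph = 2) {
    $exploded = explode("</p>", $text);
    if (isset($exploded[$paragraph])) {
        $exploded[$paragraph] = 'MYCODE' . $exploded[$paragraph];
        return implode("</p>", $exploded);
    }
    return $text;
}

// Hypothetical helper tying steps 1 to 3 together.
function injectOutsideTables($data) {
    // Step 1: split on whole tables; the capture group keeps them in the array.
    $tableseparator = preg_split(
        "~(?s)(<table.*?</table>)~",
        $data,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    // Step 2: leave table pieces alone, inject into the table-free pieces.
    foreach ($tableseparator as $i => $piece) {
        if (substr($piece, 0, 6) == "<table") {
            continue;
        }
        $tableseparator[$i] = inject($piece);
    }

    // Step 3: glue the pieces back together in their original order.
    return implode("", $tableseparator);
}

$data = '<table><tr><td><p>x</p><p>y</p><p>z</p></td></tr></table>'
      . '<p>a</p><p>b</p><p>c</p>';
echo injectOutsideTables($data), "\n";
// <table><tr><td><p>x</p><p>y</p><p>z</p></td></tr></table><p>a</p><p>b</p>MYCODE<p>c</p>
```

As written, inject() runs on every table-free piece, so each piece with enough paragraphs gets its own injection; to inject only once, you would have to count paragraphs across pieces instead.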

preg_split with PREG_SPLIT_DELIM_CAPTURE demo

Here's a demo of how the preg_split works:

$text = "Hi@There@@Oscar@@@@";
$regex = "~(@+)~";
$a = preg_split($regex,$text,-1,PREG_SPLIT_DELIM_CAPTURE);
print_r($a);

The Output: Array ( [0] => Hi [1] => @ [2] => There [3] => @@ [4] => Oscar [5] => @@@@ [6] => )

See how in this example the delimiters (the @ sequences) are preserved? You have surgically isolated them but not lost them, so you can work on the other strings then put everything back together.

zx81
  • very nice! I absolutely have to try it – Oscar Fanelli Apr 20 '14 at 10:32
  • @OscarFanelli Thanks for your comment. It's mad!!! I woke up with this idea of preg_replace_callback with (*SKIP)(*FAIL), worked furiously on it and saw that the mighty Hamza had posted a short one before. But the answers look quite different, so now you have a portfolio to choose from for your code. :) – zx81 Apr 20 '14 at 17:10
  • @HamZa But I can upvote you. :) Had gone back to sleep. – zx81 Apr 20 '14 at 19:25